DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 16 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 16 recites the limitations “the second ML mode” in line 8 and “the one or more receives” in line 10. There is insufficient antecedent basis for these limitations in the claim. For the purposes of compact prosecution, “the second ML mode” will be read as “the second ML model”, and “the one or more receives” will be read as “the one or more receivers”.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-6, 11-16, and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lovchinsky et al., U.S. Publication No. 2023/0254650, published on August 10, 2023 (Lovchinsky).
As to Claim 1, Lovchinsky discloses a method, comprising: receiving, by a processing system, reference audio data [804] representing one or more voices (training data [804] comprising audio clips of a speaker or speakers; para. 0126, lines 1-5); generating, by the processing system and using a first machine learning (ML) model [808], an embedding of the reference audio data (a voice signature [622, 824] is created; para. 0128, lines 1-8); receiving, by the processing system, live audio data [620] representing sound detected by one or more microphones [402] of one or more hearing instruments [400] (an audio signal is received from a hearing device microphone; para. 0100, lines 12-14); generating, by the processing system, an input spectrogram of the live audio data [620] (the live audio data is converted to the frequency domain; para. 0103, lines 9-14); using, by the processing system, a second ML model [602] to generate a masked spectrogram (a frequency domain output [610] is formed; para. 0108, lines 8-11) based on the embedding and the input spectrogram, wherein the masked spectrogram represents a version of the live audio data in which portions of the live audio data spoken in the voices represented by the reference audio data are enhanced (a mask created by the reference audio data is applied to the input audio signal to yield the output [610] which is isolated speech; para. 0107); and causing, by the processing system, one or more receivers of the one or more hearing instruments to output sound based on the masked spectrogram (the output [610] is recombined to output on a receiver of an ear-worn device; para. 0101).
As to Claim 2, Lovchinsky remains as applied above to Claim 1. Lovchinsky further discloses that the first ML model [808] is a first neural network (para. 0155) and the second ML model [602] is a second neural network (para. 0105).
As to Claim 3, Lovchinsky remains as applied above to Claim 1. Lovchinsky further discloses providing, by the processing system, a first request to a computing device for a first individual to provide first input consistent with speaking a predetermined word or phrase (a prompt is displayed for the user to read back, the prompt is then recorded as input; para. 0151, lines 21-25; see Fig. 12); obtaining, by the processing system, the reference audio data as the first input; and associating, by the processing system, the reference audio data with the first individual (the voice signature is stored and associated with a particular speaker; para. 0152).
As to Claim 4, Lovchinsky remains as applied above to Claim 3. Lovchinsky further discloses that the reference audio data is first reference audio data (the voice signature corresponds to reference audio data; para. 0152), and the method comprises providing, by the processing system, a second request (a respective audio segment is obtained for each of a plurality of speakers; para. 0151, lines 10-16) to the computing device for a second individual to provide second input consistent with speaking the predetermined word or phrase (a prompt is displayed for the user to read back, the prompt is then recorded as input; para. 0151, lines 21-25; see Fig. 12); obtaining, by the processing system, second reference audio data as the second input; and associating, by the processing system, the second reference audio data with the second individual (the voice signature is stored and associated with a particular speaker; para. 0152).
As to Claim 5, Lovchinsky remains as applied above to Claim 1. Lovchinsky further discloses receiving, by the processing system, user input of a request to set the hearing instrument into a group mode, wherein, when the hearing instrument is in the group mode, the processing system uses the second ML model to generate a masked spectrogram where one or more voices selected by a user are enhanced (by selecting more than one speaker, the speech of multiple target speakers is isolated by the voice isolation network; para. 0105, lines 1-12).
As to Claim 6, Lovchinsky remains as applied above to Claim 1. Lovchinsky further discloses that the masked spectrogram is a first masked spectrogram (a frequency domain output [610] is formed; para. 0108, lines 8-11), and the method further comprises: receiving, by the processing system, a selection of one or more individuals (a target speaker(s) change is selected; para. 0123, lines 1-9); selecting, by the processing system, embeddings associated with the selected one or more individuals (an updated voice signature is sent to the voice isolation model; para. 0123, lines 9-20); receiving, by the processing system, second live audio data [620] representing additional sound detected by the one or more microphones [402] of the one or more hearing instruments [400] (an audio signal is received from a hearing device microphone; para. 0100, lines 12-14); generating, by the processing system, a second input spectrogram of the second live audio data [620] (the live audio data is converted to the frequency domain; para. 0103, lines 9-14); generating, using the second ML model [602], a second masked spectrogram (a frequency domain output [610] is formed; para. 0108, lines 8-11) based on the selected embeddings and the second input spectrogram (a mask created by the reference audio data is applied to the input audio signal to yield the output [610] which is isolated speech; para. 0107); and causing, by the processing system, the one or more receivers to output sound based on the second masked spectrogram (the output [610] is recombined to output on a receiver of an ear-worn device; para. 0101).
As to Claim 11, Lovchinsky discloses a hearing instrument comprising: one or more microphones [402]; and one or more programmable processors [1402, 600, 800] (para. 0161), configured to: receive reference audio data [804] representing one or more voices (training data [804] comprising audio clips of a speaker or speakers; para. 0126, lines 1-5); generate, using a first machine learning (ML) model [808], an embedding of the reference audio data (a voice signature [622, 824] is created; para. 0128, lines 1-8); receive live audio data [620] representing sound detected by the one or more microphones [402] (an audio signal is received from a hearing device microphone; para. 0100, lines 12-14); generate an input spectrogram of the live audio data [620] (the live audio data is converted to the frequency domain; para. 0103, lines 9-14); use a second ML model [602] to generate a masked spectrogram (a frequency domain output [610] is formed; para. 0108, lines 8-11) based on the embedding and the input spectrogram, wherein the masked spectrogram represents a version of the live audio data in which portions of the live audio data spoken in the voices represented by the reference audio data are enhanced (a mask created by the reference audio data is applied to the input audio signal to yield the output [610] which is isolated speech; para. 0107); and cause one or more receivers of the one or more hearing instruments to output sound based on the masked spectrogram (the output [610] is recombined to output on a receiver of an ear-worn device; para. 0101).
As to Claim 12, Lovchinsky remains as applied above to Claim 11. Lovchinsky further discloses that the first ML model [808] is a first neural network (para. 0155) and the second ML model [602] is a second neural network (para. 0105).
As to Claim 13, Lovchinsky remains as applied above to Claim 11. Lovchinsky further discloses that the one or more programmable processors are configured to: provide a first request to a computing device for a first individual to provide first input consistent with speaking a predetermined word or phrase (a prompt is displayed for the user to read back, the prompt is then recorded as input; para. 0151, lines 21-25; see Fig. 12); obtain the reference audio data as the first input; and associate the reference audio data with the first individual (the voice signature is stored and associated with a particular speaker; para. 0152).
As to Claim 14, Lovchinsky remains as applied above to Claim 11. Lovchinsky further discloses that the reference audio data is first reference audio data (the voice signature corresponds to reference audio data; para. 0152), and the one or more programmable processors are configured to: provide a second request (a respective audio segment is obtained for each of a plurality of speakers; para. 0151, lines 10-16) to the computing device for a second individual to provide second input consistent with speaking the predetermined word or phrase (a prompt is displayed for the user to read back, the prompt is then recorded as input; para. 0151, lines 21-25; see Fig. 12); obtain second reference audio data as the second input; and associate the second reference audio data with the second individual (the voice signature is stored and associated with a particular speaker; para. 0152).
As to Claim 15, Lovchinsky remains as applied above to Claim 11. Lovchinsky further discloses wherein the one or more programmable processors are configured to receive user input of a request to set the hearing instrument into a group mode, wherein when in the group mode the one or more programmable processors use the second ML model to generate a masked spectrogram where one or more voices selected by a user are enhanced (by selecting more than one speaker, the speech of multiple target speakers is isolated by the voice isolation network; para. 0105, lines 1-12).
As to Claim 16, Lovchinsky remains as applied above to Claim 11. Lovchinsky further discloses wherein the one or more programmable processors are configured to: receive a selection of one or more individuals (a target speaker(s) change is selected; para. 0123, lines 1-9); select embeddings associated with the selected one or more individuals (an updated voice signature is sent to the voice isolation model; para. 0123, lines 9-20); receive second live audio data [620] representing additional sound detected by the one or more microphones [402] of the one or more hearing instruments [400] (an audio signal is received from a hearing device microphone; para. 0100, lines 12-14); generate a second input spectrogram of the second live audio data [620] (the live audio data is converted to the frequency domain; para. 0103, lines 9-14); generate, using the second ML (model) [602], a second masked spectrogram (a frequency domain output [610] is formed; para. 0108, lines 8-11) based on the selected embeddings and the second input spectrogram (a mask created by the reference audio data is applied to the input audio signal to yield the output [610] which is isolated speech; para. 0107); and cause the one or more (receivers) to output sound based on the second masked spectrogram (the output [610] is recombined to output on a receiver of an ear-worn device; para. 0101).
As to Claim 19, Lovchinsky discloses one or more non-transitory computer-readable media comprising instructions stored thereon that (para. 0018), when executed by one or more processors [1402] (para. 0161), configured to cause the one or more processors to: receive reference audio data [804] representing one or more voices (training data [804] comprising audio clips of a speaker or speakers; para. 0126, lines 1-5); generate, using a first machine learning (ML) model [808], an embedding of the reference audio data (a voice signature [622, 824] is created; para. 0128, lines 1-8); receive live audio data [620] representing sound detected by one or more microphones [402] of one or more hearing instruments [400] (an audio signal is received from a hearing device microphone; para. 0100, lines 12-14); generate an input spectrogram of the live audio data [620] (the live audio data is converted to the frequency domain; para. 0103, lines 9-14); and use a second ML model [602] to generate a masked spectrogram (a frequency domain output [610] is formed; para. 0108, lines 8-11) based on the embedding and the input spectrogram, wherein the masked spectrogram represents a version of the live audio data in which portions of the live audio data spoken in the voices represented by the reference audio data are enhanced (a mask created by the reference audio data is applied to the input audio signal to yield the output [610] which is isolated speech; para. 0107); and cause one or more receivers of the one or more hearing instruments to output sound based on the masked spectrogram (the output [610] is recombined to output on a receiver of an ear-worn device; para. 0101).
As to Claim 20, Lovchinsky remains as applied above to Claim 19. Lovchinsky further discloses that the first ML model [808] is a first neural network (para. 0155) and the second ML model [602] is a second neural network (para. 0105).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 10 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Lovchinsky et al., U.S. Publication No. 2023/0254650, published on August 10, 2023 (Lovchinsky), in view of Dierks.
As to Claim 10, Lovchinsky remains as applied above to Claim 1. Lovchinsky further discloses, prior to using the second ML model to generate the masked spectrogram: determining, by the processing system, that the one or more hearing instruments are located in a particular location (para. 0089), and selecting the embedding from a plurality of stored embeddings (a target speaker can be selected from a group of people; para. 0123, lines 1-9). Lovchinsky does not explicitly disclose selecting, by the processing system, based on the one or more hearing instruments being located in the particular location, the embedding from among a plurality of stored embeddings. However, it was well known in the art to select a target voice based on a hearing aid user’s location. Dierks discloses a similar method that involves determining that a hearing instrument is located in a particular location, and selecting, based on the hearing instrument being located in the particular location, a target voice from among a plurality of target voices (a particular voice is selected based on the detected location of the user; para. 0016). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of Applicant’s invention, to incorporate the improvement of Dierks into the method of Lovchinsky by selecting, by the processing system, based on the one or more hearing instruments being located in the particular location, the embedding from among a plurality of stored embeddings, as a way to select a particular user’s voice in a particular location.
As to Claim 18, Lovchinsky remains as applied above to Claim 11. Lovchinsky further discloses that the one or more programmable processors are configured to, prior to using the second ML model to generate the masked spectrogram based on the embedding and the input spectrogram: determine that the hearing instrument is located in a particular location (para. 0089). Lovchinsky does not explicitly disclose selecting, by the processing system, based on the one or more hearing instruments being located in the particular location, the embedding from among a plurality of stored embeddings. However, it was well known in the art to select a target voice based on a hearing aid user’s location. Dierks discloses a similar method that involves determining that a hearing instrument is located in a particular location, and selecting, based on the hearing instrument being located in the particular location, a target voice from among a plurality of target voices (a particular voice is selected based on the detected location of the user; para. 0016). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of Applicant’s invention, to incorporate the improvement of Dierks into the hearing instrument of Lovchinsky by selecting, by the processing system, based on the one or more hearing instruments being located in the particular location, the embedding from among a plurality of stored embeddings, as a way to select a particular user’s voice in a particular location.
Allowable Subject Matter
Claims 7-9 and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter: Claims 7 and 17 both recite the unique features of receiving, from a user, a selection of a type of voice from among one or more types of voices, and using a third ML model to generate a second masked spectrogram based on the selection of the type of voice and the second input spectrogram. The closest prior art does not disclose or suggest such features.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ryan Robinson whose telephone number is (571) 270-3956. The examiner can normally be reached on Monday through Friday from 9 am to 5 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Fan Tsang, can be reached on (571) 272-7547. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of an application may be obtained from Patent Center. Status information for published applications may be obtained from Patent Center. Status information for unpublished applications is available through Patent Center for authorized users only. Should you have questions about access to Patent Center, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) Form at https://www.uspto.gov/patents/uspto-automated-interview-request-air-form.
/RYAN ROBINSON/
Primary Examiner, Art Unit 2694