DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1, 8-11, and 20-34 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant's arguments filed 01/16/2026 have been fully considered but are not persuasive. Regarding the arguments on pages 13-14 of the Remarks, the Examiner notes that Tai teaches two parallel paths for processing the input, a purifying encoder path and an eliminating encoder path, which are interpreted as the NISA and SPK neural networks.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1, 8-11, and 20-34 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Paragraph [0018] of Applicant's Specification states that "Background acoustic metrics 216 and Speaker acoustic metrics 220 are utilized as an adversarial loss factor 222 in the training of the system." However, this appears to teach a single adversarial loss factor, while the claims now recite two separate adversarial loss factors. Therefore, the second adversarial loss factor is not supported by the Specification and is considered new matter. The dependent claims further incorporate the deficiencies of the independent claims.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1, 8-11, and 20-34 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. The independent claims recite "processing, … the background acoustics embedding based on a first adversarial loss factor to obtain a processed background acoustics embedding" and then "wherein the first adversarial loss factor is based on background acoustic metrics derived from the processed background acoustics embedding," as well as similar limitations referring to the second adversarial loss factor and speaker acoustic metrics. It is unclear how the adversarial loss factors can be based on metrics derived from the processed embeddings when those same adversarial loss factors are used to obtain the processed embeddings. The dependent claims further incorporate the deficiencies of the independent claims.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 8-11, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Tai et al. (Tai, J., Jia, X., Huang, Q., Zhang, W., Du, H., & Zhang, S. (2020, December). SEEF-ALDR: A speaker embedding enhancement framework via adversarial learning based disentangled representation. In Proceedings of the 36th Annual Computer Security Applications Conference (pp. 939-950)), hereinafter referred to as Tai, in view of Chen et al. (US 2021/0304628 A1), hereinafter referred to as Chen.
Regarding claim 1, Tai teaches:
A method comprising:
receiving a speech signal recorded in an acoustic environment (Page 942 Section 3.1 first paragraph, where an input spectrogram of speech is used);
extracting background information from the speech signal to generate a background acoustics embedding (Page 942 Section 3.1 first paragraph, where a speaker eliminating encoder is used to extract the identity-unrelated feature or embedding);
extracting speaker information from the speech signal to generate a speaker acoustics embedding (Page 942 Section 3.1 first paragraph, where a speaker purifying encoder is used to extract the identity-purified feature or embedding);
iteratively training a speech processing system to process the background acoustics embedding and the speaker acoustics embedding in parallel, wherein the speech processing system includes a Non-Intrusive Speech Assessment (NISA) neural network and an SPK neural network (Fig. 2, Page 942 Section 3.1 third paragraph, where the training process is iterated on the twin networks or encoders that operate in parallel), and wherein iteratively training the speech processing system comprises:
processing, by the NISA neural network, the background acoustics embedding based on a first adversarial loss factor to obtain a processed background acoustics embedding, wherein processing the background acoustics embedding based on the first adversarial loss factor reduces an amount of residual speaker information in the processed background acoustics embedding, and wherein the first adversarial loss factor is based on background acoustic metrics derived from the processed background acoustics embedding (Page 944 Section 3.3, where the eliminating encoder is trained using a cross-entropy loss to fool an adversarial classifier, to work towards the identity-unrelated feature, and where the loss is calculated using the predicted distribution, which is a function of the identity-unrelated feature);
wherein the second adversarial loss factor is based on speaker acoustics metrics derived from the processed speaker acoustics embedding (Page 943 Section 3.2, where the loss is calculated using the speaker identity distribution);
Tai does not teach:
processing, by the SPK neural network, the speaker acoustics embedding based on a second adversarial loss factor to obtain a processed speaker acoustics embedding, wherein processing the speaker acoustics embedding based on the second adversarial loss factor reduces an amount of residual background information in the processed speaker acoustics embedding, and wherein the second adversarial loss factor is based on speaker acoustics metrics derived from the processed speaker acoustics embedding; and
using the trained speech processing system to perform speech recognition on speech signals recorded in the acoustic environment.
Chen teaches:
processing, by the SPK neural network, the speaker acoustics embedding based on a second adversarial loss factor to obtain a processed speaker acoustics embedding, wherein processing the speaker acoustics embedding based on the second adversarial loss factor reduces an amount of residual background information in the processed speaker acoustics embedding, and wherein the second adversarial loss factor is based on speaker acoustics metrics derived from the processed speaker acoustics embedding (para [0019], where noise is removed from the speech signal using a generative adversarial network); and
using the trained speech processing system to perform speech recognition on speech signals recorded in the acoustic environment (para [0026], where the cleaned speech is transcribed).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai by using the GAN-based speech enhancement of Chen (Chen para [0019]) in the speaker classification of Tai (Tai page 943, section 3.2), in order to produce a cleaned audio track using machine learning (Chen para [0019]).
Regarding claim 8, Tai in view of Chen teaches:
The method of claim 1, further including combining features of the background information and the speaker information prior to generating the background acoustics embedding and the speaker acoustics embedding (Tai Page 944 section 3.4, where the features are concatenated, and Page 942 section 3.1 last paragraph, where the training process is iterated, so the features of one iteration are concatenated before the embeddings of a later iteration are generated).
Regarding claim 9, Tai in view of Chen teaches:
The method of claim 1, wherein the processed background acoustics embedding and the processed speaker acoustics embedding are used to train the speech processing system (Tai Page 942 Section 3.1 last paragraph, where the training process is iterated for use in speaker verification).
Regarding claim 10, Tai in view of Chen teaches:
The method of claim 1, further including applying a clustering constraint in the generation of at least one of the background acoustics embedding and the speaker acoustics embedding (Tai Figs. 3-4, Page 947 Section 4.3, where clustering is used to determine clear classification boundaries among features of each identity).
Regarding claim 11, Tai teaches:
A system comprising:
at least one processor (Page 945 "Initialization", where a processor is used); and
a memory (Page 945 "Initialization", where memory is used) storing programming instructions for the at least one processor, the programming instructions, upon execution by the at least one processor, causing the system to perform the following operations:
receiving a speech signal recorded in an acoustic environment (Page 942 Section 3.1 first paragraph, where an input spectrogram of speech is used);
extracting background information from the speech signal to generate a background acoustics embedding (Page 942 Section 3.1 first paragraph, where a speaker eliminating encoder is used to extract the identity-unrelated feature or embedding);
extracting speaker information from the speech signal to generate a speaker acoustics embedding (Page 942 Section 3.1 first paragraph, where a speaker purifying encoder is used to extract the identity-purified feature or embedding);
iteratively training a speech processing system to process the background acoustics embedding and the speaker acoustics embedding in parallel, wherein the speech processing system includes a Non-Intrusive Speech Assessment (NISA) neural network and an SPK neural network (Fig. 2, Page 942 Section 3.1 third paragraph, where the training process is iterated on the twin networks or encoders that operate in parallel), and wherein iteratively training the speech processing system comprises:
processing, by the NISA neural network, the background acoustics embedding based on a first adversarial loss factor to obtain a processed background acoustics embedding, wherein processing the background acoustics embedding based on the first adversarial loss factor reduces an amount of residual speaker information in the processed background acoustics embedding, and wherein the first adversarial loss factor is based on background acoustic metrics derived from the processed background acoustics embedding (Page 944 Section 3.3, where the eliminating encoder is trained using a cross-entropy loss to fool an adversarial classifier, to work towards the identity-unrelated feature, and where the loss is calculated using the predicted distribution, which is a function of the identity-unrelated feature);
wherein the second adversarial loss factor is based on speaker acoustics metrics derived from the processed speaker acoustics embedding (Page 943 Section 3.2, where the loss is calculated using the speaker identity distribution);
Tai does not teach:
processing, by the SPK neural network, the speaker acoustics embedding based on a second adversarial loss factor to obtain a processed speaker acoustics embedding, wherein processing the speaker acoustics embedding based on the second adversarial loss factor reduces an amount of residual background information in the processed speaker acoustics embedding, and wherein the second adversarial loss factor is based on speaker acoustics metrics derived from the processed speaker acoustics embedding; and
using the trained speech processing system to perform speech recognition on speech signals recorded in the acoustic environment.
Chen teaches:
processing, by the SPK neural network, the speaker acoustics embedding based on a second adversarial loss factor to obtain a processed speaker acoustics embedding, wherein processing the speaker acoustics embedding based on the second adversarial loss factor reduces an amount of residual background information in the processed speaker acoustics embedding, and wherein the second adversarial loss factor is based on speaker acoustics metrics derived from the processed speaker acoustics embedding (para [0019], where noise is removed from the speech signal using a generative adversarial network); and
using the trained speech processing system to perform speech recognition on speech signals recorded in the acoustic environment (para [0026], where the cleaned speech is transcribed).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai by using the GAN-based speech enhancement of Chen (Chen para [0019]) in the speaker classification of Tai (Tai page 943, section 3.2), in order to produce a cleaned audio track using machine learning (Chen para [0019]).
Regarding claim 20, Tai teaches:
A computer program product residing on a non-transitory computer readable medium having programming instructions stored thereon which, when executed by a processor of a system, cause the system to perform the following operations (Page 945 "Initialization", where memory is used):
receiving a speech signal recorded in an acoustic environment (Page 942 Section 3.1 first paragraph, where an input spectrogram of speech is used);
extracting background information from the speech signal to generate a background acoustics embedding (Page 942 Section 3.1 first paragraph, where a speaker eliminating encoder is used to extract the identity-unrelated feature or embedding);
extracting speaker information from the speech signal to generate a speaker acoustics embedding (Page 942 Section 3.1 first paragraph, where a speaker purifying encoder is used to extract the identity-purified feature or embedding);
iteratively training a speech processing system to process the background acoustics embedding and the speaker acoustics embedding in parallel, wherein the speech processing system includes a Non-Intrusive Speech Assessment (NISA) neural network and an SPK neural network (Fig. 2, Page 942 Section 3.1 third paragraph, where the training process is iterated on the twin networks or encoders that operate in parallel), and wherein iteratively training the speech processing system comprises:
processing, by the NISA neural network, the background acoustics embedding based on a first adversarial loss factor to obtain a processed background acoustics embedding, wherein processing the background acoustics embedding based on the first adversarial loss factor reduces an amount of residual speaker information in the processed background acoustics embedding, and wherein the first adversarial loss factor is based on background acoustic metrics derived from the processed background acoustics embedding (Page 944 Section 3.3, where the eliminating encoder is trained using a cross-entropy loss to fool an adversarial classifier, to work towards the identity-unrelated feature, and where the loss is calculated using the predicted distribution, which is a function of the identity-unrelated feature);
wherein the second adversarial loss factor is based on speaker acoustics metrics derived from the processed speaker acoustics embedding (Page 943 Section 3.2, where the loss is calculated using the speaker identity distribution);
Tai does not teach:
processing, by the SPK neural network, the speaker acoustics embedding based on a second adversarial loss factor to obtain a processed speaker acoustics embedding, wherein processing the speaker acoustics embedding based on the second adversarial loss factor reduces an amount of residual background information in the processed speaker acoustics embedding, and wherein the second adversarial loss factor is based on speaker acoustics metrics derived from the processed speaker acoustics embedding; and
using the trained speech processing system to perform speech recognition on speech signals recorded in the acoustic environment.
Chen teaches:
processing, by the SPK neural network, the speaker acoustics embedding based on a second adversarial loss factor to obtain a processed speaker acoustics embedding, wherein processing the speaker acoustics embedding based on the second adversarial loss factor reduces an amount of residual background information in the processed speaker acoustics embedding, and wherein the second adversarial loss factor is based on speaker acoustics metrics derived from the processed speaker acoustics embedding (para [0019], where noise is removed from the speech signal using a generative adversarial network); and
using the trained speech processing system to perform speech recognition on speech signals recorded in the acoustic environment (para [0026], where the cleaned speech is transcribed).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai by using the GAN-based speech enhancement of Chen (Chen para [0019]) in the speaker classification of Tai (Tai page 943, section 3.2), in order to produce a cleaned audio track using machine learning (Chen para [0019]).
Claims 21-27 are rejected under 35 U.S.C. 103 as being unpatentable over Tai in view of Chen, and further in view of Sharma et al. (US 2024/0005908 A1), hereinafter referred to as Sharma.
Regarding claim 21, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the background acoustic metrics include reverberation parameters associated with the acoustic environment.
Sharma teaches:
wherein the background acoustic metrics include reverberation parameters associated with the acoustic environment (para [0032], where the reverberation parameters include C50, T60, and DRR).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the parameters of Sharma (Sharma para [0032]) as the background acoustic metrics of Tai in view of Chen (Tai page 944 section 3.3), in order to provide acoustic environmental profile estimation for accurate speech recognition to compensate for the environment (Sharma para [0020]).
Regarding claim 22, Tai in view of Chen and Sharma teaches:
The method of claim 21, wherein the reverberation parameters include at least one of C50, T60, DRR, or C5 parameters (Sharma para [0032], where the parameters include C50, T60, and DRR).
Regarding claim 23, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the background acoustic metrics indicate a noise type and a signal-to-noise ratio (SNR).
Sharma teaches:
wherein the background acoustic metrics indicate a noise type and a signal-to-noise ratio (SNR) (para [0032], where the parameters include noise type and SNR).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the parameters of Sharma (Sharma para [0032]) as the background acoustic metrics of Tai in view of Chen (Tai page 944 section 3.3), in order to provide acoustic environmental profile estimation for accurate speech recognition to compensate for the environment (Sharma para [0020]).
Regarding claim 24, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the background acoustic metrics include voice activity detection parameters or voice overlap detection parameters.
Sharma teaches:
wherein the background acoustic metrics include voice activity detection parameters or voice overlap detection parameters (para [0032], [0035], where voice activity detection parameters are used).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the parameters of Sharma (Sharma para [0032]) as the background acoustic metrics of Tai in view of Chen (Tai page 944 section 3.3), in order to provide acoustic environmental profile estimation for accurate speech recognition to compensate for the environment (Sharma para [0020]).
Regarding claim 25, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the background acoustic metrics include signal quality parameters comprising at least one of a PESQ parameter or an STOI parameter.
Sharma teaches:
wherein the background acoustic metrics include signal quality parameters comprising at least one of a PESQ parameter or an STOI parameter (para [0032], where the parameters include STOI or PESQ parameters).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the parameters of Sharma (Sharma para [0032]) as the background acoustic metrics of Tai in view of Chen (Tai page 944 section 3.3), in order to provide acoustic environmental profile estimation for accurate speech recognition to compensate for the environment (Sharma para [0020]).
Regarding claim 26, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the background acoustic metrics indicate a CODEC type.
Sharma teaches:
wherein the background acoustic metrics indicate a CODEC type (paras [0031]-[0032], where the parameters include codec information, such as the codec type).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the parameters of Sharma (Sharma para [0032]) as the background acoustic metrics of Tai in view of Chen (Tai page 944 section 3.3), in order to provide acoustic environmental profile estimation for accurate speech recognition to compensate for the environment (Sharma para [0020]).
Regarding claim 27, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the background acoustic metrics indicate a CODEC bit rate.
Sharma teaches:
wherein the background acoustic metrics indicate a CODEC bit rate (paras [0031]-[0032], where the parameters include a codec bit rate).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the parameters of Sharma (Sharma para [0032]) as the background acoustic metrics of Tai in view of Chen (Tai page 944 section 3.3), in order to provide acoustic environmental profile estimation for accurate speech recognition to compensate for the environment (Sharma para [0020]).
Claims 28-34 are rejected under 35 U.S.C. 103 as being unpatentable over Tai in view of Chen, and further in view of Edwards (US 2019/0311721 A1).
Regarding claim 28, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the speaker acoustic metrics indicate a pitch of a speaker.
Edwards teaches:
wherein the speaker acoustic metrics indicate a pitch of a speaker (para [0043], where pitch is used as a speaker characteristic).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the speaker characteristics of Edwards (Edwards para [0043]) as the speaker acoustic metrics of Tai in view of Chen (Tai page 943 section 3.2), in order to use speaker information that can be updated over time (Edwards para [0044]).
Regarding claim 29, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the speaker acoustic metrics indicate a pitch variation of a speaker.
Edwards teaches:
wherein the speaker acoustic metrics indicate a pitch variation of a speaker (para [0043], where pitch is used as a speaker characteristic, and where the pitch varies as the speaker speaks).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the speaker characteristics of Edwards (Edwards para [0043]) as the speaker acoustic metrics of Tai in view of Chen (Tai page 943 section 3.2), in order to use speaker information that can be updated over time (Edwards para [0044]).
Regarding claim 30, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the speaker acoustic metrics indicate a vocal tract length of a speaker.
Edwards teaches:
wherein the speaker acoustic metrics indicate a vocal tract length of a speaker (para [0043], [0045], where vocal anatomy affects the speaker characteristic).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the speaker characteristics of Edwards (Edwards para [0043]) as the speaker acoustic metrics of Tai in view of Chen (Tai page 943 section 3.2), in order to use speaker information that can be updated over time (Edwards para [0044]).
Regarding claim 31, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the speaker acoustic metrics indicate a gender of a speaker.
Edwards teaches:
wherein the speaker acoustic metrics indicate a gender of a speaker (para [0043], where gender affects the speaker characteristic).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the speaker characteristics of Edwards (Edwards para [0043]) as the speaker acoustic metrics of Tai in view of Chen (Tai page 943 section 3.2), in order to use speaker information that can be updated over time (Edwards para [0044]).
Regarding claim 32, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the speaker acoustic metrics indicate an accent of a speaker.
Edwards teaches:
wherein the speaker acoustic metrics indicate an accent of a speaker (para [0043], where accent affects the speaker characteristic).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the speaker characteristics of Edwards (Edwards para [0043]) as the speaker acoustic metrics of Tai in view of Chen (Tai page 943 section 3.2), in order to use speaker information that can be updated over time (Edwards para [0044]).
Regarding claim 33, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the speaker acoustic metrics indicate a language of a speaker.
Edwards teaches:
wherein the speaker acoustic metrics indicate a language of a speaker (para [0043], where language affects the speaker characteristic).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the speaker characteristics of Edwards (Edwards para [0043]) as the speaker acoustic metrics of Tai in view of Chen (Tai page 943 section 3.2), in order to use speaker information that can be updated over time (Edwards para [0044]).
Regarding claim 34, Tai in view of Chen teaches:
The method of claim 1,
Tai in view of Chen does not teach:
wherein the speaker acoustic metrics indicate an age of a speaker.
Edwards teaches:
wherein the speaker acoustic metrics indicate an age of a speaker (para [0043], where age affects the speaker characteristic).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Tai in view of Chen by using the speaker characteristics of Edwards (Edwards para [0043]) as the speaker acoustic metrics of Tai in view of Chen (Tai page 943 section 3.2), in order to use speaker information that can be updated over time (Edwards para [0044]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2021/0343305 A1 para [0066] teaches removing noise in an adversarial manner; US 2022/0206130 A1 para [0025] teaches using a GAN for speech enhancement and audio denoising; US 2024/0031765 A1 para [0053] teaches noise suppression using a GAN for enhancing audio.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRYAN S BLANKENAGEL/Primary Examiner, Art Unit 2658