DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/19/2025 has been entered. Claims 1-15 are pending in the application and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The response filed on 12/19/2025 has been entered and considered in this Office Action. Claims 1-15 have been examined. Applicant's amendments to claim 9 overcome the claim objections previously set forth in the Office Action mailed 09/30/2025.
Response to Arguments
Applicant's arguments filed 12/19/2025 have been fully considered as follows:
Applicant’s arguments with respect to claim 1 (also representative of claim 14) state that
“The Office Action acknowledges that Borgstrom does not teach all of the elements of claim 1, but asserts that the newly-cited Gui reference makes up for these admitted deficiencies: … The assertions regarding Cui are respectfully traversed… It is respectfully submitted that Cui fails to teach several of the recitations of claim 1 that are missing from Borgstrom.”
Applicant’s arguments above with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Regarding the art rejections of the remaining dependent claims under 35 U.S.C. 103, to the extent those claims are traversed for the same reasons presented in the Remarks filed 12/19/2025 with respect to independent claims 1 and 14, Examiner respectfully directs Applicant to the responses provided above regarding claims 1 and 14. For at least the same reasons, Examiner respectfully disagrees; Applicant's arguments have been fully considered but are not persuasive.
Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-3 and 7-15 are rejected under 35 U.S.C. 103 as being unpatentable over Borgstrom et al., US PgPub 2021/0074282, in view of Gallart Mauri et al., US PgPub 2021/0256988.
Regarding claim 1, Borgstrom teaches a computer-implemented method of audio processing, the method comprising: generating, by a machine learning model implemented via a central processing unit (CPU), first band gains and a voice activity detection value of an audio signal (see Borgstrom, Fig. 1B, the system 100 using a DNN 120 speech detection classifier (machine learning model); Borgstrom, [0069], FIG. 8C shows the speech detection probabilities 830 output from the DNN (machine learning model) 120 (voice activity detection and first gains)); generating a background noise estimate based on the first band gains and the voice activity detection value (see Borgstrom, [0069], the speech detection probabilities 830 are input to the noise estimator 130 along with the spectrogram 410 by a processor system); generating second band gains by processing the audio signal (see Borgstrom, [0069], FIG. 8D shows the resultant estimated noise 840, which includes both stationary and non-stationary noise components); generating, by the CPU, combined gains by combining the first band gains generated by the machine learning model and the second band gains (see Borgstrom, [0069], the gain estimator 150 then generates a gain mask 850, as shown in FIG. 8E; FIG. 8C shows the speech detection probabilities 830 output from the DNN 120); and generating, by the CPU, a modified audio signal by modifying the audio signal using the combined gains (see Borgstrom, [0069], for use by the output processor 160 for generating the enhanced spectrogram 899 of FIG. 8F as an output).
However, Borgstrom fails to teach generating, by a Wiener filter implemented via the CPU, a background noise estimate, and generating, by the Wiener filter, second band gains by processing the audio signal controlled by the background noise estimate. Gallart teaches generating, by a Wiener filter implemented via the CPU, a background noise estimate based on the first band gains and the voice activity detection value (see Gallart, [0052], [0054], Wiener gain (background noise) for speech/non-speech; describes computing the noise estimates SN(t,f)); generating, by the Wiener filter, second band gains by processing the audio signal controlled by the background noise estimate (see Gallart, [0050], the gain of filter 22 is computed based on the speech probability and MMSE estimate); generating, by the CPU, combined gains by combining the first band gains generated by the machine learning model and the second band gains generated by the Wiener filter (see Gallart, [0052], finally, this filter is responsible for enhancing the spectrum of the speech signal, and is therefore applied to the spectral magnitude 13 which resulted from stage A); and generating, by the CPU, a modified audio signal by modifying the audio signal using the combined gains (see Gallart, Fig. 2, Enhanced Speech).
Borgstrom and Gallart are considered to be analogous to the claimed invention because they relate to enhancing speech signals. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Borgstrom, which uses a machine learning model to generate the filter gain function to produce enhanced speech using neural networks, with the preprocessing teachings of Gallart, to reduce the effect of acoustic distortions that occur in daily scenarios during a telephone call (see Gallart, [0002]).
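For illustration only, the combined-gain scheme mapped above (model-predicted band gains multiplied with Wiener-filter gains and applied to the spectrum) can be sketched as follows. The Wiener gain formula, the multiplicative combination, and all parameter values are assumptions for the sketch, not teachings taken from Borgstrom or Gallart:

```python
import numpy as np

def wiener_gains(power_spec, noise_psd, floor=1e-10):
    """Per-band Wiener gains G = SNR / (1 + SNR), with the SNR
    approximated from the current power spectrum and a noise PSD
    estimate (a simplification of the decision-directed rule)."""
    snr = np.maximum(power_spec / np.maximum(noise_psd, floor) - 1.0, 0.0)
    return snr / (1.0 + snr)

def enhance_frame(spectrum, ml_band_gains, noise_psd):
    """Combine model-predicted band gains with Wiener gains by
    multiplication, then apply the combined gains to the spectrum."""
    power = np.abs(spectrum) ** 2
    g_wiener = wiener_gains(power, noise_psd)
    combined = np.clip(ml_band_gains * g_wiener, 0.0, 1.0)
    return spectrum * combined

rng = np.random.default_rng(0)
spec = rng.standard_normal(8) + 1j * rng.standard_normal(8)
ml_gains = np.full(8, 0.8)   # stand-in for DNN output
noise = np.full(8, 0.5)      # stand-in noise PSD estimate
out = enhance_frame(spec, ml_gains, noise)
```

Since the combined gains lie in [0, 1], the modified spectrum never exceeds the input spectrum in magnitude.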
Regarding claim 2, Borgstrom in view of Gallart teaches the method of claim 1. Borgstrom further teaches wherein the machine learning model is generated using data augmentation to increase variety of training data (see Borgstrom, [0019]-[0020], training data created or augmented from human speech conversations containing noise, created by mixing the speech data with a noise signal created from at least one of background noise data, music data, non-stationary noise data, silence, or reverberant speech).
Regarding claim 3, Borgstrom in view of Gallart teaches the method of claim 1. Borgstrom further teaches wherein generating the first band gains includes limiting the first band gains using at least two different limits for at least two different bands (see Borgstrom, [0045], accordingly, the output from the DNN 120 can be a mask of posterior probabilities (e.g., the statistical probability that a given frequency band of a given frame contains speech) of active speech presence. The noise estimator 130 can estimate noise recursively such that the noise estimator 130 can generate a noise estimate using the output from the DNN 120, track the initial spectrum 110 when speech is not present, and smooth and/or attenuate a previous noise estimate when speech is present (different gains for different bands based on presence of speech)).
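For illustration only, applying different gain limits to different bands, as recited in claim 3, can be sketched as a per-band clamp; the specific floor and ceiling values are hypothetical:

```python
import numpy as np

def limit_band_gains(gains, floors, ceilings):
    """Clamp each band's gain to its own [floor, ceiling] range,
    so different bands can use different limits."""
    return np.minimum(np.maximum(gains, floors), ceilings)

gains    = np.array([0.05, 0.9, 1.3, 0.2])
floors   = np.array([0.10, 0.1, 0.1, 0.3])  # per-band minimum gain
ceilings = np.array([1.00, 1.0, 1.0, 1.0])  # per-band maximum gain
limited = limit_band_gains(gains, floors, ceilings)
# limited == [0.10, 0.9, 1.0, 0.3]
```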
Regarding claim 7, Borgstrom in view of Gallart teaches the method of claim 1. Borgstrom further teaches multiplying the first band gains and the second band gains (see Borgstrom, [0045], the SNR estimator 140 can then calculate the a posteriori and a priori SNRs for the detected noise for use by the gain estimator 150, which can generate a multiplicative gain mask for use in modifying the initial spectrum 110 to suppress the detected noise without attenuating the detected speech. By applying the mask of posterior probabilities of active speech presence to the initial spectrum 110 before calculating noise estimates, the resulting gain mask more accurately suppresses noise and results in less residual noise in the enhanced spectrum 199); limiting the combined gains (see Borgstrom, [0052], which discusses tracking the noise and generating a noise estimate with a smoothing factor to avoid noise filtering artifacts).
Regarding claim 8, Borgstrom in view of Gallart teaches the method of claim 1. Borgstrom further teaches wherein generating the modified audio signal includes modifying an amplitude spectrum of the audio signal using the combined gains (see Borgstrom, [0045], the SNR estimator 140 can then calculate the a posteriori and a priori SNRs for the detected noise for use by the gain estimator 150, which can generate a multiplicative gain mask for use in modifying the initial spectrum 110 to suppress the detected noise without attenuating (modifying the amplitude spectrum of) the detected speech. By applying the mask of posterior probabilities of active speech presence to the initial spectrum 110 before calculating noise estimates, the resulting gain mask more accurately suppresses noise and results in less residual noise in the enhanced spectrum 199).
Regarding claim 9, Borgstrom in view of Gallart teaches the method of claim 1. Gallart further teaches receiving an input audio signal (see Gallart, Fig. 2, noisy speech); applying an overlapped window to an input audio signal to generate a plurality of frames, wherein the audio signal corresponds to the plurality of frames (see Gallart, Fig. 2 and [0055], the overlapping and windowing used in the temporal segmentation 11 of stage A). The motivation to combine stated for claim 1 applies here.
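For illustration only, the overlapped windowing recited in claim 9 can be sketched as 50%-overlapped Hann-windowed framing; the frame length and hop size are hypothetical values, not taken from Gallart:

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Split x into 50%-overlapped, Hann-windowed frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return frames

x = np.arange(2048, dtype=float)
frames = frame_signal(x)
# frames.shape == (7, 512)
```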
Regarding claim 10, Borgstrom in view of Gallart teaches the method of claim 1. Borgstrom further teaches performing spectral analysis on the audio signal to generate a plurality of bin features and a fundamental frequency of the audio signal, wherein the first band gains and the voice activity detection value are based on the plurality of bin features and the fundamental frequency (see Borgstrom, [0009]-[0011], [0013], which discuss processing per-frequency-band-specific and frame-specific information; Borgstrom, [0045], [0050], which discuss generating, from a time-domain signal 101, a series of frames of pseudo-stationary segments and then applying the Fourier transform to each segment to generate the local spectra for each frame (to determine bin features and fundamental frequency features). Together, the frames can comprise the initial spectrum 110 to be provided to the DNN 120, with DNN classification for each bin (individual frequency band)).
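For illustration only, extracting per-bin features and a fundamental-frequency estimate from one frame can be sketched as below. The FFT-magnitude features and the autocorrelation-peak pitch estimate are generic techniques chosen for the sketch, not the specific methods of Borgstrom:

```python
import numpy as np

def bin_features_and_f0(frame, sr=16000, f0_range=(60, 400)):
    """Per-bin magnitude features from the FFT, plus a crude
    fundamental-frequency estimate from the autocorrelation peak."""
    spec = np.fft.rfft(frame)
    bins = np.abs(spec)                       # per-bin features
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // f0_range[1], sr // f0_range[0]  # lag search range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return bins, sr / lag

sr = 16000
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 200.0 * t)        # 200 Hz tone
bins, f0 = bin_features_and_f0(frame, sr)
```

For the 200 Hz test tone the autocorrelation peaks at a lag of 80 samples, so the estimate recovers 200 Hz.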
Regarding claim 11, Borgstrom in view of Gallart teaches the method of claim 10. Borgstrom teaches generating a plurality of band features based on the plurality of bin features (see Borgstrom, [0009]-[0011], [0013], which discuss processing per-frequency-band-specific and frame-specific information), wherein the first band gains and the voice activity detection value are based on the plurality of band features and the fundamental frequency (see Borgstrom, [0045], [0050], which discuss generating, from a time-domain signal 101, a series of frames of pseudo-stationary segments and then applying the Fourier transform to each segment to generate the local spectra for each frame (to determine bin features and fundamental frequency features). Together, the frames can comprise the initial spectrum 110 to be provided to the DNN 120, with DNN classification for each bin (individual frequency band)). Gallart further teaches wherein the plurality of band features are generated using one of Mel-frequency cepstral coefficients (see Gallart, [0030], with respect to the perceptual representation, two methods are considered: a Mel scale filterbank and a representation based on Mel-frequency cepstral coefficients (MFCC)). The same motivation to combine stated for claim 1 is used here.
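For illustration only, deriving Mel-band features and MFCC-style coefficients from per-bin powers can be sketched as below; the filterbank size, coefficient count, and the plain DCT-II formulation are assumptions for the sketch, not details taken from Gallart:

```python
import numpy as np

def mel_band_features(power_bins, sr=16000, n_mels=20, n_mfcc=12):
    """Pool FFT-bin powers into Mel bands with triangular filters,
    then take a DCT-II of the log energies to get MFCC-style features."""
    n_fft = 2 * (len(power_bins) - 1)
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv(np.linspace(0.0, mel(sr / 2), n_mels + 2))
    centers = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_mels, len(power_bins)))
    for i in range(n_mels):
        l, c, r = centers[i], centers[i + 1], centers[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    log_e = np.log(fbank @ power_bins + 1e-10)
    # DCT-II of the log Mel energies
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return dct @ log_e

power = np.abs(np.fft.rfft(
    np.random.default_rng(1).standard_normal(512))) ** 2
mfcc = mel_band_features(power)
```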
Regarding claim 12, Borgstrom in view of Gallart teaches the method of claim 1. Borgstrom further teaches wherein the combined gains are combined band gains that are associated with a plurality of bands of the audio signal (see Borgstrom, [0017], which discusses computing gain masks for different bands of the audio signal), the method further comprising: converting the combined band gains to combined bin gains, wherein the combined bin gains are associated with a plurality of bins (see Borgstrom, [0017], the noise estimator can be configured to calculate SNRs on a per-frame and per-frequency band basis, and the gain estimator can be configured to receive the initial spectrum, the noise variance estimate, and the SNRs).
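For illustration only, converting band gains to bin gains, as recited in claim 12, can be sketched as expanding each band's gain across its range of FFT bins; the band boundaries and gain values below are hypothetical:

```python
import numpy as np

def band_to_bin_gains(band_gains, band_edges, n_bins):
    """Expand per-band gains to per-bin gains: each FFT bin takes
    the gain of the band whose [start, end) range contains it."""
    bin_gains = np.ones(n_bins)
    for g, (lo, hi) in zip(band_gains,
                           zip(band_edges[:-1], band_edges[1:])):
        bin_gains[lo:hi] = g
    return bin_gains

edges = [0, 4, 10, 16]   # hypothetical band boundaries in bins
gains = [0.2, 0.5, 1.0]  # one combined gain per band
bin_gains = band_to_bin_gains(gains, edges, 16)
```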
Regarding claim 13, claim 13 is directed to a non-transitory computer-readable medium corresponding to the method presented in claim 1 and is rejected on the same grounds stated above regarding claim 1.
Regarding claim 14, claim 14 is directed to an apparatus corresponding to the method presented in claim 1 and is rejected on the same grounds stated above regarding claim 1.
Regarding claim 15, Borgstrom in view of Gallart teaches the apparatus of claim 14. Borgstrom further teaches wherein at least one limit is applied when generating at least one of the first band gains and the second band gains (see Borgstrom, [0013], the system can include a filter configured to apply a passband (limit) to the input signal. The passband can have a frequency range that corresponds to human speech. In at least some such instances, the filter can be configured to apply cepstral mean subtraction on at least one cepstral coefficient of the input signal).
Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Borgstrom et al., US PgPub 2021/0074282, in view of Gallart Mauri et al., US PgPub 2021/0256988, further in view of A. Biswas et al., "Acoustic feature extraction using ERB like wavelet sub-band perceptual Wiener filtering for noisy speech recognition," 2014 Annual IEEE India Conference (INDICON), Pune, India, 2014, pp. 1-6.
Regarding claim 4, Borgstrom in view of Gallart teaches the method of claim 1. Borgstrom in view of Gallart fails to teach wherein generating the background noise estimate is based on a number of noise frames exceeding a threshold for a particular band.
However, Biswas teaches wherein generating the background noise estimate is based on a number of noise frames exceeding a threshold for a particular band (see Biswas, sect. II, for each decomposed sub-band an auditory masking threshold is calculated; this threshold is used as the frequency gain function of the Wiener filter to enhance the noisy speech spectrum).
Borgstrom, Gallart, and Biswas are considered to be analogous to the claimed invention because they relate to enhancing speech signals. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Borgstrom in view of Gallart, which use a machine learning model to generate the filter gain function to produce enhanced speech using neural networks, with the noise estimation teachings of Biswas, to enhance the noisy speech signal (see Biswas, sect. I).
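For illustration only, gating the per-band noise-estimate update on a count of noise frames exceeding a threshold, as recited in claim 4, can be sketched as below; the threshold, smoothing factor, and update rule are hypothetical, not taken from Biswas:

```python
import numpy as np

def update_noise_estimate(noise_psd, frame_power, is_noise, counts,
                          min_frames=10, alpha=0.9):
    """Update the per-band noise estimate only once a band has been
    classified as noise for more than `min_frames` consecutive frames
    (hypothetical threshold), smoothing recursively."""
    counts = np.where(is_noise, counts + 1, 0)
    update = counts > min_frames
    noise_psd = np.where(update,
                         alpha * noise_psd + (1 - alpha) * frame_power,
                         noise_psd)
    return noise_psd, counts

# Simulate 15 consecutive noise frames in 4 bands.
noise = np.full(4, 1.0)
counts = np.zeros(4, dtype=int)
for _ in range(15):
    noise, counts = update_noise_estimate(
        noise, np.full(4, 2.0), np.full(4, True), counts)
```

The estimate stays frozen until the count passes the threshold, then moves toward the observed frame power.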
Regarding claim 5, Borgstrom in view of Gallart teaches the method of claim 1. Borgstrom in view of Gallart fails to teach wherein generating the second band gains includes using the Wiener filter based on a stationary noise level of a particular band. However, Biswas teaches wherein generating the second band gains includes using the Wiener filter based on a stationary noise level of a particular band (see Biswas, sect. II-C, non-uniform WP sub-band decomposition of the noisy speech v(t) = s(t) + n(t) is carried out using the auditory ERB scale [4], [12], where s(t) is the clean signal, n(t) is the additive noise, and t = 0, 1, …, N−1 is the time index. An auditory masking threshold is calculated by processing each wavelet sub-band. The fast Fourier transform (FFT) is applied to each WP-decomposed sub-band output y_i,k(t) to obtain the noisy speech spectrum Y_i,k(ν). The Wiener filter W_i,k(ν) (of the particular band) is then applied to each sub-band to produce the enhanced power spectrum S~_i,k(ν)). The same motivation to combine as claim 4 applies here.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Borgstrom et al., US PgPub 2021/0074282, in view of Gallart Mauri et al., US PgPub 2021/0256988, further in view of Every et al., US PgPub 2011/0257967.
Regarding claim 6, Borgstrom in view of Gallart teaches the method of claim 1. Borgstrom in view of Gallart fails to teach wherein generating the second band gains includes limiting the second band gains using at least two different limits for at least two different bands. However, Every teaches wherein generating the second band gains includes limiting the second band gains using at least two different limits for at least two different bands (see Every, [0089], the Wiener gain values from the Wiener filter module 400 are also provided to the optional mask smoother module 402. The mask smoother module 402 performs temporal smoothing of the Wiener gain values, which helps to reduce the musical noise. The Wiener gain values may change quickly (e.g., from one frame to the next), and speech and noise estimates can vary greatly between each frame. Thus, the use of the Wiener gain values, as is, may result in artifacts (e.g., discontinuities, blips, transients, etc.). Therefore, optional filter smoothing may be performed in the mask smoother module 402 to temporally smooth the Wiener gain values).
Borgstrom, Gallart, and Every are considered to be analogous to the claimed invention because they relate to enhancing speech signals. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Borgstrom in view of Gallart, which use a machine learning model to generate the filter gain function to produce enhanced speech using neural networks, with the Wiener gain optimization teachings of Every, to improve the user's perception (see Every, [0007]).
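For illustration only, the temporal smoothing of Wiener gain values described by Every at [0089] can be sketched as a first-order exponential smoother across frames; the smoothing factor is a hypothetical value:

```python
import numpy as np

def smooth_gains(gain_frames, alpha=0.7):
    """Exponentially smooth gain values across frames to suppress
    frame-to-frame jumps that cause 'musical noise' artifacts."""
    smoothed = np.empty_like(gain_frames)
    smoothed[0] = gain_frames[0]
    for t in range(1, len(gain_frames)):
        smoothed[t] = alpha * smoothed[t - 1] + (1 - alpha) * gain_frames[t]
    return smoothed

# Alternating raw gains (one band) before and after smoothing.
raw = np.array([[1.0], [0.0], [1.0], [0.0]])
out = smooth_gains(raw)
```

The smoothed sequence changes far less between adjacent frames than the raw sequence, which is the stated purpose of the mask smoother.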
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Lou, US PgPub 2016/0240210, teaches methods for enhancing speech signal intelligibility and for improving the performance of automatic speech recognition processes for a speech signal in a noisy environment (see Lou, Abstract).
Z. Cui and C. Bao, "Linear Prediction-based Part-defined Auto-encoder Used for Speech Enhancement," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 6880-6884, teaches a linear prediction (LP) model-based part-defined auto-encoder (PAE) used to predict the AR-Wiener filter parameters, in which the line spectral frequency (LSF) parameters and AR gains are used as the intermediate feature-space targets and the Wiener mask is used as the final target. Furthermore, a self-defined LP layer that can transfer the LP parameters into a magnitude spectrum is used to drive the AR-Wiener filter for obtaining the magnitude spectrum of clean speech (see Cui, section I).
Abd El-Fattah, Marwa A., et al., "Speech enhancement with an adaptive Wiener filter," International Journal of Speech Technology 17.1 (2014): 53-64, teaches an adaptive Wiener filtering method for speech enhancement (see Abd El-Fattah, sect. 1).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NANDINI SUBRAMANI whose telephone number is (571)272-3916. The examiner can normally be reached Monday - Friday 12:00pm - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh M Mehta can be reached at (571)272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NANDINI SUBRAMANI/ Examiner, Art Unit 2656
/BHAVESH M MEHTA/ Supervisory Patent Examiner, Art Unit 2656