Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/04/2026 has been entered.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-4, 6, 13-19, 21-24, and 26 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al (20210142148) in view of Zhang et al (20210168554).
As per claim 1, Wang et al (20210142148) teaches a method for end-to-end speech enhancement based on a neural network (as, using neural networks to perform end-to-end processing – para 0003 and para 0024) comprising:
obtaining, by a server or a terminal device (para 0020 – tablet computer/server),
a time-domain smoothing (as, global average pooling operation – para 0043-0044) feature (as, compressing features – para 0043) of an original speech signal by performing feature extraction on the original speech signal using a time-domain convolution kernel (as, using a time-frequency structure – para 0026; and a convolution kernel – para 0031);
and obtaining, by the server or the terminal device, an enhanced speech signal by performing combined feature extraction on the original speech signal and the time-domain smoothing feature of the original speech signal (as, re-generating an output signal – figure 7, by taking a multilayer perceptron, and combining – fig. 7, subblock s263, s265-s267; with a potential application being sound source separation – para 0045, last 2 sentences).
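For illustration only, the claimed flow as mapped above can be sketched as follows (a minimal, hypothetical PyTorch sketch with invented names such as EndToEndEnhancer; it assumes a 1-D smoothing convolution followed by a combining network, and is not Wang's or the applicant's exact architecture):

    import torch
    import torch.nn as nn

    class EndToEndEnhancer(nn.Module):
        # Hypothetical sketch: a 1-D "smoothing" convolution extracts a
        # time-domain smoothing feature; a second stage performs combined
        # feature extraction on the raw signal plus the smoothing feature.
        def __init__(self, kernel_size=16):
            super().__init__()
            # time-domain convolution kernel used for feature extraction
            self.smooth = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2)
            # combined feature extraction over [original, smoothing feature]
            self.combine = nn.Sequential(
                nn.Conv1d(2, 16, 9, padding=4), nn.ReLU(),
                nn.Conv1d(16, 1, 9, padding=4),
            )

        def forward(self, x):            # x: (batch, 1, samples)
            s = self.smooth(x)           # time-domain smoothing feature
            s = s[..., : x.shape[-1]]    # align lengths for concatenation
            y = self.combine(torch.cat([x, s], dim=1))
            return y                     # enhanced speech signal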
As per claim 1, Wang et al (20210142148) teaches incorporating a time-domain smoothing algorithm into a deep neural network as a one-dimensional convolution module (as, one-dimensional neural network signal separator – para 0048), with a time-domain smoothing feature (as, operating on input parameters – para 0007, para 0025, which operate on well-known parameters of the STFT). However, Wang et al (20210142148) does not explicitly teach noise smoothing as part of an explicitly defined feature (Wang et al (20210142148) teaches operating in the time domain on ‘well-known features’, but does not explicitly define noise smoothing). Zhang et al (20210168554) teaches the use of time-recursive average noise estimation to reduce the noise in sound signals – para 0204. Therefore, it would have been obvious to one of ordinary skill in the art of sound separation/extraction to modify the sound processing of Wang et al (20210142148) with additional noise averaging/estimation, as taught by Zhang et al (20210168554), because it would advantageously reduce the noise artifacts in the recorded sound (see para 0204, last half).
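For reference, time-recursive average noise estimation of the general kind cited from Zhang et al (20210168554), para 0204, is conventionally written as N_t = a*N_{t-1} + (1-a)*|X_t|^2. A minimal numpy sketch follows (illustrative only; the smoothing factor alpha is chosen arbitrarily and is not taken from the reference):

    import numpy as np

    def recursive_noise_estimate(frames, alpha=0.95):
        # frames: (num_frames, num_bins) magnitude spectra.
        # Time-recursive averaging: each frame's noise estimate is a
        # weighted blend of the previous estimate and the current frame.
        noise = np.zeros(frames.shape[1])
        estimates = []
        for mag in frames:
            noise = alpha * noise + (1.0 - alpha) * mag ** 2
            estimates.append(noise.copy())
        return np.array(estimates)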
As per claim 2, the combination of Wang et al (20210142148) in view of Zhang et al (20210168554) teaches the method for end-to-end speech enhancement according to claim 1, wherein obtaining a time-domain smoothing feature of an original speech signal by performing feature extraction on the original speech signal using a time-domain convolution kernel comprises:
determining a time-domain smoothing parameter matrix according to a convolution sliding window and a time-domain smoothing factor (as, Wang et al (20210142148), interpolating using nearest neighbor, in the time domain – page 3, second column, lines 5-10);
obtaining a weight matrix of the time-domain convolution kernel by performing a product operation on the time-domain smoothing parameter matrix (as, Wang et al (20210142148), weight matrix with product calculations – lines 17-40 – see interpolation referrals, and multiplication/addition of the weights);
and obtaining the time-domain smoothing feature of the original speech signal by performing a convolution operation on the weight matrix of the time-domain convolution kernel and the original speech signal (as, Wang et al (20210142148), using the weights of the convolution kernel – para 0031; and page 3, second column, after equation (1), the explanation of the weights, and after equation 3).
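Read on its face, claim 2 recites building a smoothing parameter matrix from a sliding-window length and a smoothing factor, deriving kernel weights from it by a product operation, and convolving the weights with the signal. One way to realize that reading is sketched below (a hypothetical sketch assuming exponential-smoothing weights; the function names and the geometric-decay form are the examiner's illustration, not the applicant's formulation):

    import numpy as np

    def smoothing_kernel(window, beta):
        # time-domain smoothing parameter matrix from a convolution
        # sliding window of length `window` and smoothing factor `beta`
        params = beta ** np.arange(window)        # geometric decay
        weights = (1.0 - beta) * params           # product operation -> weights
        return weights / weights.sum()            # normalize to unit gain

    def smoothing_feature(signal, window=8, beta=0.7):
        w = smoothing_kernel(window, beta)
        # convolution of the kernel's weight vector with the original signal
        return np.convolve(signal, w, mode="same")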
As per claim 3, the combination of Wang et al (20210142148) in view of Zhang et al (20210168554) teaches the method for end-to-end speech enhancement according to claim 2, wherein determining a time-domain smoothing parameter matrix according to a convolution sliding window and a time-domain smoothing factor comprises:
initializing a plurality of time-domain smoothing factors; and obtaining the time-domain smoothing parameter matrix based on a preset convolution sliding window and the plurality of time-domain smoothing factors (as, Wang et al (20210142148), fixed convolution functions – para 0042, operating/calculating pooling functions – para 0043-0044; and interpolations as well – para 0034).
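Claim 3 extends the claim 2 construction to a plurality of smoothing factors. A hypothetical extension of the sketch above, stacking one parameter row per initialized factor (the specific betas are arbitrary illustration values):

    import numpy as np

    def smoothing_parameter_matrix(window=8, betas=(0.5, 0.7, 0.9)):
        # One row per initialized time-domain smoothing factor; columns
        # span the preset convolution sliding window.
        rows = [(1.0 - b) * b ** np.arange(window) for b in betas]
        return np.stack(rows)          # shape: (num_factors, window)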
As per claim 4, the combination of Wang et al (20210142148) in view of Zhang et al (20210168554) teaches the method for end-to-end speech enhancement according to claim 1, wherein obtaining an enhanced speech signal by performing combined feature extraction on the original speech signal and the time-domain smoothing feature of the original speech signal comprises:
obtaining a speech signal to be enhanced by combining the original speech signal and the time-domain smoothing feature of the original speech signal (as, Wang et al (20210142148), re-generating an output signal – figure 7, by taking a multilayer perceptron, and combining – fig. 7, subblock s263, s265-s267; with a potential application being sound source separation – para 0045, last 2 sentences);
training a weight matrix of the time-domain convolution kernel by using a back propagation algorithm with the speech signal to be enhanced as an input of a deep neural network (as, Wang et al (20210142148), projecting features back to a time domain, using weights from the kernel – para 0042); and
obtaining the enhanced speech signal by performing combined feature extraction on the speech signal to be enhanced according to the weight matrix obtained by training (as, Wang et al (20210142148), generating the enhanced speech signal after feature extraction and weighting – see Figure 7, obtaining a mask, filtering, using a new convolution kernel with input from the neural network/multilayer perceptron, with an output signal).
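A hypothetical training loop for the claimed back-propagation step, assuming the EndToEndEnhancer sketch shown under claim 1 and an MSE objective against a clean reference (the claims do not specify a loss function until claim 5; the objective here is the examiner's illustration only):

    import torch

    model = EndToEndEnhancer()   # hypothetical model from the claim 1 sketch
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    def train_step(noisy, clean):      # both: (batch, 1, samples)
        opt.zero_grad()
        enhanced = model(noisy)
        loss = torch.nn.functional.mse_loss(enhanced, clean)
        loss.backward()                # back propagation trains the kernel
        opt.step()                     # weight matrix (and combiner) update
        return loss.item()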
As per claim 6, the combination of Wang et al (20210142148) in view of Zhang et al (20210168554) teaches the method for end-to-end speech enhancement according to claim 4, wherein obtaining the enhanced speech signal by performing combined feature extraction on the speech signal to be enhanced according to the weight matrix obtained by training comprises:
obtaining a first time-domain feature map by performing a convolution operation on the weight matrix obtained by training and an original speech signal in the speech signal to be enhanced (as, Wang et al (20210142148), using the Wave-U-Net – analyzing the time-frequency domain, and convolutional neural network – para 0026 and para 0042);
obtaining a second time-domain feature map by performing a convolution operation on the weight matrix obtained by training and a smoothing feature in the speech signal to be enhanced (as, Wang et al (20210142148), weight matrix with product calculations – lines 17-40 – see interpolation referrals, and multiplication/addition of the weights); and obtaining the enhanced speech signal by combining the first time-domain feature map and the second time-domain feature map (as, Wang et al (20210142148), combining the high-level and low-level calculations into a single decoded signal – para 0034, page 3, second column, after equation #3).
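Claim 6's two-branch structure can be sketched as follows (hypothetical; it assumes the trained weight vector is applied separately to the raw signal and to its smoothing feature, with the two feature maps then summed, which is one reading of "combining" rather than a mapping of the references):

    import numpy as np

    def enhance(signal, smooth_feat, trained_w):
        # first time-domain feature map: trained weights * original signal
        fmap1 = np.convolve(signal, trained_w, mode="same")
        # second time-domain feature map: trained weights * smoothing feature
        fmap2 = np.convolve(smooth_feat, trained_w, mode="same")
        # combine the two feature maps into the enhanced signal
        return fmap1 + fmap2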
As per claim 15, the combination of Wang et al (20210142148) in view of Zhang et al (20210168554) teaches the method for end-to-end speech enhancement according to claim 1, wherein obtaining a time-domain smoothing feature of an original speech signal by performing feature extraction on the original speech signal using a time-domain convolution kernel comprises:
performing speech enhancement on phase information and amplitude information in the original speech signal by inputting the original speech signal into a deep neural network for time-varying feature extraction (as, Wang et al (20210142148), operating on input parameters – para 0007, para 0025, which operate on well-known parameters of the STFT – short-time Fourier transform, which, by definition, carries phase and amplitude information – see also para 0003).
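As the mapping notes, an STFT carries both quantities by definition. For reference, a standard numpy computation of the amplitude and phase of an STFT (illustrative only; frame and hop sizes are arbitrary):

    import numpy as np

    def stft_mag_phase(signal, frame=512, hop=256):
        # Frame the signal, window it, and take the FFT of each frame;
        # the complex STFT splits into amplitude and phase information.
        n = 1 + (len(signal) - frame) // hop
        frames = np.stack([signal[i*hop : i*hop + frame] for i in range(n)])
        spec = np.fft.rfft(frames * np.hanning(frame), axis=1)
        return np.abs(spec), np.angle(spec)   # amplitude, phase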
As per claim 16, the combination of Wang et al (20210142148) in view of Zhang et al (20210168554) teaches the method for end-to-end speech enhancement according to claim 1, wherein the original speech signal is represented by a one-dimensional vector (as, Wang et al (20210142148), one dimensional neural network signal separator – para 0048).
Claims 13, 17, 18, 19, and 21 are computer-readable storage medium claims that perform the steps of method claims 1-4, 6, 15, and 16; as such, claims 13, 17-19, and 21 are similar in scope and content to claims 1-4, 6, 15, and 16, and are therefore rejected under a similar rationale as presented against claims 1-4, 6, 15, and 16 above. Furthermore, Wang et al (20210142148) teaches storage mediums storing the instructional steps – para 0024.
Claims 14, 22-24, and 26 are electronic device claims performing the steps found in method claims 1-4, 6, 15, and 16; as such, claims 14, 22-24, and 26 are similar in scope and content to claims 1-4, 6, 15, and 16, and are therefore rejected under a similar rationale as presented against claims 1-4, 6, 15, and 16 above. Furthermore, Wang et al (20210142148) teaches a processor accessing memory and executing the stored instructions – para 0007.
Claim(s) 5, 20, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al (20210142148) in view of Zhang et al (20210168554), and further in view of Arik et al (20190355347).
As per claims 5, 20, and 25, the combination of Wang et al (20210142148) in view of Zhang et al (20210168554) teaches the method for end-to-end speech enhancement according to claim 4, wherein training a weight matrix of the time-domain convolution kernel by using a back propagation algorithm with the speech signal to be enhanced as an input of a deep neural network comprises (as mapped above, in claims 1 and 4):
inputting the speech signal to be enhanced into the deep neural network (Wang et al (20210142148), para 0003); but does not explicitly teach:
“and constructing a time-domain loss function; and training the weight matrix of the time-domain convolution kernel by using an error back propagation algorithm according to the time-domain loss function” (Wang mentions loss only in passing, in para 0030). Arik et al (20190355347) explicitly teaches the use/calculation of loss functions during the execution of the short-time Fourier transform (see para 0070-0075), as well as envelope loss during the convolution operators (para 0076-0080). Therefore, it would have been obvious to one of ordinary skill in the art of convolutional processing of audio signals to further detail the STFT processing in Wang et al (20210142148) in view of Zhang et al (20210168554) with loss functions for the STFT parameters, as well as the loss function tied to the convolutional calculations, as taught by Arik et al (20190355347), because it would advantageously quantify “how close”, or the “quality” of, the measurement compared to ground-truth information (Arik et al (20190355347), para 0078).
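For context, a time-domain loss of the kind the claim recites can be as simple as a sample-wise error on the waveform, while Arik et al's cited paragraphs describe STFT-based losses. A hedged sketch of both notions follows (illustrative only; hyperparameters and function names are the examiner's, and waveforms are assumed to be shaped (batch, samples)):

    import torch

    def time_domain_loss(enhanced, clean):
        # sample-wise L1 error on the raw waveforms
        return torch.mean(torch.abs(enhanced - clean))

    def stft_magnitude_loss(enhanced, clean, n_fft=512):
        # STFT-magnitude error of the general kind Arik et al (20190355347)
        # computes during STFT processing (paras 0070-0075); illustrative.
        win = torch.hann_window(n_fft)
        E = torch.stft(enhanced, n_fft, window=win, return_complex=True)
        C = torch.stft(clean, n_fft, window=win, return_complex=True)
        return torch.mean(torch.abs(E.abs() - C.abs()))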
Response to Arguments
Applicant's amendments to the abstract have overcome that rejection, which has been withdrawn. Applicant's arguments filed 02/04/2026 have been fully considered but are moot in view of the new grounds of rejection. Examiner notes the introduction of the Zhang et al (20210168554) reference to address the new claim limitations directed toward noise averaging to reduce noise in the sound signal.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Please see related art listed on the PTO-892 form.
Furthermore, the following references were found that teach claim/specification features:
Catanzaro et al (20170148431) teaches end-to-end speech recognition using convolutional kernels in CTC-RNN models – para 0043-0045.
Mesgarani et al (20190066713) teaches speech separation using neural networks (para 0079-0080) processing time-frequency bins (para 0081-0082), and smoothing/cleaning the speech signal via spectrogram processing (para 0087).
Tashev et al (20190318755) teaches end-to-end models operating on spectrograms in both the time and frequency domains (para 0035), using neural networks processing convolutional kernels (para 0036).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michael Opsasnick, telephone number (571)272-7623, who is available Monday-Friday, 9am-5pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Mr. Richemond Dorvil, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/Michael N Opsasnick/Primary Examiner, Art Unit 2658 02/18/2026