DETAILED ACTION
Introduction
1. This office action is in response to Applicant’s submission filed on 10/6/2025. Claims 1-40 are pending in the application and have been examined.
Notice of Pre-AIA or AIA Status
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
3. The information disclosure statement (IDS) submitted on 11/17/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Response to Amendment
4. The Amendment filed 10/6/2025 has been entered and fully considered. With respect to the statement under 35 USC 112(f), Applicant acknowledges that the claim elements are interpreted under 35 USC 112(f). With regard to the rejection under 35 USC 112, that rejection is withdrawn based on the filed Amendment. With regard to the rejection under 35 USC 101, that rejection is withdrawn based on the filed Amendment.
With regard to the rejections under 35 USC 103, the arguments provided have been fully considered, but are not persuasive. The Amendment argues that the present invention acts only on an input noise signal, while the cited art acts on a signal that is only partly noise. However, the claims as recited simply state “process a noise signal, or a signal derived from the noise signal.” The arguments presented are therefore narrower than the claim language: they assume the invention acts only on a noise signal, but the pending claims are not so limited. Accordingly, art processing a mixed signal that includes noise renders obvious the recited “process a noise signal, or a signal derived from the noise signal.”
Claim Rejections - 35 USC § 103
5. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6. Claims 1-7, 14-28, and 35-40 are rejected under 35 U.S.C. 103 as being unpatentable over “CAN WE TRUST DEEP SPEECH PRIOR?” (Shi et al., hereinafter “Shi”, cited on IDS of 6/12/2023) in view of “SQUEEZEWAVE: EXTREMELY LIGHTWEIGHT VOCODERS FOR ON-DEVICE SPEECH SYNTHESIS” (Zhai et al., hereinafter “Zhai”, cited on IDS of 6/12/2023).
With regard to Claim 1, Shi describes:
An apparatus for providing a processed audio signal on the basis of an input audio signal, the apparatus comprising:
one or more flow blocks, (Section 4.2 shows a series of equations that can be structured as flow blocks.)
wherein the apparatus is configured to process a noise signal, or a signal derived from the noise signal, using the one or more flow blocks in order to acquire the processed audio signal, (Section 4.2 of Shi describes that sound including noise is input.)
In theory, the flow model can be used to model the prior for both the clean speech and the noise signal, and the inference based on Eq.(4) is theoretically optimal if the prior models produce accurate likelihood for clean speech and noise. However, as we have discussed, the likelihood produced by deep generative models, including the WaveGlow model we used, is suspicious. To simplify the investigation, we use the flow model to represent the clean speech only, and constrain the noise to be Gaussian. Therefore, the ML-based inference can be conducted by optimizing the following objective with respect to xt:
log p(xt|yt) ∝ log p(xt) + log p(nt) = log N(zt; 0, I) + log |det ∂f⁻¹/∂xt| + log N(yt − xt; 0, σI)   (8)
where zt = f⁻¹(xt) and f⁻¹ is the flow model. σ is the variance of the noise. Since all the terms in the above objective can be computed easily, the optimization can be easily conducted by GD. Note that the Gaussian noise is just the white …
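For illustration only, the optimization the quoted passage describes (maximizing the Eq. (8) objective by gradient descent) can be sketched numerically. The invertible linear map A below is a stand-in for the flow model f⁻¹, and the noise variance and dimensions are assumed values, not anything taken from Shi:

```python
import numpy as np

# Toy version of the Eq. (8) objective: a flow prior on x plus a Gaussian
# model of the residual noise y - x. The "flow" f^{-1}(x) = A x is a fixed
# invertible linear map, so log|det df^{-1}/dx| is a constant; this is an
# illustrative stand-in, not Shi's WaveGlow prior.
rng = np.random.default_rng(0)
D = 8                                              # toy signal length
A = np.eye(D) + 0.1 * rng.standard_normal((D, D))  # invertible map z = A x
_, logdet = np.linalg.slogdet(A)                   # log|det| term of Eq. (8)
sigma = 0.1                                        # assumed noise variance

def log_objective(x, y):
    """log p(x|y) up to a constant, following the structure of Eq. (8)."""
    z = A @ x
    log_prior = -0.5 * np.sum(z ** 2) + logdet         # log N(z; 0, I) + log|det|
    log_noise = -0.5 * np.sum((y - x) ** 2) / sigma    # log N(y - x; 0, sigma*I)
    return log_prior + log_noise

def denoise(y, steps=500, lr=0.01):
    """Maximize the objective over x by plain gradient ascent ("GD" above)."""
    x = y.copy()
    for _ in range(steps):
        grad = -(A.T @ (A @ x)) + (y - x) / sigma      # gradient of log_objective
        x = x + lr * grad
    return x

y = rng.standard_normal(D)   # a synthetic noisy observation
x_hat = denoise(y)           # estimate of the clean signal
```

Because the objective is a concave quadratic here, gradient ascent converges; the sketch shows only the shape of the computation, not denoising quality.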
wherein the apparatus is configured to adapt a processing performed using the one or more flow blocks in dependence on the input audio signal [[and using the neural network,]] (Section 4.1 of Shi describes that the processing is performed using flow blocks.)
[Image: media_image1.png (329 × 511, greyscale)]
Shi does not explicitly describe using a neural network or “wherein the apparatus is implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.”
However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
Further, Section 4 of Zhai describes that the algorithm described can be run on computing devices.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the computerized neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 2, Shi describes “the input audio signal is represented by a set of time domain audio samples.” Section 4.1 of Shi describes that “The input is a window of 80 speech samples in the time domain.”
With respect to Claim 3, Shi describes “a [[neural]] network associated with a given flow block of the one or more flow blocks is configured to determine one or more processing parameters for the given flow block in dependence on the noise signal, or a signal derived from the noise signal, and in dependence on the input audio signal.” Section 2.1, Equations 1-4 describe that the parameters s and t are dependent on the input audio and noise signals.
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 4, Shi describes “a [[neural]] network associated with a given flow block is configured to provide one or more parameters of an affine processing, which is applied to the noise signal, or to a processed version of the noise signal, or to a portion of the noise signal, or to a portion of a processed version of the noise signal during the processing.” Section 4.1 of Shi describes the use of affine processing. “The entire flow involves 12 blocks, and each consists of two components: an invertible 1 X 1 convolution layer and an affine coupling layer.”
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 5, Shi describes “a [[neural]] network associated with the given flow block is configured to determine one or more parameters of the affine processing, in dependence on a first part of a flow block input signal and in dependence on the input audio signal, and wherein an affine processing associated with the given flow block is configured to apply the determined parameters to a second part of the flow block input signal, to acquire an affinely processed signal; and wherein the first part of the flow block input signal and the affinely processed signal form a flow block output signal of the given flow block.” Section 4.1 of Shi describes that a first part of the processing is a convolution layer and the second part of the processing is affine processing. “The entire flow involves 12 blocks, and each consists of two components: an invertible 1 X 1 convolution layer and an affine coupling layer.”
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
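As an illustrative sketch of the split-and-affine structure the claim language and the quoted sentence describe: the input is divided into two parts, parameters s and t are computed from the first part, and an affine map is applied to the second part, which keeps the step exactly invertible. The parameter function below is a trivial placeholder, not a neural network and not Shi's actual layer:

```python
import numpy as np

def params(x_a):
    # placeholder for the function that produces scale s and shift t from
    # the first part of the input (a neural network in the discussion above)
    s = 0.5 * np.tanh(x_a)   # bounded log-scales
    t = 0.1 * x_a
    return s, t

def coupling_forward(x):
    x_a, x_b = np.split(x, 2)            # first part / second part
    s, t = params(x_a)
    y_b = x_b * np.exp(s) + t            # affine processing of the second part
    return np.concatenate([x_a, y_b])    # first part passes through unchanged

def coupling_inverse(y):
    y_a, y_b = np.split(y, 2)
    s, t = params(y_a)                   # y_a == x_a, so s and t are recoverable
    x_b = (y_b - t) * np.exp(-s)
    return np.concatenate([y_a, x_b])

x = np.linspace(-1.0, 1.0, 6)
y = coupling_forward(x)
```

Passing the first part through unchanged is what makes the inverse computable: the same s and t can be recomputed from the output.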
With respect to Claim 6, Shi describes “the [[neural]] network associated with the given flow block comprises a depthwise separable convolution in the affine processing associated with the given flow block.”
Section 4.1 of Shi describes that a first part of the processing is an invertible convolution layer before the second part of the processing is affine processing. “The entire flow involves 12 blocks, and each consists of two components: an invertible 1 X 1 convolution layer and an affine coupling layer.”
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
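For context on the claim term only, a depthwise separable 1-D convolution factors an ordinary convolution into a per-channel (depthwise) filter followed by a 1×1 (pointwise) channel mix. The sketch below uses arbitrary filter sizes and values; neither Shi nor Zhai is quoted as using these particular shapes:

```python
import numpy as np

def depthwise_separable_conv1d(x, depth_filters, point_weights):
    """x: (C, T) signal; depth_filters: (C, K); point_weights: (C_out, C)."""
    C, _ = x.shape
    # depthwise step: each channel is convolved with its own K-tap filter
    dw = np.stack([np.convolve(x[c], depth_filters[c], mode="same")
                   for c in range(C)])
    # pointwise step: a 1x1 convolution, i.e. a channel mix at each time step
    return point_weights @ dw

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 16))       # 3 channels, 16 samples
depth = rng.standard_normal((3, 5))    # one 5-tap filter per channel
point = rng.standard_normal((2, 3))    # mix 3 channels down to 2
y = depthwise_separable_conv1d(x, depth, point)
```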
With respect to Claim 7, Shi describes “the apparatus is configured to apply an invertible convolution to the flow block output signal of the given flow block, to acquire a processed flow block output signal.” Section 4.1 describes the use of invertible convolution layers.
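The “invertible 1 X 1 convolution” quoted from Section 4.1 amounts to mixing channels at each time step with an invertible matrix W, which can be undone exactly with W⁻¹. A minimal sketch, with W chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
C, T = 4, 10                                        # channels, time steps
W, _ = np.linalg.qr(rng.standard_normal((C, C)))    # orthogonal, hence invertible

x = rng.standard_normal((C, T))
y = W @ x                       # 1x1 convolution == per-timestep channel mix
x_back = np.linalg.inv(W) @ y   # exact inversion of the layer
```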
With respect to Claim 14, Shi describes “[[neural]] network parameters of the neural network for processing the noise signal, or the signal derived from the noise signal, are acquired using a processing of a training audio signal or a processed version thereof, in one or more training flow blocks in order to acquire a training result signal, wherein a processing of the training audio signal or of the processed version thereof using the one or more training flow blocks is adapted in dependence on a distorted version of the training audio signal and using the neural network, and wherein the neural network parameters of the neural networks are determined, such that a characteristic of the training result audio signal approximates or comprises a predetermined characteristic.”
Section 4.1 describes, after Equation 6, that the model is trained until the computed parameters converge. Section 2.1, Equations 1-4 describe that the parameters s and t are dependent on the input audio and noise signals.
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 15, Shi describes “the apparatus is configured to provide [[neural]] network parameters of the neural network for processing the noise signal, or the signal derived from the noise signal, wherein the apparatus is configured to process a training audio signal or a processed version thereof, using the one or more flow blocks in order to acquire a training result signal, and wherein the apparatus is configured to adapt a processing of the training audio signal or of the processed version thereof which is performed using the one or more flow blocks in dependence on a distorted version of the training audio signal and using the [[neural]] network, and wherein the apparatus is configured to determine [[neural]] network parameters of the [[neural]] networks, such that a characteristic of the training result audio signal approximates or comprises a predetermined characteristic.”
Section 4.1 describes, after Equation 6, that the model is trained until the computed parameters converge. It also describes that the model includes flow blocks. Section 2.1, Equations 1-4 describe that the parameters s and t are dependent on the input audio and noise signals.
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 16, Shi describes “the apparatus comprises an apparatus for providing [[neural]] network parameters, wherein the apparatus for providing neural network parameters is configured to provide [[neural]] network parameters of the neural network for processing the noise signal, or the signal derived from the noise signal, wherein the apparatus for providing [[neural]] network parameters is configured to process a training audio signal or a processed version thereof, using one or more training flow blocks in order to acquire a training result signal, and wherein the apparatus for providing [[neural]] network parameters is configured to adapt a processing of the training audio signal or the processed version thereof which is performed using the one or more flow blocks in dependence on a distorted version of the training audio signal and using the [[neural]] network; wherein the apparatus is configured to determine [[neural]] network parameters of the [[neural]] networks, such that a characteristic of the training result audio signal approximates or comprises a predetermined characteristic.”
Section 4.1 describes, after Equation 6, that the model is trained until the computed parameters converge. It also describes that the model includes flow blocks. Section 2.1, Equations 1-4 describe that the parameters s and t are dependent on the input audio and noise signals and can be considered “predetermined characteristics.”
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 17, Shi describes “the one or more flow blocks are configured to synthesize the processed audio signal on the basis of the noise signal under the guidance of the input audio signal.” Section 4.1 describes that the signal and noise are processed by a model including multiple flow blocks.
With respect to Claim 18, Shi describes “the one or more flow blocks are configured to synthesize the processed audio signal on the basis of the noise signal under the guidance of the input audio signal using the affine processing of sample values of the noise signal, or of a signal derived from the noise signal, wherein processing parameters of the affine processing are determined on the basis of sample values of the input audio signal using the [[neural]] network.” Section 4.1 describes that the signal and noise are processed by a model including multiple flow blocks. The model also includes affine processing of the signal and noise.
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 19, Shi describes “the apparatus is configured to perform a normalizing flow processing, in order to derive the processed audio signal from the noise signal.” Section 4 of Shi describes that the algorithm includes normalizing flow processing.
With regard to Claim 20, Shi describes:
“A method for providing a processed audio signal on the basis of an input audio signal, wherein the method comprises processing a noise signal, or a signal derived from the noise signal, using one or more flow blocks, in order to acquire the processed audio signal; (Section 4.2 of Shi describes that sound including noise is input.)
In theory, the flow model can be used to model the prior for both the clean speech and the noise signal, and the inference based on Eq.(4) is theoretically optimal if the prior models produce accurate likelihood for clean speech and noise. However, as we have discussed, the likelihood produced by deep generative models, including the WaveGlow model we used, is suspicious. To simplify the investigation, we use the flow model to represent the clean speech only, and constrain the noise to be Gaussian. Therefore, the ML-based inference can be conducted by optimizing the following objective with respect to xt:
log p(xt|yt) ∝ log p(xt) + log p(nt) = log N(zt; 0, I) + log |det ∂f⁻¹/∂xt| + log N(yt − xt; 0, σI)   (8)
where zt = f⁻¹(xt) and f⁻¹ is the flow model. σ is the variance of the noise. Since all the terms in the above objective can be computed easily, the optimization can be easily conducted by GD. Note that the Gaussian noise is just the white …
wherein the method comprises adapting the processing performed using the one or more flow blocks in dependence on the input audio signal [[and using a neural network.]] (Section 4.1 of Shi describes that the processing is performed using flow blocks.)
[Image: media_image1.png (329 × 511, greyscale)]
Shi does not explicitly describe using a neural network or “wherein the apparatus is implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.”
However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
Further, Section 4 of Zhai describes that the algorithm described can be run on computing devices.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the computerized neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With regard to Claim 21, Shi describes:
“An apparatus for providing [[neural]] network parameters for an audio processing, the apparatus including:
one or more flow blocks, (Section 4.2 shows a series of equations that can be structured as flow blocks.)
wherein the apparatus is configured to process a training audio signal, or a processed version thereof, using the one or more flow blocks in order to acquire a training result signal,
(Section 4.2 of Shi describes that sound including noise is input. Section 4.1 describes that the model is trained with noise and speech input.)
In theory, the flow model can be used to model the prior for both the clean speech and the noise signal, and the inference based on Eq.(4) is theoretically optimal if the prior models produce accurate likelihood for clean speech and noise. However, as we have discussed, the likelihood produced by deep generative models, including the WaveGlow model we used, is suspicious. To simplify the investigation, we use the flow model to represent the clean speech only, and constrain the noise to be Gaussian. Therefore, the ML-based inference can be conducted by optimizing the following objective with respect to xt:
log p(xt|yt) ∝ log p(xt) + log p(nt) = log N(zt; 0, I) + log |det ∂f⁻¹/∂xt| + log N(yt − xt; 0, σI)   (8)
where zt = f⁻¹(xt) and f⁻¹ is the flow model. σ is the variance of the noise. Since all the terms in the above objective can be computed easily, the optimization can be easily conducted by GD. Note that the Gaussian noise is just the white …
wherein the apparatus is configured to adapt a processing performed using the one or more flow blocks in dependence on a distorted version of the training audio signal [[and using the neural network]]; (Section 4.1 of Shi describes that the processing is performed using flow blocks.)
[Image: media_image1.png (329 × 511, greyscale)]
wherein the apparatus is configured to determine [[neural]] network parameters of the [[neural]] networks, such that a characteristic of the training result audio signal approximates or comprises a predetermined characteristic.”
Section 2.1, Equations 1-4 describe that the parameters s and t are dependent on the input audio and noise signals and can be considered “predetermined characteristics.”
Shi does not explicitly describe using a neural network or “wherein the apparatus is implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.”
However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
Further, Section 4 of Zhai describes that the algorithm described can be run on computing devices.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the computerized neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 22, Shi describes “the apparatus is configured to evaluate a cost function in dependence on characteristics of the acquired training result signal, and wherein the apparatus is configured to determine [[neural]] network parameters to reduce or minimize a cost defined by the cost function.” Equation 6 describes the cost function used.
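As an illustration of minimizing a flow-style cost of the general kind cited here (Shi's actual Equation 6 is not reproduced in this action): for a toy one-parameter flow z = a·x, the negative log-likelihood combines a Gaussian term on z with a log-determinant term, and a gradient step reduces it. All values below are made up:

```python
import numpy as np

def nll(a, x):
    """Negative log-likelihood cost for the toy flow z = a * x (constants dropped)."""
    z = a * x
    return 0.5 * np.mean(z ** 2) - np.log(abs(a))   # Gaussian term minus log|dz/dx|

x = np.array([0.5, -1.0, 2.0, -0.25])   # toy training samples
a = 1.0                                  # initial parameter
eps = 1e-4
grad = (nll(a + eps, x) - nll(a - eps, x)) / (2 * eps)  # numerical gradient
a_new = a - 0.1 * grad                                  # one descent step on the cost
```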
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 23, Shi describes “the training audio signal and/or the distorted version of the training audio signal is represented by a set of time domain audio samples.” Section 4.1 of Shi describes that “The input is a window of 80 speech samples in the time domain.”
With respect to Claim 24, Shi describes “a [[neural]] network associated with a given flow block of the one or more flow blocks is configured to determine one or more processing parameters for the given flow block in dependence on the training audio signal, or a signal derived from the training audio signal, and in dependence on the distorted version of the training audio signal.”
Section 2.1, Equations 1-4 describe that the parameters s and t are dependent on the input audio and noise signals.
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 25, Shi describes “a [[neural]] network associated with a given flow block is configured to provide one or more parameters of an affine processing, which is applied to the training audio signal, or to a processed version of the training audio signal, or to a portion of the training audio signal, or to a portion of a processed version of the training audio signal during the processing.” Section 4.1 of Shi describes the use of affine processing. “The entire flow involves 12 blocks, and each consists of two components: an invertible 1 X 1 convolution layer and an affine coupling layer.”
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 26, Shi describes “a [[neural]] network associated with the given flow block is configured to determine one or more parameters of the affine processing, in dependence on a first part of a flow block input signal or in dependence on a first part of a pre-processed flow block input signal and in dependence on the distorted version of the training audio signal, and wherein an affine processing associated with the given flow block is configured to apply the determined parameters to a second part of the flow block input signal or to a second part of the pre-processed flow block input signal, to acquire an affinely processed signal; and wherein the first part of the flow block input signal or of the pre-processed flow block input signal and the affinely processed signal form a flow block output signal xnew of the given flow block.”
Section 4.1 of Shi describes that a first part of the processing is a convolution layer and the second part of the processing is affine processing. “The entire flow involves 12 blocks, and each consists of two components: an invertible 1 X 1 convolution layer and an affine coupling layer.”
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 27, Shi describes “the [[neural]] network associated with the given flow block comprises a depthwise separable convolution in the affine processing associated with the given flow block.” Section 4.1 of Shi describes that a first part of the processing is an invertible convolution layer before the second part of the processing is affine processing. “The entire flow involves 12 blocks, and each consists of two components: an invertible 1 X 1 convolution layer and an affine coupling layer.”
Shi does not explicitly describe using a neural network. However, Zhai describes the use of a neural network for audio analysis in Section 1.
“In this paper, we propose SqueezeWave, a family of extremely lightweight flow-based vocoders for on-device speech synthesis. Previous work (Iandola et al., 2016; Wu et al., 2016; 2017; 2018a;b; Yang et al., 2018; Gholami et al., 2018; Wu, 2019; Wu et al., 2019; Dai et al., 2019) have shown that optimizing the neural network architecture can lead to significant efficiency improvement in many applications.”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the neural network speech analyzer as described by Zhai into the method of Shi to provide significant efficiency improvement, as described in Section 1 of Zhai.
With respect to Claim 28, Shi describes “the apparatus is configured to apply an invertible convolution to the flow block input signal of the given flow block to acquire the pre-processed flow block input signal.” Section 4.1 describes the use of invertible convolution layers.
With respect to Claim 35, Shi describes “the one or more flow blocks are configured to convert the training audio signal into the training result signal.” Section 4.1 of Shi describes that the audio signal is processed by flow blocks to determine an output.
With respect to Claim 36, Shi describes “the one or more flow blocks are adjusted to convert the training audio signal into the training result signal under the guidance of the distorted version of the training audio signal, using the affine processing of sample values of the training audio signal, or of a signal derived from the training audio signal, wherein processing parameters of the affine processing are determined on the basis of sample values of the distorted version of the training audio signal using the neural network.” Section 4.1 of Shi describes that a first part of the processing is an invertible convolution layer before the second part of the processing is affine processing. “The entire flow involves 12 blocks, and each consists of two components: an invertible 1 X 1 convolution layer and an affine coupling layer.”
With respect to Claim 37, Shi describes “the apparatus is configured to perform a normalizing flow processing, in order to derive the training result signal from the training audio signal.” Section 4 of Shi describes that the algorithm includes normalizing flow processing.
With respect to Claim 38, method Claim 38 and apparatus Claim 21 are related as a method and an apparatus programmed to perform that method, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 38 is similarly rejected under the same rationale as applied above with respect to Claim 21.
With respect to Claim 39, non-transitory storage media Claim 39 and apparatus Claim 1 are related as a non-transitory storage medium programmed to perform the same functions, with each claimed storage medium function corresponding to each claimed apparatus function. Accordingly, Claim 39 is similarly rejected under the same rationale as applied above with respect to Claim 1.
With respect to Claim 40, non-transitory storage media Claim 40 and apparatus Claim 21 are related as a non-transitory storage medium programmed to perform the same functions, with each claimed storage medium function corresponding to each claimed apparatus function. Accordingly, Claim 40 is similarly rejected under the same rationale as applied above with respect to Claim 21.
6. Claims 8-13 and 29-34 are rejected under 35 U.S.C. 103 as being unpatentable over Shi in view of Zhai, and further in view of “Dynamic Range Compression Deconvolution using A-law and μ-law Algorithms” (Haji-saeed et al., hereinafter “Haji”).
With respect to Claim 8, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply a nonlinear compression to the input audio signal prior to processing the noise signal in dependence on the input audio signal.” The top of page 4 of Haji describes that a nonlinear compression can be applied to a signal.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
With respect to Claim 9, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply a µ-law transformation as the nonlinear compression to the input audio signal.” Equation 3 of Haji is a µ-law transformation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
With respect to Claim 10, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply a transformation according to g(y) = sgn(y)∙ln(1+µ|y|)/ln(1+µ) to the input audio signal, wherein sgn() is a sign function; µ is a parameter defining a level of compression.” Equation 3 of Haji renders obvious the claimed equation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
With respect to Claim 11, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply a nonlinear expansion to the processed audio signal.” Section 1 of Haji describes that a nonlinear expansion can be applied to a signal.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
With respect to Claim 12, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply an inverse µ-law transformation as the nonlinear expansion to the processed audio signal.” Equation 3 of Haji is a µ-law transformation for compression; a similar equation for expansion is shown to be mathematically equivalent in “A Novel Alternating µ-Law Companding Algorithm for PAPR Reduction in OFDM Systems” (Tu et al.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
With respect to Claim 13, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply a transformation according to g⁻¹(x̂) = sgn(x̂)∙((1+µ)^|x̂| - 1)/µ to the processed audio signal, wherein sgn() is a sign function; µ is a parameter defining a level of expansion.” Equation 3 of Haji is a µ-law transformation for compression; a similar equation for expansion is shown to be mathematically equivalent in “A Novel Alternating µ-Law Companding Algorithm for PAPR Reduction in OFDM Systems” (Tu et al.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
With respect to Claim 29, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply a nonlinear compression to the training audio signal prior to processing the noise signal in dependence on the training audio signal.” The top of page 4 of Haji describes that a nonlinear compression can be applied to a signal.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
With respect to Claim 30, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply a µ-law transformation as the nonlinear compression to the training audio signal.” Equation 3 of Haji is a µ-law transformation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
With respect to Claim 31, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply a transformation according to g(y) = sgn(y)∙ln(1+µ|y|)/ln(1+µ) to the training audio signal, wherein sgn() is a sign function; µ is a parameter defining a level of compression.” Equation 3 of Haji renders obvious the claimed equation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
With respect to Claim 32, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply a nonlinear input compression to the distorted version of the training audio signal prior to processing the training audio signal in dependence on the distorted version of the training audio signal.” Section 1 of Haji describes that a nonlinear compression can be applied to a signal.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
With respect to Claim 33, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply a µ-law transformation as the nonlinear input compression to the distorted version of the training audio signal.” Equation 3 of Haji is a µ-law transformation for compression.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
With respect to Claim 34, Shi in view of Zhai does not explicitly describe this subject matter. However, Haji describes “the apparatus is configured to apply a transformation according to g⁻¹(x̂) = sgn(x̂)∙((1+µ)^|x̂| - 1)/µ to the distorted version of the training audio signal, wherein sgn() is a sign function; µ is a parameter defining a level of expansion.” Equation 3 of Haji renders obvious the claimed equation.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to include the µ-law transformation as described by Haji into the method of Shi in view of Zhai to recover a signal embedded in noise, as described in Section 1 of Haji.
Conclusion
7. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Pat. App. Pub. No. 20210256984 (Bayer et al.) also describes a device that processes an audio signal to remove noise.
8. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY whose telephone number is (571) 272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWARD TRACY JR./Examiner, Art Unit 2656
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656