DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The following title is suggested: --Speech Enhancement Decoder Method, Apparatus, and Computer-readable Storage Medium Using an Extracted Enhancement Label Information Vector--.
Examiner Note on Patent Subject Matter Eligibility under 35 U.S.C. 101
Independent Claims 1, 19, and 20 are directed to audio signal reconstruction from vectors and to bitstream processing, including extraction, in a decoder. Thus, under the broadest reasonable interpretation (BRI) under Step 2A, Prong 1, the claimed invention does not recite a process that could practically be performed by a human, does not reasonably relate to any human activity, and is not a purely mathematical process. As such, the independent claims, and their related dependents by virtue of their dependency, are found to be eligible under Step 2A, Prong 1.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 12-13 and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
In Claim 12, lines 13-14, "the plurality of cascaded decoding layers" lacks antecedent basis and it is unclear what limitation is being referenced by this term. For claim interpretation in the interest of compact prosecution, "the plurality of cascaded decoding layers" will be construed as --a plurality of cascaded decoding layers--. Claims 13 and 18 have a similar indefiniteness issue and have been similarly rejected under 35 U.S.C. 112(b).
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1 and 4 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Omran, et al. ("Disentangling Speech from Surroundings in a Neural Audio Codec," March 2022).
With respect to Claim 1, Omran discloses:
An audio decoding method, executed by an electronic device, and comprising:
obtaining a bitstream of an audio signal (obtaining an encoded audio bitstream via an encoder, Sections 3-3.1, Page 2);
performing label extraction processing on a predicted value of a feature vector of the audio signal associated with the bitstream to obtain a label information vector (extracting data augmentation labels to produce an additive signal feature vector z(2)A, Section 3-3.2, Page 2; Fig. 1(A-C)), a dimension of the label information vector being the same as a dimension of the predicted value of the feature vector (Section 4.1, Page 3- "each audio frame of 320 samples into a 256-dimensional embedding vector, which is partitioned into two equal halves, one to carry information about the speech and the other about background noise");
performing signal reconstruction based on the predicted value of the feature vector and the label information vector (the speech feature and noise-related information vector are used to reconstruct an audio signal, Abstract; Section 3.1, Page 2 (“quantized embeddings are concatenated together along the feature axis and fed into the decoder, which converts them into a reconstructed audio waveform”); Sections 4.1-4.2, Pages 3-4; see also the audio output waveform produced in Figs. 1(A-C)); and
identifying a predicted value of the audio signal obtained by the signal reconstruction as a decoding result of the bitstream (neural network processing that identifies audio signal values by modifying the noise component for signal enhancement, Section 3.2, Page 2; Section 4.1, Page 3; example of Fig. 1B showing zeroing noise values; see also the specific waveforms having various amplitudes output in Fig. 1).
With respect to Claim 4, Omran further discloses:
The method according to claim 1, wherein the performing signal reconstruction based on the predicted value of the feature vector and the label information vector comprises:
splicing the predicted value of the feature vector and the label information vector to obtain a spliced vector; and compressing the spliced vector to obtain the predicted value of the audio signal (swapping/splicing in an augmentation embedding and then generating a new compressed embedding vector prior to decoding to reconstruct a new version that inherits the attribute of interest from the spliced-in embedding, Section 3.2, Page 2; Fig. 1C).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Omran, et al. in view of Jelinek (U.S. PG Publication: 2005/0261897 A1).
With respect to Claim 2, Omran teaches the audio signal enhancement process utilizing encoded noise vector components as applied to Claim 1. Although very well known in the speech codec art, a bitstream decoding operation followed by referencing a quantization table, as set forth in claim 2, is not taught by Omran. Jelinek, however, discloses:
decoding the bitstream to obtain an index value of the feature vector of the audio signal; and querying a quantization table based on the index value to obtain the predicted value of the feature vector of the audio signal (decoder extracts quantization indices from a digital bitstream and identifies a vector based upon a quantization table, Abstract; Paragraphs 0005, 0029-0030, and 0080).
Omran and Jelinek are analogous art because they are from a similar field of endeavor in audio coding. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the quantization table taught by Jelinek in the system of Omran to provide a predictable result of lower-precision data that allows for more efficient data processing.
Claims 7-8 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Omran, et al. in view of Jelinek and further in view of Feng, et al. (U.S. PG Publication: 2023/0154474 A1).
With respect to Claim 7, Omran in view of Jelinek teaches the audio signal enhancement process utilizing encoded noise vector components as applied to Claim 2. Omran in view of Jelinek does not teach the upper and lower band signal decoding process set forth in claim 7. This process, however, is known in the speech coding art as is evidenced by the teachings of Feng. In particular, Feng discloses:
the bitstream comprises a low-frequency bitstream and a high-frequency bitstream, the low-frequency bitstream being obtained by coding a low-frequency sub-band signal obtained by decomposing the audio signal, and the high-frequency bitstream being obtained by coding a high-frequency sub-band signal obtained by decomposing the audio signal ("encoder 114 decomposes the standardized PCM data of each frame into two sub-bands of audio data" that are referred to as the "low sub-band" and the "higher sub-band" using a quadrature mirror filter, Paragraph 0030); and
the decoding the bitstream to obtain a predicted value of a feature vector of the audio signal comprises: decoding the low-frequency bitstream to obtain a predicted value of a feature vector of the low-frequency sub-band signal; and decoding the high-frequency bitstream to obtain a predicted value of a feature vector of the high-frequency sub-band signal (determination of feature vectors (i.e., a collection of features) predictions in the lower and higher sub-bands at a decoder, Paragraph 0041, 0046, and 0049; Fig. 6, Elements 614 and 622).
Omran, Jelinek, and Feng are analogous art because they are from a similar field of endeavor in audio coding. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date to utilize the band-based coding taught by Feng in the audio signal codec taught by Omran in view of Jelinek to provide a predictable result of more accurate signal reconstruction across a frequency spectrum that avoids aliasing (Feng, Paragraph 0030).
The subject matter of Claim 8 is yielded by the combination resulting from the modification of Omran with the teachings of Feng. Specifically, as applied to claim 1, Omran teaches the extraction of noise-related data of the same dimension for speech enhancement at a decoder, while the result of the modification relying on Feng is lower- and upper-band decoding where the labels/noise data would relate to those decomposed sub-bands. Accordingly, the subject matter of claim 8 is rendered obvious by the combination of the prior art teachings.
The subject matter of Claim 11 is yielded by the combination resulting from the modification of Omran with the teachings of Feng. Specifically, as applied to claim 4, Omran teaches swapping/splicing in an augmentation embedding and then generating a new compressed embedding vector prior to decoding to reconstruct a new version that inherits the attribute of interest from the spliced-in embedding (Section 3.2, Page 2; Fig. 1C), while the result of the modification relying on Feng is lower- and upper-band decoding where the spliced/swapped augmentations would relate to those decomposed sub-bands. Furthermore, Feng teaches a low-band audio signal synthesizer/generator described at Paragraph 0047 (see also Fig. 6, Element 616) and a similar element for the high-band signal at Paragraph 0049 (see also Fig. 6, Element 624). Finally, Feng teaches the two sub-band audio signals are merged/synthesized together in an inverse QMF (Paragraph 0049 and Fig. 6, Element 632). Accordingly, the subject matter of claim 11 is rendered obvious by the combination of the prior art teachings.
Claims 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Omran, et al. in view of Jelinek and further in view of Gao (U.S. PG Publication: 2021/0343303 A1).
With respect to Claim 14, Omran in view of Jelinek teaches the audio signal enhancement process utilizing encoded noise vector components as applied to Claim 2. Omran in view of Jelinek does not teach the N sub-band signal encoding/decoding process set forth in claim 14. Gao, however, discloses:
the bitstream comprises N sub-bitstreams, the N sub-bitstreams corresponding to different frequency bands and being obtained by coding N sub-band signals obtained by decomposing the audio signal, and N being an integer greater than 2 (QMF analysis filter bank that decomposes an input audio signal into four (i.e., N>2) subband signals for encoding a bitstream, Paragraphs 0067 and 0069; four outputs of Fig. 1, Element 106); and
the decoding the bitstream to obtain a predicted value of a feature vector of the audio signal comprises: decoding the N sub-bitstreams respectively to obtain predicted values of feature vectors corresponding to the N sub-band signals, respectively (see the 4 decoders corresponding to the various subbands shown in Fig 2 (Elements 204, 206, 208, and 210) wherein the decoded bit streams obtain respective predicted feature vector values (LTP, LPC, etc.), Paragraphs 0068-0070, and 0075).
Omran, Jelinek, and Gao are analogous art because they are from a similar field of endeavor in audio coding. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date to utilize the band-based coding taught by Gao in the audio signal codec taught by Omran in view of Jelinek to provide a predictable result of more accurate signal reconstruction across a frequency spectrum leading to higher resolution audio output (Gao, Paragraph 0062).
The subject matter of Claim 15 is yielded by the combination resulting from the modification of Omran with the teachings of Gao. Specifically, as applied to claim 1, Omran teaches the extraction of noise-related data of the same dimension for speech enhancement at a decoder, while the result of the modification relying on Gao is the 4-band decoding of the QMF-generated subbands where the labels/noise data would relate to those decomposed sub-bands. Accordingly, the subject matter of claim 15 is rendered obvious by the combination of the prior art teachings.
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Omran, et al. in view of Jelinek in view of Gao and further in view of Fuchs, et al. (U.S. PG Publication: 2022/0223161 A1).
With respect to Claim 16, Omran in view of Jelinek in view of Gao teaches the 4 sub-band audio signal enhancement process utilizing augmentation components as applied to Claim 15. Omran in view of Jelinek in view of Gao does not teach the enhancement network processing set forth in claim 16. Fuchs, however, discloses:
invoking, based on a predicted value of a feature vector of an ith sub-band signal, an ith enhancement network for label extraction processing to obtain an ith label information vector (an enhancement neural network/machine learning model is invoked to obtain a scaling parameter (e.g., a 1-D vector or scalar) for each frequency bin/range (i.e., subband), Paragraphs 0015-0016, 0124, 0137, and 0182),
a value range of i satisfying that i is greater than or equal to 1 and is smaller than or equal to N, and a dimension of the ith label information vector being the same as a dimension of the predicted value of the feature vector of the ith sub-band signal (i is determined for each bin/range (i.e., i=N) in the enhancement neural network; wherein dimensionality equivalence is taught by Omran as applied to Claim 1, Paragraphs 0015-0016, 0124, 0137, and 0182).
Omran, Jelinek, Gao, and Fuchs are analogous art because they are from a similar field of endeavor in audio coding. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date to utilize the enhancement network processing taught by Fuchs in the enhancement approach taught by Omran in view of Jelinek in view of Gao to provide a predictable result of further efficiently adapting enhancement based upon decoded audio values for higher audio quality (Fuchs, Paragraph 0030).
Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Omran, et al. in view of Fuchs, et al.
Claims 19 and 20 respectively relate to alternative embodiments for carrying out the method of claim 1, and thus, are rejected under similar rationale. Because Omran is a scholarly publication, Omran leaves out the structural/product-of-manufacture details of claims 19 and 20, respectively: the memory storing processor-executable instructions and a processor, along with a non-transitory computer-readable storage medium storing program instructions. Fuchs, however, teaches a decoder utilizing speech enhancement (Paragraph 0124) having such computer-based embodiments (Paragraphs 0250-0252).
Omran and Fuchs are analogous art because they are from a similar field of endeavor in audio coding. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date to utilize the computer components taught by Fuchs to implement the speech enhancement of Omran to provide a predictable result of allowing for speech enhancement processing on a general-purpose computing device.
Allowable/Potentially Allowable Subject Matter
Claims 3, 5-6, 9-10, and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Claims 12-13 and 18 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims and amended to overcome the preceding rejection under 35 U.S.C. 112(b).
The following is a statement of reasons for the indication of allowable/potentially allowable subject matter:
With respect to Claim 3, the prior art of record fails to explicitly teach or fairly suggest, either taken individually or in combination, the method for audio signal decoding set forth in claim 1 that further includes the tensor-based processing set forth in claim 3 to obtain a label information vector.
Most Pertinent Prior Art:
Although Omran, et al. ("Disentangling Speech from Surroundings in a Neural Audio Codec," March 2022) discloses obtaining labeled noise information of the same dimensionality as speech feature vectors (Section 3-3.2, Page 2; Fig. 1(A-C)), Omran's determination is a result of the decoding of the encoded information and not of the specific approach set forth in claim 3. While additional prior art such as Qi, et al. ("Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement," 2020) teaches tensor-to-vector conversion using a neural network having convolutional and fully connected layers (Sections 2.1-2.2, Pages 2-3), Qi does not teach that the converted information is the labels that maintain a dimensionality through the conversion process of claim 3. Instead, Qi takes a noisy speech tensor and proceeds through convolutional and fully connected layers to yield an enhanced speech vector, not the label vector for enhancement used for reconstruction. Accordingly, the prior art of record fails to explicitly teach or fairly suggest the invention set forth in claim 3. Claims 9-10 and 17 contain subject matter similar to Claim 3, and thus, contain allowable subject matter over the prior art of record under similar rationale.
With respect to Claim 5, the prior art of record fails to explicitly teach or fairly suggest, either taken individually or in combination, the method for audio signal decoding set forth in claim 1 that further includes the processing sequence performed on the spliced feature vector and label information to yield the predicted value of the audio signal as set forth in claim 5.
Most Pertinent Prior Art:
Although Omran, et al. ("Disentangling Speech from Surroundings in a Neural Audio Codec," March 2022) discloses splicing in an augmentation embedding and then generating the new compressed embedding vectors prior to decoding to reconstruct a new version that inherits the attribute of interest from the spliced-in embedding (Section 3.2, Page 2; Fig. 1C), Omran does not involve the specific neural network layer processing approach set forth in claim 5. While additional prior art such as Park, et al. ("A Fully Convolutional Neural Network for Speech Enhancement," 2016) discloses a combination of two convolutional layers with an in-between upsampling layer in a decoder (Section 3.1, Page 2; Fig. 2), Park lacks the input of a particular splicing operation and the claimed pooling operation. Thus, the network structure and its processing described in claim 5 differ from those of Park. Accordingly, the prior art of record fails to explicitly teach or fairly suggest the invention set forth in claim 5.
Dependent Claim 6 further limits a claim containing allowable subject matter, and thus, also contains allowable subject matter by virtue of its dependency. Claims 12-13 and 18 contain subject matter similar to Claim 5, and thus, contain potentially allowable subject matter over the prior art of record under similar rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Biswas, et al. (U.S. PG Publication: 2021/0327445 A1): teaches a core encoded bitstream along with enhancement information for controlling a type and amount of audio enhancement at encoding (Paragraphs 0136 and 0139).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S WOZNIAK whose telephone number is (571)272-7632. The examiner can normally be reached 7-3, off alternate Fridays.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant may use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JAMES S. WOZNIAK
Primary Examiner
Art Unit 2655
/JAMES S WOZNIAK/Primary Examiner, Art Unit 2655