Introduction
1. This office action is in response to Applicant’s submission filed on 10/15/2025. Claims 1-20 are pending in the application and have been examined.
Notice of Pre-AIA or AIA Status
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
3. The amendment filed 10/15/2025 has been entered and fully considered. With respect to the rejections under 35 USC 103, the arguments presented are rendered moot by the new ground of rejection based on U.S. Pat. App. Pub. No. 20220310056 (Ramabhadran et al., hereinafter “Ramabhadran”), listed below.
Claim Rejections - 35 USC § 103
4. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
5. Claims 1, 3, 4, 11, 13, 14, and 20 are rejected under 35 U.S.C. 103 as unpatentable over U.S. Pat. App. Pub. No. 20230017503 (Moritz et al., hereinafter “Moritz”) in view of U.S. Pat. App. Pub. No. 20220310056 (Ramabhadran et al., hereinafter “Ramabhadran”).
With regard to Claim 1, Moritz describes:
“A method of speech-to-speech conversion, comprising:
converting received audio data in a first speech pattern to acoustic characteristics of an utterance in a first language, the audio data comprising a sequence of acoustic frames; (Paragraph 39 describes that audio input frames are received from a user. Paragraph 43 describes that the frames are broken into a series of audio features.)
generating, via an encoder, an encoded sequence including first acoustic features representing first speech in the first speech pattern based on the acoustic characteristics, the encoder using a combination of look-ahead stacking of the acoustic frames and look-ahead self-attention of the acoustic frames;” (Paragraph 19 describes that the model may be an encoder/decoder. Paragraph 46 describes that look ahead is used with self-attention modules to generate the output. Paragraph 6 describes that stacking may be used.)
Moritz does not explicitly describe:
“generating, via a streaming spectrogram decoder configured to receive the encoded sequence generated via the encoder, second acoustic features representing second speech in a second speech pattern based on the encoded sequence, the second speech pattern different than the first speech pattern;
generating, via a vocoder, a waveform of the second speech in the second speech pattern based on the second acoustic features; and
outputting the waveform of the second speech in the second speech pattern.”
However, Ramabhadran describes:
“generating, via a streaming spectrogram decoder configured to receive the encoded sequence generated via the encoder, second acoustic features representing second speech in a second speech pattern based on the encoded sequence, the second speech pattern different than the first speech pattern; (Figure 1 and Paragraph 31 describe that spectrogram decoder 220 receives an encoded sequence from encoder 210 and produces spectrogram 222 (cited as “second acoustic features”).)
generating, via a vocoder, a waveform of the second speech in the second speech pattern based on the second acoustic features; and (Figure 1 and Paragraph 28 describe that vocoder 375 receives spectrogram 222 from the decoder and generates audio waveform 376.)
outputting the waveform of the second speech in the second speech pattern.” (Figure 1 shows that audio waveform 376 is the output of the device.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the decoder and vocoder as described by Ramabhadran into the invention of Moritz to allow for training the vocoder to improve performance, as described in paragraph 31 of Ramabhadran.
With respect to Claim 3, Moritz renders obvious “the generating the encoded sequence includes a look-ahead stacker which stacks a current acoustic frame and at least four acoustic frames in the future relative to a current frame being analyzed.”
Paragraph 6 describes that 2 look-ahead frames may be stacked as an example. Thus, Moritz identifies the number of look-ahead frames as a result-effective variable, and the number of frames chosen is a design choice, based in part on the resulting delay. In In re Antonie, 559 F.2d 618, 195 USPQ 6 (CCPA 1977), the CCPA held that a particular parameter must first be recognized as a result-effective variable, i.e., a variable which achieves a recognized result, before the determination of the optimum or workable ranges of said variable might be characterized as routine experimentation. See MPEP 2144.05 (II)(B). In the present case, it would be routine experimentation to arrive at the claimed “at least four acoustic frames” based on the discussion of 2 frames in paragraph 6 of Moritz along with the discussion that using 2 frames is only an example and other numbers can be chosen based on design requirements.
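For illustration only, the look-ahead stacking discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Moritz or from the application: the function name `stack_lookahead`, the parameter `num_future`, and the choice to pad the end of the sequence by repeating the last frame are all hypothetical.

```python
def stack_lookahead(frames, num_future=4):
    """Concatenate each frame with the next `num_future` frames.

    `frames` is a list of equal-length feature vectors (lists of floats).
    Near the end of the sequence, missing future frames are filled by
    repeating the last frame (an assumption made for this sketch).
    """
    if not frames:
        return []
    last = frames[-1]
    stacked = []
    for t in range(len(frames)):
        window = []
        # offset 0 is the current frame; offsets 1..num_future look ahead
        for offset in range(num_future + 1):
            idx = t + offset
            window.extend(frames[idx] if idx < len(frames) else last)
        stacked.append(window)
    return stacked


# Six one-dimensional frames whose value equals their time index.
frames = [[float(t)] for t in range(6)]
stacked = stack_lookahead(frames, num_future=4)
print(stacked[0])  # [0.0, 1.0, 2.0, 3.0, 4.0]
print(stacked[3])  # [3.0, 4.0, 5.0, 5.0, 5.0]
```

Each additional look-ahead frame means the stacked output for a given frame cannot be produced until that future frame has arrived, which is the delay trade-off referred to above.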
With respect to Claim 4, Moritz renders obvious “the generating the encoded sequence includes subsampling the acoustic frames by 2x.”
Paragraph 77 describes that the device may sub-sample at a 4x lower frame rate, as one example. Thus, Moritz identifies the sub-sampling rate as a result-effective variable, and the rate chosen is a design choice, based in part on the design requirements. In In re Antonie, 559 F.2d 618, 195 USPQ 6 (CCPA 1977), the CCPA held that a particular parameter must first be recognized as a result-effective variable, i.e., a variable which achieves a recognized result, before the determination of the optimum or workable ranges of said variable might be characterized as routine experimentation. See MPEP 2144.05 (II)(B). In the present case, it would be routine experimentation to arrive at the claimed “subsampling the acoustic frames by 2x” based on the discussion of 4x sub-sampling in paragraph 77 of Moritz along with the discussion that this is only an example and other numbers can be chosen based on design requirements.
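For illustration only, subsampling by a fixed factor can be sketched as keeping every N-th frame. This is a minimal sketch, not the method of Moritz or of the application: the function name `subsample`, the parameter `factor`, and the keep-every-N-th-frame strategy are assumptions (a real system might instead average or convolve neighboring frames).

```python
def subsample(frames, factor=2):
    """Return every `factor`-th frame, starting with the first one."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return frames[::factor]


# Ten frames identified by their time index.
frames = list(range(10))
half_rate = subsample(frames, factor=2)     # 2x subsampling
quarter_rate = subsample(frames, factor=4)  # 4x subsampling
print(half_rate)     # [0, 2, 4, 6, 8]
print(quarter_rate)  # [0, 4, 8]
```

The factor is a single tunable parameter: a larger factor yields fewer frames for later stages to process, at the cost of coarser time resolution.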
With respect to Claims 11, 13, and 14, storage medium Claim 11 and method Claim 1 are related as a storage medium programmed to perform the same method, with each claimed storage medium function corresponding to each claimed method step. Accordingly, Claims 11, 13, and 14 are similarly rejected under the same rationale as applied above with respect to Claims 1, 3, and 4.
With respect to Claim 20, apparatus Claim 20 and method Claim 1 are related as an apparatus programmed to perform the same method, with each claimed apparatus function corresponding to each claimed method step. Accordingly, Claim 20 is similarly rejected under the same rationale as applied above with respect to Claim 1.
6. Claims 2, 5, 12, and 15 are rejected under 35 U.S.C. 103 as unpatentable over Moritz in view of Ramabhadran and further in view of U.S. Pat. App. Pub. No. 20230090590 (Fu et al., hereinafter “Fu”).
With respect to the subject matter of Claim 2, Moritz describes “the encoded sequence includes [[a conformer layer with]] self-attention looking at least 65 acoustic frames in the past relative to a current acoustic frame being analyzed.” Paragraph 12 describes that the self-attention module uses as many past frames as possible.
Moritz in view of Ramabhadran does not explicitly describe the use of “a conformer layer.” However, paragraph 33 of Fu describes that an attention model may include a conformer layer.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the conformer layer as described by Fu into the invention of Moritz in view of Ramabhadran to integrate a time-masked attention model and a causal convolution model, as described in paragraph 33 of Fu.
With regard to Claim 5, this claim is rejected based on the rejections of Claims 2 and 3.
With respect to Claims 12 and 15, system Claim 11 and method Claim 1 are related as a system programmed to perform the same method, with each claimed system function corresponding to each claimed method step. Accordingly, Claims 12 and 15 are similarly rejected under the same rationale as applied above with respect to Claims 2 and 5.
7. Claims 6, 7, 16, and 17 are rejected under 35 U.S.C. 103 as unpatentable over Moritz in view of Ramabhadran and further in view of U.S. Pat. App. Pub. No. 20200177470 (Kuo et al., hereinafter “Kuo”).
With respect to the subject matter of Claim 6, Moritz in view of Ramabhadran does not explicitly describe this subject matter.
However, Kuo describes “the encoder is an int8 stream encoder and the decoder is an int8 stream decoder.” Paragraphs 132-134 indicate that an int8 encoder and decoder is used to encode and decode data.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the int8 encoder/decoder as described by Kuo into the invention of Moritz in view of Ramabhadran to enable the neural network to provide enhanced data, as described in paragraph 101 of Kuo.
With respect to Claim 7, Moritz renders obvious “a perceived delay between receiving the received audio data and outputting the waveform of the second speech in the second speech pattern is less than 350 ms.”
Paragraph 2 describes that delays need to be minimized for streaming applications. Paragraph 52 describes ways to minimize processing delays, and indicates that the lowest possible delay is best. Thus, Moritz identifies the delay time as a result-effective variable. In In re Antonie, 559 F.2d 618, 195 USPQ 6 (CCPA 1977), the CCPA held that a particular parameter must first be recognized as a result-effective variable, i.e., a variable which achieves a recognized result, before the determination of the optimum or workable ranges of said variable might be characterized as routine experimentation. See MPEP 2144.05 (II)(B). In the present case, it would be routine experimentation to arrive at the claimed “less than 350 ms” based on the discussion of the need to reduce delay and the methods providing for minimizing delay.
With respect to Claims 16 and 17, system Claim 11 and method Claim 1 are related as a system programmed to perform the same method, with each claimed system function corresponding to each claimed method step. Accordingly, Claims 16 and 17 are similarly rejected under the same rationale as applied above with respect to Claims 6 and 7.
8. Claims 8, 9, and 18 are rejected under 35 U.S.C. 103 as unpatentable over Moritz in view of Ramabhadran and Kuo and further in view of U.S. Pat. App. Pub. No. 20170236518 (Lane et al., hereinafter “Lane”).
With respect to the subject matter of Claims 8 and 9, Moritz in view of Ramabhadran and Kuo does not explicitly describe this subject matter.
With respect to Claim 8, Lane describes “a size of the encoder and decoder quantization model is less than 200 MB.” Table 1 shows that the LM binary model may be less than 200 MB.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the model size as described by Lane into the invention of Moritz in view of Ramabhadran to provide optimal performance as described in paragraph 28 of Lane.
With respect to Claim 9, Lane renders obvious “a real time factor of the encoder is 2.5x faster.”
Paragraph 28 describes two possible real time factors, 0.02 and 0.07. Paragraph 28 further describes that the real time factor affects performance, and different real time factors can be used in different situations/evaluations. Thus, Lane identifies the real time factor as a result-effective variable. In In re Antonie, 559 F.2d 618, 195 USPQ 6 (CCPA 1977), the CCPA held that a particular parameter must first be recognized as a result-effective variable, i.e., a variable which achieves a recognized result, before the determination of the optimum or workable ranges of said variable might be characterized as routine experimentation. See MPEP 2144.05 (II)(B). In the present case, it would be routine experimentation to arrive at the claimed “real time factor of the encoder is 2.5x faster” based on the discussion of using an appropriate real time delay for the application.
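For illustration only, a real time factor is commonly computed as processing time divided by the duration of the audio processed, so a value below 1 means faster than real time. The sketch below uses that common definition as an assumption. It is not taken from Lane or from the application, the function names and timing values are hypothetical, and reading “2.5x faster” as a ratio of two real time factors is only one possible interpretation.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def relative_speedup(baseline_rtf, candidate_rtf):
    """How many times faster the candidate is than the baseline."""
    if candidate_rtf <= 0:
        raise ValueError("candidate_rtf must be positive")
    return baseline_rtf / candidate_rtf


# Hypothetical timings for the same 10-second utterance.
baseline_rtf = real_time_factor(processing_seconds=1.0, audio_seconds=10.0)
candidate_rtf = real_time_factor(processing_seconds=0.4, audio_seconds=10.0)
speedup = relative_speedup(baseline_rtf, candidate_rtf)
print(round(baseline_rtf, 3), round(candidate_rtf, 3), round(speedup, 2))
```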
With respect to Claim 18, system Claim 11 and method Claim 1 are related as a system programmed to perform the same method, with each claimed system function corresponding to each claimed method step. Accordingly, Claim 18 is similarly rejected under the same rationale as applied above with respect to Claim 8.
9. Claims 10 and 19 are rejected under 35 U.S.C. 103 as unpatentable over Moritz in view of Ramabhadran and further in view of U.S. Pat. App. Pub. No. 20170236518 (Lane et al., hereinafter “Lane”).
With respect to the subject matter of Claim 10, Moritz in view of Ramabhadran does not explicitly describe this subject matter.
However, Lane describes “a translated word error rate of the resulting waveform of the second speech in the second speech pattern is less than 16%.” Paragraph 28 describes word error rates of 5.1% and 5.33%, which are less than 16%.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the word error rate as described by Lane into the invention of Moritz in view of Ramabhadran to provide optimal performance as described in paragraph 28 of Lane.
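For illustration only, a word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a recognized transcript and a reference transcript, divided by the number of reference words. The sketch below follows that conventional definition. It is not taken from Lane or from the application, and the function name and example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumped over the lazy dog today"
wer = word_error_rate(reference, hypothesis)
print(wer)         # 0.1 (one substitution in ten reference words)
print(wer < 0.16)  # True
```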
With respect to Claim 19, system Claim 11 and method Claim 1 are related as a system programmed to perform the same method, with each claimed system function corresponding to each claimed method step. Accordingly, Claim 19 is similarly rejected under the same rationale as applied above with respect to Claim 10.
Conclusion
10. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
U.S. Pat. App. Pub. No. 20230395061 (Biadsy et al.) also describes a device that includes a streaming spectrogram decoder.
11. Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWARD TRACY whose telephone number is (571) 272-8332. The examiner can normally be reached Monday-Friday, 9 AM-5 PM.
12. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta, can be reached at 571-272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/EDWARD TRACY JR./Examiner, Art Unit 2656
/BHAVESH M MEHTA/Supervisory Patent Examiner, Art Unit 2656