DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, or 365(c) is acknowledged.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/26/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory obviousness-type double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the conflicting application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement.
Effective January 1, 1994, a registered attorney or agent of record may sign a terminal disclaimer. A terminal disclaimer signed by the assignee must fully comply with 37 CFR 3.73(b).
Claims 21, 28 and 40 are rejected on the ground of nonstatutory obviousness-type double patenting as being unpatentable over claim 5 of U.S. Patent No. 11,735,197 (a grandparent patent) as well as claim 4 of U.S. Patent No. 12,080,311 (a parent patent). Although the conflicting claims are not identical, they are not patentably distinct from each other because the instant independent claims are broader than the cited claims of the parent patent and the grandparent patent, omitting a few limitations. In other words, one or more claims in the parent patent and the grandparent patent anticipate the instant claims. Anticipation is “the ultimate or epitome of obviousness” (In re Kalm, 154 USPQ 10 (CCPA 1967), also In re Dailey, 178 USPQ 293 (CCPA 1973) and In re Pearson, 181 USPQ 641 (CCPA 1974)).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 21-23, 25-29 and 31-40 are rejected under 35 U.S.C. §103 as being unpatentable over Engel et al. (“Self-supervised Pitch Detection by Inverse Audio Synthesis”, published on July 02, 2020, referred to as Engel) in view of Kim et al. (US PG Pub. 2020/0082806, referred to as Kim).
Engel is a published research paper co-authored by the inventors. Engel has two other authors who are not the inventors. Engel therefore qualifies as a prior art reference “by another” under the AIA (See MPEP 2153.01(a)). Engel discloses differentiable digital signal processing (DDSP) that combines a neural network with differentiable synthesizers (Engel, Abstract, Introduction). In particular, Engel discloses using a neural network (claimed “a machine-learned model”) to extract pitch / other speech information (claimed “control inputs”) and providing the extracted information to a DDSP synthesizer (Engel, Section 3, Model structure).
Kim discloses a neural network implemented speech synthesis system (Kim, [0041-0043], [0054], Fig. 4). In particular, Kim discloses training the neural network speech synthesizer by using a backpropagation algorithm (Kim, [0093]) using text and recorded speech (Kim, [0049], [0061]). Kim further discloses generating synthesized speech by specifying desired prosody features such as pitch, duration or accent (Kim, [0063], [0108]).
Regarding claim 21, Engel discloses a system comprising:
providing, as an input into a machine-learned audio processing system, audio data describing a voice recording (Engel, section 4, Fig. 1), wherein the machine-learned audio processing system comprises:
one or more differentiable digital signal processors (“DDSPs”) configured to receive one or more control inputs and to process the one or more control inputs to generate a digital signal output (Engel, Abstract, Introduction, section 3, a model structure of DDSP), wherein each respective DDSP of the one or more DDSPs is differentiable from a respective digital signal output of the respective DDSP to a respective control input of the respective DDSP (Engel, section 3, describes details of DDSP based speech synthesis using control inputs such as pitch, fundamental frequency, timbre etc., which are obtained from a trained neural network); and
a machine-learned model configured to receive a model input based on the audio data and to process the model input to generate the one or more control inputs for the one or more DDSPs, wherein the machine-learned model has been trained by backpropagating a loss through the one or more DDSPs (Engel, section 3.3, Section 4.2, training a combined neural network with a differentiable speech synthesizer); and
receiving, as an output of the machine-learned audio processing system generated based on the digital signal output, data describing a transformed voice recording (Engel, section 4, experiment results, synthesized speech output from DDSP models).
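For illustration only, the limitation “differentiable from a respective digital signal output ... to a respective control input” can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Engel, Kim, or the specification: the machine-learned model is replaced by a single scalar amplitude, the DDSP is a single sinusoidal oscillator, the loss is a mean squared error, and the function names (oscillator, loss_and_grad, fit_amplitude) and all parameter values are hypothetical.

```python
import numpy as np

def oscillator(amplitude, freq_hz=220.0, sample_rate=16000, n_samples=1600):
    """Hypothetical one-sinusoid synthesizer: control input -> digital signal output."""
    t = np.arange(n_samples) / sample_rate
    return amplitude * np.sin(2.0 * np.pi * freq_hz * t)

def loss_and_grad(amplitude, target):
    """Mean squared error and its analytic gradient with respect to the control input."""
    basis = oscillator(1.0)              # d(output)/d(amplitude)
    err = amplitude * basis - target     # output minus reference signal
    loss = np.mean(err ** 2)
    grad = np.mean(2.0 * err * basis)    # chain rule applied through the oscillator
    return loss, grad

def fit_amplitude(target, steps=200, lr=0.5):
    """Gradient descent on the control input, standing in for model training."""
    a = 0.0
    for _ in range(steps):
        _, g = loss_and_grad(a, target)
        a -= lr * g
    return a

target = oscillator(0.7)
recovered = fit_amplitude(target)
```

Because the gradient of the loss flows through the oscillator back to its control input, the amplitude that produced the target signal can be recovered by gradient descent. In a full system the same gradient would continue into the weights of the model that generates the control input.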
Engel is a research paper which is concise in nature. Many claimed features (e.g., a processor, a memory) in claim 21 are implied. Engel does not explicitly disclose training a neural network “by backpropagating a loss through …”. To explicitly show these common computer elements (processor, memory), the voice recording, and the use of a backpropagation algorithm, the examiner further cites Kim.
Kim discloses a computer implemented neural network speech synthesis system (Kim, [0037]). Kim further discloses training a neural network using a backpropagation algorithm (Kim, [0093], [0120]). Kim further discloses recording voice using a microphone (Kim, [0049], [0051]) and generating a transformed voice (Kim, [0044], Korean words pronounced by an American speaking English).
Both Engel and Kim are related to using a trained neural network for generating synthesized speech. It would have been obvious to a person having ordinary skill in the art at the time the invention was filed to combine Engel’s teaching with Kim’s teaching to obtain more details related to a neural network implemented speech synthesizer with desired prosody information such as pitch or duration. One having ordinary skill in the art would have been motivated to make such a modification to generate natural speech sounds (Kim, [0004], [0074]). In addition, all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods, and in the combination each element merely would have performed the same function as it did separately. “A combination of familiar elements according to known methods is likely to be obvious when it does no more than yield predictable results.” KSR, 550 U.S. ___, 82 USPQ2d at 1395 (2007). One of ordinary skill in the art would have recognized that the results of the combination were predictable.
Independent claim 28 (directed to a system) and claim 40 (directed to a device) include features similar to claim 21. Claims 28 and 40 are rejected based on the same rationale as explained for claim 21.
Regarding claim 22, Engel in view of Kim further discloses recording the voice recording using the microphone (Kim, [0048]).
Regarding claims 23 and 29, Engel in view of Kim further discloses the computing system is a first computing system, and wherein the machine-learned audio processing system is operated on a second computing system as a portion of a web service (Kim, [0111], [0123], [0129]).
Regarding claims 25 and 31, Engel in view of Kim further discloses the voice recording is associated with a video recording (Kim, [0127]).
Regarding claim 26, Engel in view of Kim further discloses using the machine-learned model to process voice recordings (Engel, Section 3.1, using neural network to process voice recordings, see Fig. 1).
Regarding claim 27, Engel in view of Kim further discloses the communication application is operable to communicate text messages (Kim, [0012], [0058-0060], Fig. 2).
Regarding claim 32, Engel in view of Kim further discloses the one or more differentiable digital signal processors comprises one or more of a linear time-varying filter, a linear time-invariant filter, a finite impulse response filter, an infinite impulse response filter, an oscillator (Engel, section 3.1, sinusoidal synthesizer), a short-time Fourier transform, a parametric equalization processor, an effects processor, an additive synthesizer (Engel, section 2), a subtractive synthesizer (Engel, section 2), and a wavetable synthesizer. (Examiner's note: the cited reference needs to teach only one of the alternatives recited after “one or more”.)
Regarding claim 33, Engel in view of Kim further discloses the one or more DDSPs comprises an additive synthesizer and a subtractive synthesizer, and wherein the operations comprise:
generating, based on the audio data, the digital signal output using the additive synthesizer and the subtractive synthesizer (Engel, section 2).
Regarding claim 34, Engel in view of Kim further discloses the additive synthesizer comprises an oscillator and the subtractive synthesizer comprises a linear time-varying filter applied to a noise source (Engel, Section 2 and section 3).
Regarding claim 35, Engel in view of Kim further discloses the machine-learned model comprises an encoder for processing reference audio data and a decoder for outputting the one or more control inputs (Engel, Section 3.2-3.3, encoding / decoding).
Regarding claim 36, Engel in view of Kim further discloses the loss comprises a spectral loss based at least in part on outputs of the one or more DDSPs (Engel, section 3).
Regarding claim 37, Engel in view of Kim further discloses the spectral loss is a multi-scale spectral loss determined between the outputs of the one or more DDSPs and reference audio data (Engel, Section 3.3, Loss functions, eq. 6, which is the same equation as in the specification, referred to as multi-scale loss).
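For illustration only, one common form of a multi-scale spectral loss compares magnitude spectrograms of the synthesized output and the reference audio at several FFT sizes and sums the differences. The following is a minimal sketch under stated assumptions; it is not a reproduction of Engel eq. 6 or of the equation in the specification. The FFT sizes, the hop of one quarter of the FFT size, the Hann window, the log-magnitude term and its weight, and the function names are all hypothetical choices.

```python
import numpy as np

def stft_magnitude(signal, fft_size, hop):
    """Magnitude spectrogram from Hann-windowed frames (no padding)."""
    if len(signal) < fft_size:
        raise ValueError("signal is shorter than the FFT size")
    window = np.hanning(fft_size)
    n_frames = 1 + (len(signal) - fft_size) // hop
    frames = np.stack([signal[i * hop: i * hop + fft_size] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def multi_scale_spectral_loss(output, reference,
                              fft_sizes=(512, 256, 128, 64),
                              log_weight=1.0, eps=1e-7):
    """Sum of linear and log magnitude differences over several FFT sizes."""
    total = 0.0
    for n in fft_sizes:
        s_out = stft_magnitude(output, n, n // 4)
        s_ref = stft_magnitude(reference, n, n // 4)
        # Linear-magnitude term (mean absolute difference).
        total += np.mean(np.abs(s_out - s_ref))
        # Log-magnitude term; eps avoids log(0). Its weight is an assumption.
        total += log_weight * np.mean(
            np.abs(np.log(s_out + eps) - np.log(s_ref + eps)))
    return float(total)

t = np.arange(2048) / 16000.0
same = multi_scale_spectral_loss(np.sin(2 * np.pi * 220 * t),
                                 np.sin(2 * np.pi * 220 * t))
```

Using several FFT sizes lets the loss weigh both fine frequency resolution (large windows) and fine time resolution (small windows), which is the usual motivation for the “multi-scale” label.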
Regarding claim 38, Engel in view of Kim further discloses generating, based on the audio data, the digital signal output using the additive synthesizer and the subtractive synthesizer (Engel, section 2).
Regarding claim 39, Engel in view of Kim further discloses the additive synthesizer comprises an oscillator and the subtractive synthesizer comprises a linear time-varying filter applied to a noise source (Engel, section 3.1, a linear time-varying filtered noise source. Noise is generated from a uniform distribution).
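For illustration only, the arrangement recited in claims 34 and 39 (an additive synthesizer built from an oscillator, and a subtractive synthesizer built from a linear time-varying filter applied to a noise source) can be sketched as follows. This is a simplified sketch under stated assumptions, not Engel's implementation: the filter is applied frame by frame in the frequency domain without overlap-add, harmonics above the Nyquist frequency are not removed, and the function names, array shapes, and parameter values are hypothetical.

```python
import numpy as np

def additive_synth(f0_hz, harmonic_amps, sample_rate=16000):
    """Sum of harmonic sinusoids.

    f0_hz: (n_samples,) fundamental frequency per sample.
    harmonic_amps: (n_samples, n_harmonics) amplitude per harmonic per sample.
    """
    phase = 2.0 * np.pi * np.cumsum(f0_hz) / sample_rate
    k = np.arange(1, harmonic_amps.shape[1] + 1)   # harmonic numbers 1..K
    return np.sum(harmonic_amps * np.sin(phase[:, None] * k), axis=1)

def filtered_noise(frame_magnitudes, frame_size, rng):
    """Uniform noise shaped by a different frequency response in each frame.

    frame_magnitudes: (n_frames, frame_size // 2 + 1) filter magnitudes.
    """
    n_frames = frame_magnitudes.shape[0]
    noise = rng.uniform(-1.0, 1.0, size=(n_frames, frame_size))
    spectrum = np.fft.rfft(noise, axis=1) * frame_magnitudes
    return np.fft.irfft(spectrum, n=frame_size, axis=1).reshape(-1)

n_samples, frame_size = 1600, 64
harmonic = additive_synth(np.full(n_samples, 220.0),
                          np.full((n_samples, 3), 0.1))
noise = filtered_noise(np.ones((n_samples // frame_size, frame_size // 2 + 1)),
                       frame_size, np.random.default_rng(0))
mixed = harmonic + noise   # combined digital signal output
```

Both functions are compositions of elementary arithmetic, sines, and FFTs, so each output sample depends smoothly on the control inputs (f0_hz, harmonic_amps, frame_magnitudes).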
Claims 24 and 30 are rejected under 35 U.S.C. §103 as being unpatentable over Engel in view of Kim and further in view of Conkie et al. (US PG Pub. 2013/0144624, referred to as Conkie).
Regarding claims 24 and 30, Engel in view of Kim implicitly discloses communications between two computers (Kim, [0123], [0129]). To further show communications between the claimed “a computer system” and “a second computing system”, the examiner further cites Conkie, which discloses a client / server speech synthesizing system (Conkie, Fig. 2, Fig. 3). Conkie further discloses a client computer having a web interface and accepting text (Conkie, Fig. 3, #1). The client sends the received text to a server for generating synthesized speech (Conkie, Fig. 3, #5). Conkie further shows various communications between the client computer and the server computer (Conkie, [0023], [0026], Fig. 4).
Engel in view of Kim and Conkie are related to generating synthesized speech. It would have been obvious to a person having ordinary skill in the art at the time the invention was filed to combine Engel in view of Kim’s teaching with Conkie’s teaching by using a client/server speech synthesizer structure. One having ordinary skill in the art would have been motivated to make such a modification to provide good latency and good prosody (Conkie, [0004]). In addition, all the claimed elements were known in the prior art and one skilled in the art could have combined the elements as claimed by known methods, and in the combination each element merely would have performed the same function as it did separately. “A combination of familiar elements according to known methods is likely to be obvious when it does no more than yield predictable results.” KSR, 550 U.S. ___, 82 USPQ2d at 1395 (2007). One of ordinary skill in the art would have recognized that the results of the combination were predictable.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The examiner discovered several relevant prior art references that are related to one or more concepts disclosed by the instant application. These references are included in the attached PTO-892 form for completeness of the record.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jialong He, whose telephone number is (571) 270-5359. The examiner can normally be reached on Monday – Friday, 8:00AM – 4:30PM, EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre Desir, can be reached on (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JIALONG HE/Primary Examiner, Art Unit 2659