DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-8 of U.S. Patent No. 12,118,979. Although the claims at issue are not identical, they are not patentably distinct from each other because of the following:
Pending U.S. Application No. 18/807,799, Claims 1, 9 and 17:
A computing system including one or more processors and one or more memories configured to perform operations comprising: comparing synthetic speech data for an input text to recorded reference speech data corresponding to the input text; extracting a plurality of features with each indicative of at least one difference between the synthetic speech data and the recorded reference speech data based on, at least in part, comparing the synthetic speech data to the recorded reference speech data; generating a speech gap filling model based on, at least in part, the plurality of features extracted; predicting a best sequence of the plurality of features extracted to be added to an interim set of parameters for synthesis of a speech output; generating the speech output based on, at least in part, the speech gap filling model and the best sequence of the plurality of features added to the interim set of parameters.
U.S. Patent No. 12,118,979, Claim 1:
A computing system including one or more processors and one or more memories configured to perform operations comprising: comparing synthetic speech data for an input text to recorded reference speech data corresponding to the input text; extracting at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data; generating a speech gap filling model based on, at least in part, the at least one feature extracted; generating a speech output based on, at least in part, the speech gap filling model; comparing the speech output generated for a second input text to recorded reference speech data corresponding to the second input text; and extracting an updated at least one feature indicative of at least one difference between the speech output generated for the second input text and the recorded reference speech data corresponding to the second input text based on, at least in part, the comparison of the speech output for the second input text to the recorded reference speech data corresponding to the second input text.
Claims 2, 10 and 18 correspond to Claim 2.
Claims 3, 11 and 19 correspond to Claim 3.
Claims 4, 12 and 20 correspond to Claim 4.
Claims 5 and 13 correspond to Claim 5.
Claims 6 and 14 correspond to Claim 6.
Claims 7 and 15 correspond to Claim 7.
Claims 8 and 16 correspond to Claim 8.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claims 1, 9 and 17 are directed to the abstract idea of comparing data, extracting features from that comparison, and using those features to generate a predictive model — a concept that falls within the category of mathematical concepts and mental processes. Specifically, the claims recite comparing synthetic speech data to reference speech data, extracting features indicative of differences, and predicting a best sequence of features — all of which are fundamentally mathematical operations involving data analysis, pattern recognition, and optimization. Collecting and analyzing data to identify patterns or differences, and using those patterns to generate a model or prediction, constitutes an abstract idea.
The claims fail to recite limitations that amount to “significantly more” than the abstract idea itself. The recited hardware — “one or more processors and one or more memories” — is generic computing infrastructure that performs no specialized or unconventional function. The claim does not describe any particular technical improvement to the computer itself, nor does it recite any specific, non-conventional algorithm that transforms the abstract idea into a patent-eligible application. The “speech gap filling model” and “best sequence” prediction, while functionally described, are claimed at a high level of generality without disclosing any inventive technical implementation that goes beyond what a generic computer routinely performs.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claims are (i) mere instructions to implement the idea on a computer, and/or (ii) recitations of generic computer structure that serves to perform generic computer functions that are well-understood, routine, and conventional activities previously known to the pertinent industry. Viewed as a whole, these additional claim elements do not provide meaningful limitations to transform the abstract idea into a patent-eligible application of the abstract idea such that the claims amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter. Further, there is no improvement to the functioning of the computing device itself.
Dependent claims 2-8, 10-16 and 18-20 further recite an abstract idea performable by a human and do not amount to significantly more than the abstract idea, as they provide no steps beyond what is conventionally known in speech synthesis.
Claims 2, 10 and 18 recite nothing more than basic data input, data processing, and data output performed on a generic computer, and add no meaningful limitation beyond the abstract idea of claim 1.
Claims 3, 11 and 19 simply reference additional abstract mathematical models as inputs, which does not transform the underlying abstract idea into patent-eligible subject matter.
Claims 4, 12 and 20 recite a purely mathematical operation, manipulating a numerical value, that adds no inventive concept beyond the abstract idea already recited in claim 1.
Claims 5 and 13 add nothing meaningful beyond organizing data prior to performing the abstract idea of claim 1.
Claims 6 and 14 recite well-known mathematical and signal processing techniques that represent routine, conventional operations, merely applied on a generic computer.
Claims 7 and 15 simply incorporate a neural network into an otherwise abstract process, which does not make the claims patent-eligible.
Claims 8 and 16 add no inventive concept sufficient to transform the abstract idea of claim 1 into patent-eligible subject matter.
Claims 17-20 are drawn to a "program" per se as recited in the preamble ("computer readable medium" can be communication media as defined in the disclosure) and as such are non-statutory subject matter. See MPEP § 2106.IV.B.1.a. Data structures not claimed as embodied in computer readable media are descriptive material per se and are not statutory because they are not capable of causing functional change in the computer. See, e.g., Warmerdam, 33 F.3d at 1361, 31 USPQ2d at 1760 (claim to a data structure per se held nonstatutory). Such claimed data structures do not define any structural and functional interrelationships between the data structure and other claimed aspects of the invention which permit the data structure's functionality to be realized. In contrast, a claimed computer readable medium encoded with a data structure defines structural and functional interrelationships between the data structure and the computer software and hardware components which permit the data structure's functionality to be realized, and is thus statutory. Similarly, computer programs claimed as computer listings per se, i.e., the descriptions or expressions of the programs, are not physical "things." They are neither computer components nor statutory processes, as they are not "acts" being performed. Such claimed computer programs do not define any structural and functional interrelationships between the computer program and other claimed elements of a computer which permit the computer program's functionality to be realized.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-7, 9-15 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (“A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis”; IEEE 2015) in view of Takamichi et al. (“Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis”; IEEE 2016).
Regarding claims 1, 9 and 17,
Chen teaches a computing system including one or more processors and one or more memories configured to perform operations comprising: comparing synthetic speech data for an input text to recorded reference speech data corresponding to the input text ([IV. Evaluation] [A. Voices and Methods] [pg. 2008] the DNNs were trained with paired synthetic and natural spectral features aligned using the dynamic time warping (DTW) algorithm; the female voice was built using 4546 sentences recorded from a Scottish female speaker);
extracting a plurality of features with each indicative of at least one difference between the synthetic speech data and the recorded reference speech data based on, at least in part, comparing the synthetic speech data to the recorded reference speech data ([B. Model Training] [2) Binary encoding of spectral features] [pgs. 2006-2007] [pg. 2011] the hidden representations for the DBNs are extracted layer-by-layer as the binary code of the DBN auto-encoders; the acoustic differences between synthetic and natural speech are modeled in a high-level binary hidden space (Chen teaches extraction of multiple hidden features/representations that capture differences between synthetic and natural/reference speech));
generating a speech gap filling model based on, at least in part, the plurality of features extracted ([Abstract] [C. Spectral Postfiltering] [pg. 2007] the network models the conditional probability of the spectrum of natural speech given that of synthetic speech to compensate for such gap between synthetic and natural speech; the proposed DNN directly describes a conditional distribution of natural spectral feature y given synthetic spectral feature x (DNN/BAM/DBN postfilter is the claimed gap-filling model));
predicting a best sequence of the plurality of features extracted ([C. Spectral Postfiltering] [2007-2008] The optimal binary samples are sampled from the conditional distribution according to the maximum probabilities; the maximum likelihood parameter generation (MLPG) algorithm [37] is adopted in this case to generate a static feature sequence for synthesizing speech (predicting an optimal/best feature realization and generating a feature sequence for synthesis));
generating the speech output based on, at least in part, the speech gap filling model and the best sequence of the plurality of features added to the interim set of parameters ([C. Spectral Postfiltering] [A. Voices and Methods] [pg. 2008] the maximum likelihood parameter generation (MLPG) algorithm [37] is adopted in this case to generate a static feature sequence for synthesizing speech).
The difference between the prior art and the claimed invention is that Chen does not explicitly teach features extracted to be added to an interim set of parameters for synthesis of a speech output; features added to the interim set of parameters.
Takamichi teaches features extracted to be added to an interim set of parameters for synthesis of a speech output; features added to the interim set of parameters ([A. Utterance-Level MS-Based Post-Filter] [pgs. 758-759] the following filter is applied to the generated speech parameter sequence y (see Fig. 7.); the filtered parameter sequence is calculated from the modified MS; the filtered speech parameter sequence is generated by overlapping and adding the filtered segments (applying/adding the predicted/filtered feature information to an already-generated/interim speech parameter sequence for synthesis)).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Chen with teachings of Takamichi by modifying the deep generative architecture for postfiltering in statistical parametric speech synthesis as taught by Chen to include features extracted to be added to an interim set of parameters for synthesis of a speech output and features added to the interim set of parameters as taught by Takamichi for the benefit of yielding significant improvements in synthetic speech quality (Takamichi [Abstract]).
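For illustration only, the claimed structure mapped above (extract differences between synthetic and reference features, fit a "gap filling" model, and add predicted corrections to an interim parameter set) can be sketched as follows. The feature values, the mean-difference "model," and all function names are hypothetical stand-ins; the cited Chen/Takamichi postfilters model a full conditional distribution, not a mean offset.

```python
# Paired, already time-aligned feature frames: synthetic speech vs. the
# natural reference for the same text (toy two-dimensional values).
synthetic = [[0.8, 1.9], [1.1, 2.2], [0.9, 2.0]]
natural   = [[1.0, 2.4], [1.3, 2.7], [1.1, 2.5]]

def extract_gap_features(synth, ref):
    """Per-dimension differences between reference and synthetic frames."""
    return [[r - s for s, r in zip(sf, rf)] for sf, rf in zip(synth, ref)]

def fit_gap_model(diffs):
    """'Gap filling model': here simply the mean correction per dimension."""
    dims = len(diffs[0])
    return [sum(d[k] for d in diffs) / len(diffs) for k in range(dims)]

def apply_gap_model(interim_params, model):
    """Add the predicted corrections to an interim parameter sequence."""
    return [[p + c for p, c in zip(frame, model)] for frame in interim_params]

diffs = extract_gap_features(synthetic, natural)   # extracted features
model = fit_gap_model(diffs)                       # gap filling model
output = apply_gap_model([[1.0, 2.0]], model)      # corrected parameters
```

The sketch only mirrors the claim's three-step structure; it does not reproduce the DNN/BAM/DBN architecture or MLPG generation described in the cited references.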
Regarding claims 2, 10 and 18,
Chen further teaches the computing system of claim 1, wherein generating the speech output comprises: generating the interim set of parameters ([C. Computational Cost] [pg. 2013] the mean vector of the spectral stream (mel-cepstrum) of each HMM state can be converted into multiple frames of spectra, and the DNN-based postfilter can be applied to the converted mean vectors);
processing the interim set of parameters based on, at least in part, the speech gap filling model to generate a final set of parameters ([C. Computational Cost] [pg. 2013] the DNN-based postfilter can also be applied to the model parameters of HMMs to accelerate the synthesis process. For example, the mean vector of the spectral stream (mel-cepstrum) of each HMM state can be converted into multiple frames of spectra, and the DNN-based postfilter can be applied to the converted mean vectors (interim/parameter processing by the DNN model to yield final parameters)); and
generating the speech output based on, at least in part, the final set of parameters ([C. Computational Cost] [pg. 2013] the computational cost of the synthesis process is exactly the same as that of the conventional method (synthesis from the final/postfiltered parameter set)).
Regarding claims 3, 11 and 19,
Chen further teaches the computing system of claim 1, wherein the synthetic speech data generated is based on, at least in part, at least one of a parametric acoustic model and a linguistic model pre-configured for a speaker ([Abstract] [A. Voices and Methods] [pg. 2008] hidden Markov model (HMM)-based statistical parametric speech synthesis; the male voice was created from a high-quality average voice model adapted to 2840 sentences recorded from a British male speaker; the female voice was built using 4546 sentences recorded from a Scottish female speaker; we used a hidden semi-Markov model as the acoustic model).
Regarding claims 4, 12 and 20,
Chen further teaches the computing system of claim 1, wherein generating the speech output comprises using the gap filling model ([Abstract] [C. Spectral Postfiltering] [pg. 2007] the network models the conditional probability of the spectrum of natural speech given that of synthetic speech to compensate for such gap between synthetic and natural speech (i.e., the claimed gap-filling model); the proposed DNN directly describes a conditional distribution of natural spectral feature given synthetic spectral feature);
Takamichi further teaches adjusting a vector index to be applied prior to generating the speech output as synthesized speech ([III. Modulation Spectrum Analysis] [pg. 757] where f is a modulation frequency index, m=−πf/Ds is a modulation frequency, and Ds is one half of the Discrete Fourier Transform (DFT) length; the MS is calculated from zero-padded parameter sequences so its length is 2Ds; as shown in Fig. 2, s(y) is given as a super vector consisting of the MSs corresponding to individual feature dimensions).
Regarding claims 5 and 13,
Chen further teaches the computing system of claim 1, wherein the operations further comprise aligning the synthetic speech data and the recorded reference speech data preceding the comparison ([A. Voices and Methods] [Mel-cepstral domain] [Spectral domain] [pg. 2008] the DNNs were trained with paired synthetic and natural spectral features aligned using the dynamic time warping (DTW) algorithm; spectral envelopes of synthetic and natural speech were aligned using the alignment paths calculated from their corresponding mel-cepstra).
Regarding claims 6 and 14,
Chen further teaches the computing system of claim 5, wherein aligning the synthetic speech data and the recorded reference speech data comprises implementing one or more of pitch shifting, time normalization, and time alignment between the synthetic speech data and the recorded reference speech data ([A. Voices and Methods] [Mel-cepstral domain] [Spectral domain] [pg. 2008] spectral envelopes of synthetic and natural speech were aligned using the alignment paths calculated from their corresponding mel-cepstra).
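The dynamic time warping (DTW) alignment relied on for claims 5, 6, 13 and 14 follows the standard dynamic-programming pattern below. This is a generic textbook sketch; the sequences, the absolute-difference cost, and the function name are hypothetical and are not drawn from Chen.

```python
def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW alignment between two 1-D sequences.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs aligning frames of `a` to frames of `b`.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    # Backtrack to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((cost[i - 1][j - 1], i - 1, j - 1),
                   (cost[i - 1][j], i - 1, j),
                   (cost[i][j - 1], i, j - 1))
        i, j = step[1], step[2]
    return cost[n][m], path[::-1]

# Toy "synthetic" vs. "reference" frame energies of unequal length.
total, path = dtw_path([1.0, 2.0, 3.0, 3.0], [1.0, 3.0, 3.0])
```

In the cited reference the same alignment is computed over mel-cepstral feature vectors rather than scalar frames, but the time-normalization role is identical.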
Regarding claims 7 and 15,
Chen further teaches the computing system of claim 1, wherein the operations further comprise training a neural network based on, at least in part, at least one feature of the plurality of features to generate the speech gap filling model ([Abstract] [B. Model Training] [pg. 2006] the proposed probabilistic postfilter is generatively trained by cascading two restricted Boltzmann machines (RBMs) or deep belief networks (DBNs) with one bidirectional associative memory (BAM); the proposed DNN-based postfilter is generatively trained layer-by-layer by cascading two RBMs/DBNs with a BAM; these auto-encoders can encode the raw spectral features into high-level hidden binary representations; the hidden representations for the DBNs are extracted layer-by-layer as the binary code of the DBN auto-encoders; BAM is adopted in the third step to model the joint distribution of hidden variables from the two RBMs/DBNs; the concatenated model is then converted to a DNN (trains a neural network, and training is based on extracted hidden feature representations used to form the postfilter/gap-filling model)).
Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (“A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis”; IEEE 2015) in view of Takamichi et al. (“Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis”; IEEE 2016) and further in view of Senior et al. (US 8,527,276).
Regarding claims 8 and 16,
Chen and Takamichi teach all the limitations in claim 1. The difference between the prior art and the claimed invention is that neither Chen nor Takamichi explicitly teaches wherein the operations further comprise updating the speech gap filling model based on, at least in part, the at least one feature of the plurality of features.
Senior teaches wherein the operations further comprise updating the speech gap filling model based on, at least in part, the at least one feature of the plurality of features ([col. 17 line 59 to col. 18 line 31] the neural network training module 510 may function to compare the training-time predicted feature vectors 505 output by the neural network 506 with the target feature vector 509 obtained from the speech database 512 in order to update and/or adjust parameters (e.g., weights) of the neural network 506; through a process of repeated generation of training-time predicted feature vectors, such as training-time predicted feature vectors 505, comparison with target feature vectors, such as target feature vectors 509, and updating of the neural network 506, the neural network 506 (and correspondingly the acoustic parameter generation module 504) may be adjusted or trained to generate training-time predicted feature vectors that accurately represent the target feature vectors).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the teachings of Chen with the teachings of Senior by modifying the deep generative architecture for postfiltering in statistical parametric speech synthesis as taught by Chen to include wherein the operations further comprise updating the speech gap filling model based on, at least in part, the at least one feature of the plurality of features as taught by Senior for the benefit of supporting speech synthesis capabilities and the services that utilize them (Senior [Background]).
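Senior's compare-and-update training loop (predict a feature vector, compare it with the target feature vector, adjust the model weights) follows the generic pattern sketched below. This is a toy linear model trained on assumed data, not Senior's actual neural network.

```python
def train(pairs, lr=0.1, epochs=1000):
    """Repeatedly predict, compare with the target, and update weights."""
    w, b = 0.0, 0.0  # model parameters to be updated
    for _ in range(epochs):
        for x, target in pairs:
            pred = w * x + b     # training-time predicted value
            err = pred - target  # comparison with the target value
            w -= lr * err * x    # gradient-step parameter update
            b -= lr * err
    return w, b

# Hypothetical targets following y = 2x + 1; the loop should recover w=2, b=1.
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train(pairs)
```

The repeated generate/compare/update cycle is the same mechanism the rejection maps to "updating the speech gap filling model" based on extracted features.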
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Tachibana et al. (US 2017/0025115) – A singing voice synthesis data editing method includes adding, to singing voice synthesis data, a piece of virtual note data placed immediately before a piece of note data having no contiguous preceding piece of note data, the singing voice synthesis data including: multiple pieces of note data for specifying a duration and a pitch at which each note that is in a time series, representative of a melody to be sung, is voiced; multiple pieces of lyric data associated with at least one of the multiple pieces of note data; and a sequence of sound control data that directs sound control over a singing voice synthesized from the multiple pieces of lyric data, and obtaining the sound control data that directs sound control over the singing voice synthesized from the multiple pieces of lyric data, and that is associated with the piece of virtual note data.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHREYANS A PATEL whose telephone number is (571)270-0689. The examiner can normally be reached Monday-Friday 8am-5pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
SHREYANS A. PATEL
Primary Examiner
Art Unit 2653
/SHREYANS A PATEL/ Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/ Supervisory Patent Examiner, Art Unit 2659