DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1–3 and 6–8 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1–3 and 6–8 are rejected under 35 U.S.C. 101 because the claimed subject matter is directed to a judicial exception (an abstract idea) and does not recite additional elements that amount to significantly more than the exception.
The claims recite mathematical concepts and mental processes, including calculations, correlations, comparisons, selections, predictions, and optimization (Step 2A, Prong One).
For example, claim 1 recites: “generating a speech learning model for a plurality of users based on speech data of the plurality of users;” “generating a first speaker vector for speech data of a new speaker and a plurality of second speaker vectors for speech data of the plurality of users using a speaker recognition model;” “determining a third speaker vector having a highest correlation with the first speaker vector among the plurality of second speaker vectors based on a preset criterion;” “predicting a new speaker vector of the new user based on the third speaker vector and the first speaker vector using an adversarial training method;” “performing predicting based on a pronunciation duration time extracted from each of the speech data of the new speaker and speech data of a third speaker who is a speaker of the third speaker vector;” “jointly optimizing a pronunciation duration loss and a prosody feature loss including pitch, stress, and intonation;” “calculating a duration-weighted cosine similarity between the first speaker vector and each of the plurality of second speaker vectors.”
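For context, one plausible reading of the recited “duration-weighted cosine similarity” can be sketched as follows. The per-component duration weighting shown here is an assumption made solely for illustration; the claim language does not define the weighting scheme, and this sketch is not drawn from the applicant’s specification.

```python
import math

def duration_weighted_cosine(u, v, durations):
    """Illustrative sketch of a 'duration-weighted cosine similarity':
    each vector component is scaled by the corresponding pronunciation
    duration weight before the ordinary cosine computation. The
    weighting scheme is a hypothetical assumption; the claim does not
    define it."""
    uw = [a * math.sqrt(d) for a, d in zip(u, durations)]
    vw = [b * math.sqrt(d) for b, d in zip(v, durations)]
    dot = sum(a * b for a, b in zip(uw, vw))
    norm_u = math.sqrt(sum(a * a for a in uw))
    norm_v = math.sqrt(sum(b * b for b in vw))
    return dot / (norm_u * norm_v)
```

With uniform durations the function reduces to the ordinary cosine similarity, which is the mathematical relationship the rejection characterizes as abstract.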
Claim 2 further recites “uses a feature vector extracted from the speech data of the new speaker.” Claim 3 further recites “calculating a cosine similarity value based on calculated inner product values and determining a speaker vector of a user, which has a greatest cosine similarity value among the plurality of users, to be the third speaker vector.”
The device claims similarly recite abstract calculations and functional results via generic components. Claim 6 recites: “a speech synthesizer which generates a speech learning model for a plurality of users based on speech data of the plurality of users;” “a speech vector generator which generates a first speaker vector for speech data of a new speaker and a plurality of second speaker vectors for speech data of the plurality of users using a speaker recognition model;” “a similar vector determiner which predicts a third speaker vector having a highest correlation with the first speaker vector among the plurality of second speaker vectors based on a preset criterion,” and further, “predicts a new speaker vector of the new user based on the third speaker vector and the first speaker vector using an adversarial training method,” “performing predicting based on a pronunciation duration time,” “jointly optimizing a pronunciation duration loss and a prosody feature loss including pitch, stress, and intonation,” and “calculating a duration-weighted cosine similarity between the first speaker vector and each of the plurality of second speaker vectors.”
Claim 7 recites “uses a feature vector extracted from the speech data of the new speaker.” Claim 8 recites “calculates a cosine similarity value based on calculated inner product values and determines a speaker vector of a user, which has a greatest cosine similarity value among the plurality of users, to be the third speaker vector.”
These limitations are directed to mathematical concepts (e.g., “cosine similarity,” “inner product values,” “highest correlation,” “jointly optimizing … loss”) and mental processes (e.g., selecting “a speaker vector … which has a greatest cosine similarity value”) implemented on a computer. See Alice Corp. v. CLS Bank Int’l, 573 U.S. 208 (2014); Parker v. Flook, 437 U.S. 584 (1978); Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350 (Fed. Cir. 2016).
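For context, the computation recited in claims 3 and 8 (calculating cosine similarity values from inner products and selecting the greatest value) is a standard mathematical operation. A minimal illustrative sketch, not drawn from the applicant’s specification, follows; the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    # cosine similarity computed from inner product values, as recited
    # in claims 3 and 8: <u, v> / (||u|| * ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def third_speaker_index(first_vec, second_vecs):
    # select the speaker whose vector has the greatest cosine
    # similarity to the new speaker's vector
    sims = [cosine_similarity(first_vec, v) for v in second_vecs]
    return sims.index(max(sims))
```

The sketch shows that the limitation, taken on its own, is an ordinary mathematical calculation and comparison of the kind identified above as a mathematical concept.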
The claims do not integrate the judicial exception into a practical application (Step 2A, Prong Two). The additional elements are recitations of generic computing and machine-learning functions and components, such as “using a speaker recognition model,” “using an adversarial training method,” “a speech synthesizer,” “a speech vector generator,” and “a similar vector determiner.” The claims do not recite a particular machine that is integral to the claim beyond a generic implementation, do not effect a transformation of an article, and do not improve the functioning of a computer or a specific network architecture. The asserted improvements amount to the application of mathematical relationships and optimization to “speaker vectors” and “loss” functions, which is not a practical application under the Guidance. See Alice, 573 U.S. at 223–24; Mayo Collaborative Servs. v. Prometheus Labs., Inc., 566 U.S. 66 (2012); 2019 Revised Patent Subject Matter Eligibility Guidance.
The claims, considered individually and in combination, do not recite an inventive concept sufficient to transform the judicial exception into patent-eligible subject matter (Step 2B).
The recited operations—“generating a speech learning model,” “generating a first speaker vector,” “generating a plurality of second speaker vectors,” “determining a third speaker vector having a highest correlation,” “predicting a new speaker vector … using an adversarial training method,” “jointly optimizing a pronunciation duration loss and a prosody feature loss,” “calculating a duration-weighted cosine similarity,” and “calculating a cosine similarity value based on calculated inner product values”—constitute well-understood, routine, and conventional activities of data analysis and machine learning implemented on generic computing components. Merely applying mathematical calculations and optimization to “speech data,” “speaker vectors,” and “loss” terms does not provide an inventive concept. See Alice, 573 U.S. at 225–26; Mayo, 566 U.S. at 79–80; Flook, 437 U.S. at 594–95; Electric Power Group, 830 F.3d at 1354–56.
For the foregoing reasons, claims 1–3 and 6–8 are directed to an abstract idea and fail to recite additional elements that amount to significantly more than the abstract idea. Accordingly, claims 1–3 and 6–8 are rejected under 35 U.S.C. § 101.
To overcome the § 101 rejection, the following amendments to the claims are suggested:
Claim 1 (to show a transformation from one thing to another):
A computer-implemented method of synthesizing a multi-speaker speech using an artificial neural network, the method executed by a processor and comprising:
generating a speech learning model for a plurality of users based on speech data of the plurality of users using a multi-speaker speech synthesis model comprising an encoder, an attention module, a decoder, and a vocoder [0074]-[0077], [0079];
generating a first speaker vector for the new speaker by encoding the extracted d-vector and retrieving a plurality of second speaker vectors for the plurality of users from a trained speaker vector table [0085]-[0086], [0102]-[0103]; …
And/or (to show a machine state change and training):
training the initial embedding predictor and an adversarial discriminator in an adversarial training arrangement by updating network weights based on a reconstruction loss between an actual speaker vector and a predicted speaker vector and a discriminator loss, and storing the updated trained network parameters in non-transitory memory [0115]-[0124],[0125]; …
Similarly, Claim 6:
A device for synthesizing a multi-speaker speech using an artificial neural network, the device comprising:
one or more processors and one or more tangible, non-transitory memory devices storing instructions that, when executed by the one or more processors, cause the device to:
train a multi-speaker speech synthesis model that comprises an encoder, an attention module, a decoder, and a vocoder to generate and store trained speaker embedding vectors in a speaker vector table for a plurality of users [0074]-[0077], [0085]-[0086]; …
This sample language is merely suggested to the applicant as one way to overcome the § 101 (abstract idea) rejection. The applicant remains responsible for checking the specification for new matter and the claims for lack of antecedent basis.
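For context, the jointly optimized training objective in the suggested language above (a reconstruction loss between an actual and a predicted speaker vector, combined with a discriminator loss from adversarial training) is a conventional weighted-sum formulation. The sketch below is illustrative only; the weighting hyperparameter and function names are hypothetical and not taken from the specification.

```python
import math

def reconstruction_loss(actual_vec, predicted_vec):
    # mean squared error between the actual and predicted speaker vectors
    n = len(actual_vec)
    return sum((a - p) ** 2 for a, p in zip(actual_vec, predicted_vec)) / n

def discriminator_loss(d_real, d_fake):
    # standard binary cross-entropy discriminator loss used in
    # adversarial training; d_real and d_fake are scores in (0, 1)
    eps = 1e-12
    return -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))

def joint_training_loss(actual_vec, predicted_vec, d_real, d_fake,
                        adv_weight=0.1):
    # weighted sum of the two objectives; adv_weight is a hypothetical
    # hyperparameter chosen for illustration only
    return (reconstruction_loss(actual_vec, predicted_vec)
            + adv_weight * discriminator_loss(d_real, d_fake))
```

Reciting the concrete weight-update and parameter-storage steps, rather than the bare optimization objective, is what ties the suggested language to a machine state change.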
Allowable Subject Matter
Claims 1–3 and 6–8 are allowable over the prior art of record.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See the new attached PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICHEMOND DORVIL whose telephone number is (571)272-7602. The examiner can normally be reached 8:30 - 5:30 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHEMOND DORVIL/ Supervisory Patent Examiner, Art Unit 2658