Prosecution Insights
Last updated: April 19, 2026
Application No. 17/898,991

METHOD AND DEVICE FOR SYNTHESIZING MULTI-SPEAKER SPEECH USING ADVERSARIAL ARTIFICIAL NEURAL NETWORK

Status: Non-Final OA §101
Filed: Aug 30, 2022
Examiner: DORVIL, RICHEMOND
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: Industry-University Cooperation Foundation Hanyang University
OA Round: 3 (Non-Final)
Grant Probability: 22% (At Risk)
Expected OA Rounds: 3-4
Time to Grant: 3y 0m
With Interview: 48%

Examiner Intelligence

Career Allow Rate: 22% (11 granted / 49 resolved; -39.6% vs TC avg)
Interview Lift: +25.6% among resolved cases with interview
Avg Prosecution: 3y 0m typical; 12 applications currently pending
Career History: 61 total applications across all art units

Statute-Specific Performance

§101: 16.4% (-23.6% vs TC avg)
§103: 46.3% (+6.3% vs TC avg)
§102: 14.4% (-25.6% vs TC avg)
§112: 17.0% (-23.0% vs TC avg)
Tech Center averages are estimates • Based on career data from 49 resolved cases

Office Action

§101
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant’s arguments with respect to claims 1-3 and 6-8 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-3 and 6-8 are rejected under 35 U.S.C. 101 because the claimed subject matter is directed to a judicial exception (an abstract idea) and does not recite additional elements that amount to significantly more than the exception. The claims recite mathematical concepts and mental processes, including calculations, correlations, comparisons, selections, predictions, and optimization. Step 2A, prong 1.
For example, claim 1 recites: “generating a speech learning model for a plurality of users based on speech data of the plurality of users;” “generating a first speaker vector for speech data of a new speaker and a plurality of second speaker vectors for speech data of the plurality of users using a speaker recognition model;” “determining a third speaker vector having a highest correlation with the first speaker vector among the plurality of second speaker vectors based on a preset criterion;” “predicting a new speaker vector of the new user based on the third speaker vector and the first speaker vector using an adversarial training method;” “performing predicting based on a pronunciation duration time extracted from each of the speech data of the new speaker and speech data of a third speaker who is a speaker of the third speaker vector;” “jointly optimizing a pronunciation duration loss and a prosody feature loss including pitch, stress, and intonation;” “calculating a duration-weighted cosine similarity between the first speaker vector and each of the plurality of second speaker vectors.” Claim 2 further recites “uses a feature vector extracted from the speech data of the new speaker.” Claim 3 further recites “calculating a cosine similarity value based on calculated inner product values and determining a speaker vector of a user, which has a greatest cosine similarity value among the plurality of users, to be the third speaker vector.” The device claims similarly recite abstract calculations and functional results via generic components.
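For orientation, the selection step the examiner characterizes as a mathematical concept (cosine similarity computed from inner products, greatest value wins) can be sketched in a few lines. This is a hypothetical NumPy illustration, not the applicant's implementation; the function name, the optional duration weighting, and all variable names are assumptions:

```python
import numpy as np

def select_third_speaker(first_vec, second_vecs, durations=None):
    """Pick the enrolled speaker vector most similar to the new
    speaker's vector, per the claimed cosine-similarity criterion."""
    first_vec = np.asarray(first_vec, dtype=float)
    sims = []
    for i, v in enumerate(second_vecs):
        v = np.asarray(v, dtype=float)
        # Cosine similarity from inner products, as claim 3 recites.
        sim = first_vec @ v / (np.linalg.norm(first_vec) * np.linalg.norm(v))
        if durations is not None:
            # Hypothetical "duration-weighted" variant from claim 1.
            sim *= durations[i]
        sims.append(sim)
    best = int(np.argmax(sims))
    return best, second_vecs[best]

idx, vec = select_third_speaker([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
# idx == 0: the first enrolled vector has the greatest cosine similarity.
```

The point of the sketch is only to make concrete why the examiner treats this limitation as calculation and selection that could, in principle, be performed mentally or on paper.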
Claim 6 recites: “a speech synthesizer which generates a speech learning model for a plurality of users based on speech data of the plurality of users;” “a speech vector generator which generates a first speaker vector for speech data of a new speaker and a plurality of second speaker vectors for speech data of the plurality of users using a speaker recognition model;” “a similar vector determiner which predicts a third speaker vector having a highest correlation with the first speaker vector among the plurality of second speaker vectors based on a preset criterion,” and further, “predicts a new speaker vector of the new user based on the third speaker vector and the first speaker vector using an adversarial training method,” “performing predicting based on a pronunciation duration time,” “jointly optimizing a pronunciation duration loss and a prosody feature loss including pitch, stress, and intonation,” and “calculating a duration-weighted cosine similarity between the first speaker vector and each of the plurality of second speaker vectors.” Claim 7 recites “uses a feature vector extracted from the speech data of the new speaker.” Claim 8 recites “calculates a cosine similarity value based on calculated inner product values and determines a speaker vector of a user, which has a greatest cosine similarity value among the plurality of users, to be the third speaker vector.” These limitations are directed to mathematical concepts (e.g., “cosine similarity,” “inner product values,” “highest correlation,” “jointly optimizing … loss”) and mental processes (e.g., selecting “a speaker vector … which has a greatest cosine similarity value”) implemented on a computer. See Alice Corp. v. CLS Bank Int’l, 573 U.S. 208 (2014); Parker v. Flook, 437 U.S. 584 (1978); Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350 (Fed. Cir. 2016). The claims do not integrate the judicial exception into a practical application. Step 2A, prong 2. 
The additional elements are recitations of generic computing and machine learning functions and components, such as “using a speaker recognition model,” “using an adversarial training method,” “a speech synthesizer,” “a speech vector generator,” and “a similar vector determiner.” The claims do not recite a particular machine that is integral to the claim beyond a generic implementation, do not effect a transformation of an article, and do not improve the functioning of a computer or specific network architecture. The asserted improvements amount to the application of mathematical relationships and optimization to “speaker vectors” and “loss” functions, which is not a practical application under the Guidance. See Alice, 573 U.S. at 223–24; Mayo Collaborative Servs. v. Prometheus Labs., Inc., 566 U.S. 66 (2012); 2019 Revised Patent Subject Matter Eligibility Guidance.

The claims, considered individually and in combination, do not recite an inventive concept sufficient to transform the judicial exception into patent-eligible subject matter. Step 2B. The recited operations—“generating a speech learning model,” “generating a first speaker vector,” “generating a plurality of second speaker vectors,” “determining a third speaker vector having a highest correlation,” “predicting a new speaker vector … using an adversarial training method,” “jointly optimizing a pronunciation duration loss and a prosody feature loss,” “calculating a duration-weighted cosine similarity,” and “calculating a cosine similarity value based on calculated inner product values”—constitute well-understood, routine, and conventional activities of data analysis and machine learning implemented on generic computing components. Merely applying mathematical calculations and optimization to “speech data,” “speaker vectors,” and “loss” terms does not provide an inventive concept. See Alice, 573 U.S. at 225–26; Mayo, 566 U.S. at 79–80; Flook, 437 U.S. at 594–95; Electric Power Group, 830 F.3d at 1354–56.
For the foregoing reasons, claims 1-3 and 6-8 are directed to an abstract idea and fail to recite additional elements that amount to significantly more than the abstract idea. Accordingly, claims 1-3 and 6-8 are rejected under 35 U.S.C. § 101.

To overcome the § 101 rejection, the following amendments to the claims are suggested:

Claim 1 (to show transformation from one thing to another): A computer-implemented method of synthesizing a multi-speaker speech using an artificial neural network, the method executed by a processor and comprising: generating a speech learning model for a plurality of users based on speech data of the plurality of users using a multi-speaker speech synthesis model comprising an encoder, an attention module, a decoder, and a vocoder [0074]-[0077], [0079]; generating a first speaker vector for the new speaker by encoding the extracted d-vector and retrieving a plurality of second speaker vectors for the plurality of users from a trained speaker vector table [0085]-[0086], [0102]-[0103]; …

And/or (machine state change and training): training the initial embedding predictor and an adversarial discriminator in an adversarial training arrangement by updating network weights based on a reconstruction loss between an actual speaker vector and a predicted speaker vector and a discriminator loss, and storing the updated trained network parameters in non-transitory memory [0115]-[0124], [0125]; …

Similarly, Claim 6: A device for synthesizing a multi-speaker speech using an artificial neural network, the device comprising: one or more processors and one or more tangible, non-transitory memory devices storing instructions that, when executed by the one or more processors, cause the device to: train a multi-speaker speech synthesis model that comprises an encoder, an attention module, a decoder, and a vocoder to generate and store trained speaker embedding vectors in a speaker vector table for a plurality of users [0074]-[0077], [0085]-[0086]; …

This sample language is merely suggested to the applicant to overcome the § 101 (abstract idea) rejection; the applicant is responsible for checking the specification for new matter and the claims for lack of antecedent basis.

Allowable Subject Matter

Claims 1-3 and 6-8 are allowable over the prior art of record.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See the new attached PTO-892.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICHEMOND DORVIL, whose telephone number is (571) 272-7602. The examiner can normally be reached 8:30-5:30 M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658
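The adversarial arrangement in the suggested amendment (updating network weights from a reconstruction loss between actual and predicted speaker vectors plus a discriminator loss) can be sketched abstractly. The toy model below replaces the discriminator with a fixed penalty term, so it illustrates only the combined-loss weight update, not a full GAN; every name, dimension, and coefficient is a hypothetical stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: conditioning third-speaker vector and "actual speaker vector".
x = rng.normal(size=4)          # third speaker vector (input)
target = rng.normal(size=4)     # actual new-speaker vector (target)
W = np.zeros((4, 4))            # embedding-predictor weights

def combined_loss(W):
    pred = W @ x                               # predicted speaker vector
    recon = np.mean((pred - target) ** 2)      # reconstruction loss
    penalty = 0.01 * np.mean(pred ** 2)        # stand-in for a discriminator loss
    return recon + penalty

lr = 0.05
before = combined_loss(W)
for _ in range(200):
    pred = W @ x
    # Gradient of the combined loss w.r.t. W, hand-derived for this linear toy.
    dpred = 2 * (pred - target) / 4 + 0.02 * pred / 4
    W -= lr * dpred[:, None] * x[None, :]      # "updating network weights"
after = combined_loss(W)
# after < before: the weights were updated to reduce the combined loss,
# the kind of concrete machine-state change the suggested language recites.
```

Tying the claim to stored, updated network parameters (rather than to the loss mathematics alone) is the thrust of the examiner's suggestion; the sketch only shows what such an update step looks like mechanically.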

Prosecution Timeline

Aug 30, 2022: Application Filed
Nov 26, 2024: Non-Final Rejection — §101
Mar 04, 2025: Response Filed
Jun 04, 2025: Final Rejection — §101
Sep 11, 2025: Response after Non-Final Action
Oct 10, 2025: Request for Continued Examination
Oct 16, 2025: Response after Non-Final Action
Jan 17, 2026: Non-Final Rejection — §101 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591738: Autocorrect Candidate Selection (granted Mar 31, 2026; 2y 5m to grant)
Patent 12573397: ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF (granted Mar 10, 2026; 2y 5m to grant)
Patent 12567401: EVALUATING RELIABILITY OF AUDIO DATA FOR USE IN SPEECH PROCESSING (granted Mar 03, 2026; 2y 5m to grant)
Patent 12547849: ABSTRACTIVE SUMMARIZATION OF INFORMATION TECHNOLOGY ISSUES USING A METHOD OF GENERATING COMPARATIVES (granted Feb 10, 2026; 2y 5m to grant)
Patent 12505853: SIGNAL PROCESSING DEVICE AND METHOD (granted Dec 23, 2025; 2y 5m to grant)
Based on this examiner's 5 most recent grants; study what changed in those cases to get past this examiner.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 22%
With Interview (+25.6%): 48%
Median Time to Grant: 3y 0m
PTA Risk: High
Based on 49 resolved cases by this examiner. Grant probability derived from career allow rate.
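The projection figures appear to tie together arithmetically: the 22% grant probability matches the examiner's career allow rate (11 of 49 resolved cases), and the 48% with-interview figure adds the +25.6 percentage-point interview lift. A quick check (assuming that is indeed how the dashboard derives them):

```python
granted, resolved = 11, 49
career_allow_rate = granted / resolved            # 11/49 ≈ 0.224, shown as 22%
interview_lift = 0.256                            # +25.6 percentage points
with_interview = career_allow_rate + interview_lift
print(round(career_allow_rate * 100), round(with_interview * 100))  # 22 48
```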
