DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1–3 and 6–8 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1–3 and 6–8 are rejected under 35 U.S.C. 101 because the claimed subject matter is directed to a judicial exception (an abstract idea) and does not recite additional elements that amount to significantly more than the exception.
The claims recite mathematical concepts and mental processes, including calculations, correlations, comparisons, selections, predictions, and optimization (Step 2A, Prong One).
For example, claim 1 recites: “generating a speech learning model for a plurality of users based on speech data of the plurality of users;” “generating a first speaker vector for speech data of a new speaker and a plurality of second speaker vectors for speech data of the plurality of users using a speaker recognition model;” “determining a third speaker vector having a highest correlation with the first speaker vector among the plurality of second speaker vectors based on a preset criterion;” “predicting a new speaker vector of the new user based on the third speaker vector and the first speaker vector using an adversarial training method;” “performing predicting based on a pronunciation duration time extracted from each of the speech data of the new speaker and speech data of a third speaker who is a speaker of the third speaker vector;” “jointly optimizing a pronunciation duration loss and a prosody feature loss including pitch, stress, and intonation;” “calculating a duration-weighted cosine similarity between the first speaker vector and each of the plurality of second speaker vectors.”
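For context, one plausible reading of the recited “duration-weighted cosine similarity” can be sketched as follows. The per-component duration weighting shown here is an assumption made solely for illustration; the claim language does not define the weighting scheme, and this sketch is not drawn from the applicant’s specification.

```python
import math

def duration_weighted_cosine(u, v, durations):
    """Illustrative sketch of a 'duration-weighted cosine similarity':
    each vector component is scaled by the corresponding pronunciation
    duration weight before the ordinary cosine computation. The
    weighting scheme is a hypothetical assumption; the claim does not
    define it."""
    uw = [a * math.sqrt(d) for a, d in zip(u, durations)]
    vw = [b * math.sqrt(d) for b, d in zip(v, durations)]
    dot = sum(a * b for a, b in zip(uw, vw))
    norm_u = math.sqrt(sum(a * a for a in uw))
    norm_v = math.sqrt(sum(b * b for b in vw))
    return dot / (norm_u * norm_v)
```

With uniform durations the function reduces to the ordinary cosine similarity, which is the mathematical relationship the rejection characterizes as abstract.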
Claim 2 further recites “uses a feature vector extracted from the speech data of the new speaker.” Claim 3 further recites “calculating a cosine similarity value based on calculated inner product values and determining a speaker vector of a user, which has a greatest cosine similarity value among the plurality of users, to be the third speaker vector.”
The device claims similarly recite abstract calculations and functional results via generic components. Claim 6 recites: “a speech synthesizer which generates a speech learning model for a plurality of users based on speech data of the plurality of users;” “a speech vector generator which generates a first speaker vector for speech data of a new speaker and a plurality of second speaker vectors for speech data of the plurality of users using a speaker recognition model;” “a similar vector determiner which predicts a third speaker vector having a highest correlation with the first speaker vector among the plurality of second speaker vectors based on a preset criterion,” and further, “predicts a new speaker vector of the new user based on the third speaker vector and the first speaker vector using an adversarial training method,” “performing predicting based on a pronunciation duration time,” “jointly optimizing a pronunciation duration loss and a prosody feature loss including pitch, stress, and intonation,” and “calculating a duration-weighted cosine similarity between the first speaker vector and each of the plurality of second speaker vectors.”
Claim 7 recites “uses a feature vector extracted from the speech data of the new speaker.” Claim 8 recites “calculates a cosine similarity value based on calculated inner product values and determines a speaker vector of a user, which has a greatest cosine similarity value among the plurality of users, to be the third speaker vector.”
These limitations are directed to mathematical concepts (e.g., “cosine similarity,” “inner product values,” “highest correlation,” “jointly optimizing … loss”) and mental processes (e.g., selecting “a speaker vector … which has a greatest cosine similarity value”) implemented on a computer. See Alice Corp. v. CLS Bank Int’l, 573 U.S. 208 (2014); Parker v. Flook, 437 U.S. 584 (1978); Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350 (Fed. Cir. 2016).
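For context, the computation recited in claims 3 and 8 (calculating cosine similarity values from inner products and selecting the greatest value) is a standard mathematical operation. A minimal illustrative sketch, not drawn from the applicant’s specification, follows; the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    # cosine similarity computed from inner product values, as recited
    # in claims 3 and 8: <u, v> / (||u|| * ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def third_speaker_index(first_vec, second_vecs):
    # select the speaker whose vector has the greatest cosine
    # similarity to the new speaker's vector
    sims = [cosine_similarity(first_vec, v) for v in second_vecs]
    return sims.index(max(sims))
```

The sketch shows that the limitation, taken on its own, is an ordinary mathematical calculation and comparison of the kind identified above as a mathematical concept.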
The claims do not integrate the judicial exception into a practical application (Step 2A, Prong Two). The additional elements are recitations of generic computing and machine-learning functions and components, such as “using a speaker recognition model,” “using an adversarial training method,” “a speech synthesizer,” “a speech vector generator,” and “a similar vector determiner.” The claims do not recite a particular machine that is integral to the claim beyond a generic implementation, do not effect a transformation of an article, and do not improve the functioning of a computer or a specific network architecture. The asserted improvements amount to the application of mathematical relationships and optimization to “speaker vectors” and “loss” functions, which is not a practical application under the Guidance. See Alice, 573 U.S. at 223–24; Mayo Collaborative Servs. v. Prometheus Labs., Inc., 566 U.S. 66 (2012); 2019 Revised Patent Subject Matter Eligibility Guidance.
The claims, considered individually and in combination, do not recite an inventive concept sufficient to transform the judicial exception into patent-eligible subject matter (Step 2B).
The recited operations—“generating a speech learning model,” “generating a first speaker vector,” “generating a plurality of second speaker vectors,” “determining a third speaker vector having a highest correlation,” “predicting a new speaker vector … using an adversarial training method,” “jointly optimizing a pronunciation duration loss and a prosody feature loss,” “calculating a duration-weighted cosine similarity,” and “calculating a cosine similarity value based on calculated inner product values”—constitute well-understood, routine, and conventional activities of data analysis and machine learning implemented on generic computing components. Merely applying mathematical calculations and optimization to “speech data,” “speaker vectors,” and “loss” terms does not provide an inventive concept. See Alice, 573 U.S. at 225–26; Mayo, 566 U.S. at 79–80; Flook, 437 U.S. at 594–95; Electric Power Group, 830 F.3d at 1354–56.
For the foregoing reasons, claims 1–3 and 6–8 are directed to an abstract idea and fail to recite additional elements that amount to significantly more than the abstract idea. Accordingly, claims 1–3 and 6–8 are rejected under 35 U.S.C. § 101.
To overcome the § 101 rejection, the following amendments to the claims are suggested:
Claim 1 (to show a transformation from one thing to another):
A computer-implemented method of synthesizing a multi-speaker speech using an artificial neural network, the method executed by a processor and comprising:
generating a speech learning model for a plurality of users based on speech data of the plurality of users using a multi-speaker speech synthesis model comprising an encoder, an attention module, a decoder, and a vocoder [0074]-[0077], [0079];
generating a first speaker vector for the new speaker by encoding the extracted d-vector and retrieving a plurality of second speaker vectors for the plurality of users from a trained speaker vector table [0085]-[0086], [0102]-[0103]; …
And/or (to show a machine state change and training):
training the initial embedding predictor and an adversarial discriminator in an adversarial training arrangement by updating network weights based on a reconstruction loss between an actual speaker vector and a predicted speaker vector and a discriminator loss, and storing the updated trained network parameters in non-transitory memory [0115]-[0124],[0125]; …
Similarly, Claim 6:
A device for synthesizing a multi-speaker speech using an artificial neural network, the device comprising:
one or more processors and one or more tangible, non-transitory memory devices storing instructions that, when executed by the one or more processors, cause the device to:
train a multi-speaker speech synthesis model that comprises an encoder, an attention module, a decoder, and a vocoder to generate and store trained speaker embedding vectors in a speaker vector table for a plurality of users [0074]-[0077], [0085]-[0086]; …
This sample language is merely suggested to the applicant as one way to overcome the § 101 (abstract idea) rejection. The applicant remains responsible for checking the specification for new matter and the claims for lack of antecedent basis.
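For context, the jointly optimized training objective in the suggested language above (a reconstruction loss between an actual and a predicted speaker vector, combined with a discriminator loss from adversarial training) is a conventional weighted-sum formulation. The sketch below is illustrative only; the weighting hyperparameter and function names are hypothetical and not taken from the specification.

```python
import math

def reconstruction_loss(actual_vec, predicted_vec):
    # mean squared error between the actual and predicted speaker vectors
    n = len(actual_vec)
    return sum((a - p) ** 2 for a, p in zip(actual_vec, predicted_vec)) / n

def discriminator_loss(d_real, d_fake):
    # standard binary cross-entropy discriminator loss used in
    # adversarial training; d_real and d_fake are scores in (0, 1)
    eps = 1e-12
    return -(math.log(d_real + eps) + math.log(1.0 - d_fake + eps))

def joint_training_loss(actual_vec, predicted_vec, d_real, d_fake,
                        adv_weight=0.1):
    # weighted sum of the two objectives; adv_weight is a hypothetical
    # hyperparameter chosen for illustration only
    return (reconstruction_loss(actual_vec, predicted_vec)
            + adv_weight * discriminator_loss(d_real, d_fake))
```

Reciting the concrete weight-update and parameter-storage steps, rather than the bare optimization objective, is what ties the suggested language to a machine state change.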
Allowable Subject Matter
Claims 1–3 and 6–8 are allowable over the prior art of record.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See the new attached PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RICHEMOND DORVIL whose telephone number is (571)272-7602. The examiner can normally be reached 8:30 - 5:30 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RICHEMOND DORVIL/ Supervisory Patent Examiner, Art Unit 2658