Prosecution Insights
Last updated: April 19, 2026
Application No. 18/631,614

GENERATING SYNTHETIC VOICES FOR CONVERSATIONAL SYSTEMS AND APPLICATIONS

Non-Final OA §102 §112
Filed: Apr 10, 2024
Examiner: HOQUE, NAFIZ E
Art Unit: 2693
Tech Center: 2600 (Communications)
Assignee: Nvidia Corporation
OA Round: 1 (Non-Final)
Grant Probability: 75% (Favorable)
OA Rounds: 1-2
To Grant: 3y 1m
With Interview: 99%

Examiner Intelligence

Grants 75% — above average
Career Allow Rate: 75% (456 granted / 608 resolved; +13.0% vs TC avg)
Strong +24% interview lift
Interview Lift: +23.7% (resolved cases with interview)
Typical timeline
Avg Prosecution: 3y 1m (20 currently pending)
Career history
Total Applications: 628 (across all art units)

Statute-Specific Performance

§101: 11.5% (-28.5% vs TC avg)
§103: 42.7% (+2.7% vs TC avg)
§102: 23.6% (-16.4% vs TC avg)
§112: 11.3% (-28.7% vs TC avg)
Based on career data from 608 resolved cases.

Office Action

§102 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 7 and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the enablement requirement. The claims contain subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention.

Claim 1 recites “[a] system comprising: one or more processors to... generate... synthetic audio data representative of speech...” and claim 7 recites “[the] system of claim 1, wherein the system is comprised in at least one of:” and then proceeds to list 17 different systems. However, the specification does not describe how the speech system is used in or applied to each of the different contexts, for example digital twin operations, light transport simulation, and creation of 3D assets. It is not clear how a speech system would be used in such systems. Furthermore, the claim also lacks written description support for use in more than one system. Because claim 7 recites “at least one of,” the speech system could be used in an in-vehicle infotainment system, in digital twin operations, and on a robot simultaneously. The specification does not disclose how the speech system exists in multiple contexts simultaneously or how it works across multiple complex systems. It may be possible for some combinations, but it is not clear how it is possible for all combinations.

Claim 20 is rejected for similar reasons as claim 7. Appropriate correction is required.

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 7 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Similarly, as above, claim 7 recites “[the] system of claim 1, wherein the system is comprised in at least one of:” and then proceeds to list 17 different systems. It is not clear how the system would be simultaneously comprised in more than one, or all, of these seemingly incompatible environments. As an example, it is unclear how the speech system would be used in an in-vehicle infotainment system, in digital twin operations, and on a robot simultaneously. A person of ordinary skill in the art would struggle to determine the metes and bounds of the claim. Furthermore, some of the listed systems carry a further “at least one of” limitation, which adds complexity. For example, “at least one of virtual reality content, mixed reality content, or augmented reality content”. The claims are unclear as to how the speech system can be comprised in all three and, furthermore, in other systems such as one “performing light transport simulation”.

Claim 20 is rejected for similar reasons as claim 7. Appropriate correction is required.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-5, 7-13, and 15-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Arik et al. (US Pub 2019/0251952).

Regarding claim 1, Arik discloses a system comprising: one or more processors to: obtain one or more first speaker embeddings corresponding to one or more speaker voices (para 0032, 0066, 0072); determine, based at least on the one or more first speaker embeddings, one or more second speaker embeddings corresponding to one or more synthetic voices (para 0153-0159; also see para 0069-0080 – both teach operations on original embeddings to create a new embedding); and generate, using the one or more second speaker embeddings and based at least on input data representative of linguistic content, synthetic audio data representative of speech corresponding to the linguistic content (para 0076, 0082, 0101).

Regarding claim 2, Arik discloses wherein the one or more processors are further to: generate an embedding space based at least on the one or more first speaker embeddings (para 0066, para 0152-0153, see fig. 23), wherein the one or more processors are to determine the one or more second speaker embeddings by sampling the embedding space to identify the one or more second speaker embeddings (para 0152-0153, 0159; see fig. 23).

Regarding claim 3, Arik discloses wherein the one or more processors are further to: obtain one or more third speaker embeddings corresponding to one or more third voices (para 0155 – such as male or female speakers); wherein the one or more processors are to determine the one or more second speaker embeddings based at least on interpolating between the one or more first speaker embeddings and the one or more third speaker embeddings (para 0155-0157).

Regarding claim 4, Arik discloses wherein the one or more processors are further to: determine one or more first weights associated with the one or more first speaker embeddings and one or more second weights associated with the one or more second speaker embeddings (para 0155, 0106), wherein the one or more processors are further to determine the one or more second speaker embeddings based at least on the one or more first weights and the one or more second weights (para 0154-0157 – weights are 1 for speaker and +-1 for manipulation).

Regarding claim 5, Arik discloses wherein the one or more processors are further to: determine one or more frequency values corresponding to the one or more synthetic voices (para 0059 – pitch is part of frequency; 0107, 0215), wherein the one or more processors are further to determine the one or more second speaker embeddings based at least on the one or more frequency values (para 0159).

Regarding claim 7, Arik discloses wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing one or more simulation operations; a system for performing one or more digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing one or more deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for performing one or more generative AI operations (see abstract – generate synthetic speech; see claim 7 where it states “A generative text-to-speech system comprising”); a system for performing operations using a large language model; a system for performing one or more conversational AI operations; a system for generating synthetic data (see abstract – generate synthetic speech; see claim 7 where it states “A generative text-to-speech system comprising”; para 0154-0157); a system for presenting at least one of virtual reality content, augmented reality content, or mixed reality content; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

Regarding claim 8, see rejection of claim 1 (see audio features such as gender, pitch or accents (para 0059, 0125, 0153-0157)).

Regarding claim 9, Arik discloses wherein: the first data representative of the one or more first audio features comprises one or more first speaker embeddings corresponding to the one or more speaker voices (para 0032, 0066, 0072); and the second data representative of the one or more second audio features comprises one or more second speaker embeddings corresponding to the one or more synthetic voices, the one or more second speaker embeddings being different than the one or more first speaker embeddings (para 0153-0159; also see para 0069-0080 – both teach operations on original embeddings to create a new embedding).

Regarding claim 10, see rejection of claim 2.

Regarding claim 11, see rejection of claim 3.

Regarding claim 12, see rejection of claim 4.

Regarding claim 13, see rejection of claim 5.

Regarding claim 15, Arik discloses further comprising: determining at least one of a first value for a mean associated with a distribution corresponding to the one or more first audio features (para 0105, 0116) or a second value for a standard deviation associated with the distribution (para 0135 - mean absolute error), wherein the determining the second data is further based at least on the at least one of the first value or the second value (see para 0086, 0105; see fig. 12).

Regarding claim 16, Arik discloses further comprising: determining one or more speaker types associated with the one or more second audio features (para 0066 – such as gender; para 0153-0157, 0183), wherein the determining the second data is further based at least on the one or more speaker types (see figs. 23-24; para 0153-0157, 0183).

Regarding claim 17, Arik discloses wherein: the one or more first audio features comprise one or more of: one or more first speaker embeddings (para 0066); one or more first frequency values, one or more first intensity values; one or more first accents (para 0066); one or more first rates; or one or more first tones; and the one or more second audio features comprise one or more of: one or more second speaker embeddings (para 0066); one or more second frequency values, one or more second intensity values; one or more second accents (para 0066); one or more second rates; or one or more second tones.

Regarding claim 18, Arik discloses further comprising: generating, using one or more encoders and based at least on the audio data, one or more speaker embeddings (para 0088-0089); and storing, based at least on verifying the audio data using the one or more speaker embeddings, the audio data as part of a dataset for training one or more machine learning models (para 0093-0100).

Regarding claim 19, Arik discloses a processor comprising: one or more processing units to generate synthetic audio data using a first speaker embedding (para 0066, 0095; see figs. 8 and 9) and a frequency value associated with a synthetic voice (para 0104, 0107, 0135; also see para 0060), wherein the first speaker embedding is determined based at least on one or more second speaker embeddings (para 0153-0159; also see para 0069-0080 – both teach operations on original embeddings to create a new embedding) and the frequency value is determined based at least on a distribution of frequency values (para 0059 – pitch range, 0064, 0135 – model learns frequency/spectrogram features from training data distribution).

Regarding claim 20, see rejection of claim 7.

Allowable Subject Matter

Claims 6 and 14 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAFIZ E HOQUE whose telephone number is (571)270-1811. The examiner can normally be reached M-F 8-5.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar, can be reached at (571)272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NAFIZ E HOQUE/
Primary Examiner, Art Unit 2693

Prosecution Timeline

Apr 10, 2024
Application Filed
Jan 04, 2026
Non-Final Rejection — §102, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12581017
COMMUNICATION ROUTING FOR CONTACT CENTER
2y 5m to grant Granted Mar 17, 2026
Patent 12579363
Incentive Aware-Aggregation Of Generative Models
2y 5m to grant Granted Mar 17, 2026
Patent 12573372
TEXT-TO-SPEECH SYSTEM WITH VARIABLE FRAME RATE
2y 5m to grant Granted Mar 10, 2026
Patent 12574459
VOICE-SYNCHRONIZED VISUAL INTERFACE
2y 5m to grant Granted Mar 10, 2026
Patent 12547848
One-Shot Visual Language Reasoning Over Graphical Depictions of Data
2y 5m to grant Granted Feb 10, 2026
Based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 75%
With Interview: 99% (+23.7%)
Median Time to Grant: 3y 1m
PTA Risk: Low
Based on 608 resolved cases by this examiner. Grant probability derived from career allow rate.
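
The arithmetic behind these projections can be checked from the figures reported on this page. The sketch below assumes the grant probability is granted cases divided by resolved cases, and that the interview figure is the base rate plus the lift in percentage points, capped at 100. The page states only the first relationship; the additive lift model and the function names are assumptions made for illustration.

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Career allow rate as a percentage of resolved cases."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


def with_interview(base_pct: float, lift_points: float) -> float:
    """Assumed model: add the lift in percentage points, capped at 100."""
    return min(100.0, base_pct + lift_points)


base = allow_rate(456, 608)           # granted / resolved, as reported above
boosted = with_interview(base, 23.7)  # reported interview lift, in points
print(f"Grant probability: {base:.0f}%")
print(f"With interview: {boosted:.0f}%")
```

Under these assumptions the rounded results match the 75% and 99% figures shown, which suggests (but does not confirm) that the lift is applied additively.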
