Last updated: May 29, 2026

Application No. 18/349,716

MULTI-LINGUAL AUTOMATIC SPEECH RECOGNITION FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS

Non-Final OA §101

Filed

Jul 10, 2023

Examiner

DESIR, PIERRE LOUIS

Art Unit

2659

Tech Center

2600 — Communications

Assignee

Nvidia Corporation

OA Round

2 (Non-Final)

This examiner grants 61% of cases after interview

— +32.1% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 288 resolved cases, 2023–2026

Examiner Intelligence

DESIR, PIERRE LOUIS

Grants 61% of resolved cases

Career Allowance Rate

176 granted / 288 resolved

-0.9% vs TC avg

Strong +32% interview lift

Interview Lift: +32.1% (without vs. with interview, among resolved cases with interview)

Typical timeline

3y 11m

Avg Prosecution

3 currently pending

Career history

296

Total Applications

across all art units

Statute-Specific Performance

§101: 4.5% (-35.5% vs TC avg)
§103: 75.0% (+35.0% vs TC avg)
§102: 11.5% (-28.5% vs TC avg)
§112: 4.3% (-35.7% vs TC avg)

TC avg = Tech Center average estimate • Based on career data from 288 resolved cases

Office Action

§101

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see pages 13-17 of Applicant Remarks, filed on 08/29/2025, with respect to the 102/103 rejections have been fully considered and are persuasive. The 102 and 103 rejections have been withdrawn.
With respect to the rejection under 35 U.S.C. § 101, the arguments have been fully considered but they are not persuasive.
Applicant argues (1) the claims do not recite a judicial exception, (2) even if they do, the claims integrate any judicial exception into a practical application (Step 2A Prong Two), and (3) in the alternative, the claims recite an inventive concept (Alice Step 2B) because they purportedly improve automatic speech recognition (ASR) by using a diffusion process that removes noise.
The Examiner respectfully disagrees with Applicant, and the rejection is maintained. The claims (i) recite judicial exceptions (mental processes and mathematical concepts), (ii) do not integrate those exceptions into a practical application, and (iii) do not recite an inventive concept that transforms the abstract idea into patent-eligible subject matter. The analysis follows the 2019 PEG (Revised Step 2A Prong One and Two) and Step 2B and is consistent with MPEP § 2106.
I. The Claims Recite Judicial Exceptions (2019 PEG — Step 2A, Prong One)
Applicant’s argument that the claims do not fall within the judicial exception categories is not persuasive.
The claims recite a sequence of algorithmic data-processing steps: generating vector representations of speech, producing text outputs, producing a confidence value (probability), conditionally invoking further processing when a threshold is met, adding noise to a vector/text representation, calculating a loss by comparison of textual outputs, and updating learnable parameters based on the loss. These steps are mathematical concepts (vector manipulation, probability, noise addition, loss calculation, optimization of parameters) and mental processes (evaluating confidence, comparing outputs, deciding whether to reprocess). 
The fact that the claimed steps are performed by “ASR models” or “processing units” does not alter their character as mathematical/algorithmic operations. Recitation of algorithmic steps implemented on generic computers still falls within the judicial exceptions when those steps are mathematical relationships or mental processes (see Alice; Federal Circuit cases interpreting Alice).
Therefore, the claims recite judicial exceptions (mathematical concepts and mental processes).
II. The Claims Do Not Integrate the Judicial Exceptions into a Practical Application.
Applicant asserts that the claimed “practical application” is generating a textual representation of speech data and that paragraphs [0028]-[0029] of the specification show tangible benefits (removing noise) and that the claims thus integrate the judicial exception into a practical application. This argument is not persuasive for the reasons below. 
	The 2019 PEG requires more than merely applying an abstract idea to produce a conventional result (e.g., producing text output). A claim integrates a judicial exception into a practical application when it imposes a meaningful limit on the exception, such that the claim is more than a drafting effort designed to monopolize the judicial exception. Merely reciting the application of an algorithm to input data to produce an output (here, text) is not sufficient.
	Claim language ties the abstract steps to generic models and generic computing steps (e.g., “first ASR model,” “second ASR model,” “one or more processing units,” “vector representations,” “adding noise,” “calculating loss,” “modifying parameters”). These are generic computer components and routine ML techniques. The claims do not recite a specific technical means, architecture, data structure, or unconventional integration that meaningfully limits the abstract idea or provides a technological improvement in the functioning of the computer itself. Applicant’s citation to specification paragraphs [0028]-[0029] is insufficient without claim recitation or evidence showing how the claimed steps produce a particular technological improvement in ASR beyond the abstract idea. General statements in the specification about benefits do not substitute for claim limitations that tie the improvement to specific, non-generic elements. The claims themselves must set forth the improvement or the Applicant must present evidence that the claimed combination produces a technical effect. 
	The claimed steps of adding noise and denoising via a diffusion/GAN architecture, calculating loss, and updating parameters are standard classes of ML operations. Absent claim limitations that describe a specific novel architecture, an unconventional training schedule, a particular data flow, specialized hardware, or demonstrable metrics showing a measurable improvement tied to claim features, the claims do not integrate the abstract idea into a practical application as required by the PEG.
Accordingly, Step 2A Prong Two is not satisfied: the claims, taken as a whole, do not integrate the recited judicial exceptions into a practical application.
III. Claims Lack an Inventive Concept
Applicant alternatively contends that the claims meet Step 2B because they improve ASR by learning to remove noise using a diffusion process “which was not present in the field previously.” This assertion is insufficient for at least the following reasons:
Applicant provides no claim limitations that define how the diffusion process is implemented in a non-conventional manner or that tie the diffusion-based technique to a specific machine implementation or a concrete technical improvement. Merely naming known algorithm classes (e.g., diffusion model, GAN) does not convert an abstract idea into a patent-eligible invention absent additional unconventional technical detail.
Applicant’s assertion that the diffusion process “was not present in the field previously” is a factual claim.
IV. Dependent Claims and System/Field-of-Use Limitations
Applicant cites additional limitations in dependent claims and system/field-of-use statements (e.g., claim 20’s list of systems). These do not confer patent eligibility when they merely recite environments in which the abstract idea may be used. Field-of-use or environment recitations that add only conventional computer or network components do not transform an abstract idea into patent-eligible subject matter.
For the reasons stated above, Applicant’s traversal is not persuasive. Claims 1–20 remain rejected under 35 U.S.C. § 101 as directed to judicial exceptions (mental processes and mathematical concepts) that are not integrated into a practical application and that do not recite an inventive concept sufficient to transform the abstract ideas into patent-eligible subject matter. 



Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. This judicial exception is not integrated into a practical application because the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Claims 1–20 are rejected under 35 U.S.C. § 101 because the claimed subject matter is directed to judicial exceptions (mental processes and mathematical relationships) without significantly more to integrate the exceptions into a patent-eligible practical application, and the claims lack an inventive concept.
Independent Claim 1:
“Generating, using a first automatic speech recognition (ASR) model comprising an unsupervised diffusion generative adversarial network model and based at least on a first vector representation of first speech data, a first text output and a confidence value indicating a probability associated with the first text output representing the first speech data;”
Abstract idea grouping: Mathematical relationships (vector representations, probability values) and mental processes (deriving text from speech, evaluating probability of correctness).
This step describes applying mathematical algorithms to input data (speech) to produce text and a confidence score — activities that can be performed mentally or with pen and paper (e.g., transcribing speech and estimating confidence), but here implemented on a generic computer.
“Based at least on the confidence value satisfying a threshold criterion, generating, using a second ASR model and based at least on a second vector representation of the first speech data and a third vector representation of the first text output, a second text output, wherein the second ASR model adds noise to the third vector representation of the first text output to obtain a noisy vector representation of the first text output;”
Abstract idea grouping: Mathematical relationships (vector manipulation, adding noise) and mental processes (deciding whether to process further based on confidence threshold).
Adding “noise” is a mathematical transformation of data. Conditional processing based on a threshold is a logical decision step — a mental process.
“Calculating a first loss of the second ASR model based at least on a comparison between the second text output and the first text output, the first loss indicating an ability of the second ASR model to remove noise from the noisy vector representation of the first text output;”
Abstract idea grouping: Mathematical relationships (loss calculation, comparison metrics) and mental processes (evaluating accuracy).
The “loss” is a mathematical formula comparing outputs; this is a mathematical concept.
“Modifying one or more learnable parameters of the second ASR model based at least on the first loss.”
Abstract idea grouping: Mathematical relationships (parameter updates via optimization algorithms).
Updating parameters is an abstract mathematical operation (e.g., gradient descent).
Step 2A, Prong One (Judicial Exception): All limitations recite operations that fall into the categories of mental processes (conceptual steps that can be performed in the human mind, such as evaluating confidence, deciding whether to proceed, comparing outputs) and mathematical relationships/algorithms (vector representations, adding noise, calculating loss, updating parameters). Therefore, claim 1 is directed to a judicial exception.
Step 2A, Prong Two (Integration into Practical Application): The claim does not integrate the abstract idea into a practical application. It uses generic ASR models, generic vector manipulation, and generic computing resources without reciting a specific improvement to computer functionality or a particular machine. The steps are performed in a conventional computing environment and do not tie the abstract idea to a meaningful application beyond the processing of data itself.
Step 2B (Inventive Concept): The claim elements, individually and in combination, are well-understood, routine, and conventional in the field of machine learning and speech recognition. The recitation of “unsupervised diffusion generative adversarial network” or “diffusion model” is a generic identification of known model types, not a specific, unconventional implementation. No inventive concept is present.
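For readers less familiar with the operations the Examiner groups as mathematical concepts, the four limitations of claim 1 can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the Applicant's implementation: the "second ASR model" is reduced to one hypothetical learnable `scale`, the loss is a mean squared error, the threshold criterion is assumed to be `confidence >= threshold`, and every name (`add_noise`, `denoise`, `train_step`) is invented for this sketch.

```python
import random


def add_noise(vec, sigma, rng):
    """Return a noisy copy of vec (Gaussian noise with standard deviation sigma)."""
    return [v + rng.gauss(0.0, sigma) for v in vec]


def denoise(noisy, scale):
    """Toy stand-in for the second model: one learnable scale applied elementwise."""
    return [scale * v for v in noisy]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def train_step(first_vec, confidence, scale,
               threshold=0.8, sigma=0.1, lr=0.05, rng=None):
    """One pass over the recited steps; returns (new_scale, loss or None)."""
    if rng is None:
        rng = random.Random(0)
    # Conditional processing: assumed criterion is confidence >= threshold.
    if confidence < threshold:
        return scale, None
    noisy = add_noise(first_vec, sigma, rng)   # add noise to the first output
    second = denoise(noisy, scale)             # second output from the noisy input
    loss = mse(second, first_vec)              # compare second output with first
    # Gradient of the MSE with respect to scale: mean(2 * (scale*n - t) * n).
    grad = sum(2 * (s - t) * n
               for s, t, n in zip(second, first_vec, noisy)) / len(noisy)
    return scale - lr * grad, loss             # gradient-descent parameter update


if __name__ == "__main__":
    rng = random.Random(0)
    vec, scale = [1.0, -2.0, 0.5], 0.5
    for step in range(5):
        scale, loss = train_step(vec, confidence=0.9, scale=scale, rng=rng)
        print(step, round(scale, 3), round(loss, 4))
```

The sketch shows only the generic shape of the steps (gate, perturb, compare, update). It does not model a diffusion GAN, speech features, or text tokens.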
Independent Claim 10:
“Obtaining a textual representation of first speech data based at least on applying a first deployed ASR model to the first speech data…”
Abstract idea grouping: Mental processes (transcribing speech) and mathematical relationships (mapping speech signal to text via algorithms).
“…wherein the first deployed ASR model is trained, at least in part, by: generating, using a first ASR model comprising an unsupervised diffusion generative adversarial network model and based at least on second speech data, a first text output;”
Same as Claim 1’s first step — mathematical and mental processes.
“Generating, using a second ASR model and based at least on the second speech data and the first text output, a second text output, wherein the second ASR model adds noise to the first text output to obtain a noisy representation of the first text output;”
Mathematical relationships (noise addition, vector manipulation).
“Calculating a first loss…based at least on a comparison between the second text output and the first text output…”
Mathematical relationships (loss computation).
“Modifying one or more learnable parameters…”
Mathematical relationships (optimization algorithms).
Conclusion: Claim 10 repeats the mathematical/mental process steps of Claim 1, framed as a training process for a deployed model. No integration into a practical application or inventive concept is present.
Independent Claim 19:
“A system comprising: one or more processing units to: apply a first deployed ASR model to first speech data to obtain a text representation…”
Mental process (transcription) and mathematical relationships (signal-to-text mapping).
“…wherein the first deployed ASR model is trained, at least in part, by: generating…using a first ASR model comprising an unsupervised diffusion generative adversarial network model…a first text output…”
Mathematical relationships (vector processing).
“Generating, using a second ASR model…a second text output…adding noise to the first text output to obtain a noisy representation…”
Mathematical relationships (vector manipulation).
“Calculating a first loss…comparing the second text output and the first text output…”
Mathematical relationships (loss computation).
“Modifying one or more learnable parameters…”
Mathematical relationships (optimization algorithms).
Conclusion: Claim 19 is the same abstract process as Claim 1, implemented on generic “processing units.” The hardware recited is generic and performs routine functions.
Dependent Claims:
Claim 2: Adds training with second speech data and target output label — still mathematical relationships (loss calculation, parameter updates).
Claim 3: Modifies first ASR model based on third text output — mathematical optimization.
Claim 4: Multilingual data — data content variation, no inventive concept.
Claim 5: Specifies model types — naming known architectures, no unconventional detail.
Claim 6: Concatenating vectors, adding/removing noise — mathematical operations.
Claim 7: Specifies token types (word/phoneme/IPA) — data format, conventional.
Claim 8: Loss based on variational lower bound — known mathematical metric.
Claim 9: Clustering algorithm based on speaker attributes — known ML clustering.
Claims 11–13: Same as 2–4 but for Claim 10 — no inventive concept.
Claim 14: Specifies model types — conventional.
Claims 15–16: Same as 6–7 — mathematical operations.
Claim 17: Variational lower bound — conventional metric.
Claim 18: Clustering algorithm — conventional ML technique.
Claim 20: Lists application environments — field of use limitations, no transformation into practical application.
Therefore, claims 1-20 are directed to abstract ideas under the mental process and mathematical relationships/algorithms groupings. No claim integrates the abstract idea into a practical application or recites an inventive concept. The recited steps are routine, conventional data processing and machine learning operations performed on generic computing hardware.

Allowable Subject Matter
Claims 1-20 would be in condition for allowance once the above rejection has been satisfactorily overcome.

Conclusion


The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 11908454 B2, Thomas et al., Integrating Text Inputs for Training and adapting Neural Network Transducer ASR Models.


THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
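The reply-period rule in the preceding paragraph can be worked through as a small date calculation. The sketch below is a simplified illustration, not docketing advice. The function names are hypothetical, month arithmetic clamps to the last day of a shorter month (an assumption of this sketch), and weekends, federal holidays, and extension fees are ignored.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Add calendar months to d, clamping to the last day of a shorter month."""
    idx = d.month - 1 + months
    year = d.year + idx // 12
    month = idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_deadline(mailed, first_reply=None, advisory_mailed=None):
    """Return (shortened statutory period end, six-month outer limit)."""
    ssp_end = add_months(mailed, 3)      # THREE MONTHS from the mailing date
    outer_limit = add_months(mailed, 6)  # never later than SIX MONTHS
    # First reply within TWO MONTHS and advisory action mailed after the
    # three-month date: the period expires when the advisory action is mailed.
    if (first_reply is not None and advisory_mailed is not None
            and first_reply <= add_months(mailed, 2)
            and advisory_mailed > ssp_end):
        ssp_end = advisory_mailed
    return min(ssp_end, outer_limit), outer_limit


if __name__ == "__main__":
    # Example input only: a mailing date of Dec 03, 2025.
    print(reply_deadline(date(2025, 12, 3)))
```

Passing a `first_reply` and an `advisory_mailed` date exercises the advisory-action branch. Omitting them returns the plain three-month and six-month dates.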

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PIERRE LOUIS DESIR whose telephone number is (571)272-7799. The examiner can normally be reached Monday-Friday 9AM-5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659


Prosecution Timeline

Jul 10, 2023

Application Filed

May 29, 2025

Non-Final Rejection mailed — §101

Aug 29, 2025

Response Filed

Dec 03, 2025

Final Rejection mailed — §101

Mar 03, 2026

Response after Non-Final Action

Apr 03, 2026

Request for Continued Examination

Apr 05, 2026

Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

18/215,972

Patent 12632788

PROMPT AUGMENTED GENERATIVE REPLAY VIA SUPERVISED CONTRASTIVE TRAINING FOR LIFELONG INTENT DETECTION

2y 10m to grant Granted May 19, 2026

18/446,635

Patent 12609124

VOICE AGENT SYSTEM

2y 8m to grant Granted Apr 21, 2026

18/298,060

Patent 12585679

EXECUTING UNSUPERVISED PRE-TRAINING TASKS WITH A MACHINE LEARNING MODEL TO PREDICT DOCUMENT GRAPH ATTRIBUTES

2y 11m to grant Granted Mar 24, 2026

18/184,630

Patent 12562154

Scalable Model Specialization Framework for Speech Model Personalization

2y 11m to grant Granted Feb 24, 2026

18/055,870

Patent 12555594

SYSTEM AND METHOD FOR TRACKING EMOTIONAL STATE OF A CALLER USING ARTIFICIAL INTELLIGENCE

3y 3m to grant Granted Feb 17, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

2-3

Expected OA Rounds

61%

Grant Probability

93%

With Interview (+32.1%)

3y 11m (~1y 0m remaining)

Median Time to Grant

Moderate

PTA Risk

Based on 288 resolved cases by this examiner. Grant probability derived from career allowance rate.
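The headline percentages can be reproduced from the counts shown on this page. The short sketch below does that arithmetic. The page does not state how it derives the "With Interview" figure, so treating it as the career allowance rate plus the 32.1-point interview lift is an assumption, and the function names are hypothetical.

```python
def allowance_rate(granted, resolved):
    """Career allowance rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved


def with_interview(rate_pct, lift_points):
    """Assumed derivation: add the interview lift (in percentage points)."""
    return rate_pct + lift_points


if __name__ == "__main__":
    rate = allowance_rate(176, 288)       # counts from the examiner profile
    boosted = with_interview(rate, 32.1)  # lift figure from the page
    print(f"{rate:.1f}% -> {boosted:.1f}%")
```

Rounded to whole percentages, these values match the 61% and 93% displayed above.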