Prosecution Insights
Last updated: April 19, 2026
Application No. 18/228,349

AUTOMATIC SPEECH RECOGNITION FOR INTERACTIVE VOICE RESPONSE SYSTEMS

Status: Non-Final OA (§103)
Filed: Jul 31, 2023
Examiner: ROBERTS, SHAUN A
Art Unit: 2655
Tech Center: 2600 — Communications
Assignee: Zoom Video Communications, Inc.
OA Round: 3 (Non-Final)
Grant Probability: 76% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 10m
With Interview: 86%

Examiner Intelligence

Career Allow Rate: 76% (491 granted / 647 resolved; +13.9% vs TC avg, above average)
Interview Lift: +10.3% (moderate; among resolved cases with an interview)
Avg Prosecution: 2y 10m (31 applications currently pending)
Total Applications: 678 (across all art units)
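The headline figure appears to be the plain granted-to-resolved ratio from the counts above; a minimal sketch of that arithmetic (an assumption about how the tool computes it, not a documented formula):

```python
# Sketch: career allow rate as granted / resolved, using the counts shown
# above. This is an assumption about how the 76% headline is computed.
granted = 491
resolved = 647

allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")  # 75.9%, displayed as 76%
```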

Statute-Specific Performance

Statute   Rate    vs TC Avg
§101       7.6%    -32.4%
§103      49.2%     +9.2%
§102      29.5%    -10.5%
§112       3.5%    -36.5%

Tech Center average is an estimate; based on career data from 647 resolved cases.
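The "vs TC avg" deltas read as percentage-point differences against the Tech Center baseline; a small sketch that back-computes the implied baseline under that assumption:

```python
# Sketch: recover the implied Tech Center baseline from each displayed
# (examiner rate, delta vs TC avg) pair, assuming the delta is a simple
# percentage-point difference. Figures are taken from the table above.
stats = {
    "§101": (7.6, -32.4),
    "§103": (49.2, +9.2),
    "§102": (29.5, -10.5),
    "§112": (3.5, -36.5),
}
for statute, (rate, delta) in stats.items():
    tc_avg = rate - delta
    print(f"{statute}: examiner {rate}% vs implied TC avg {tc_avg:.1f}%")
```

Under this assumption every pair implies the same 40.0% baseline, consistent with a single Tech Center average being drawn across all four statutes.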

Office Action

§103
DETAILED ACTION

Continued Examination Under 37 CFR 1.114

1. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 1/8/2026 has been entered.

Notice of Pre-AIA or AIA Status

2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

3. Claims 4, 11, and 18 have been cancelled.

Response to Arguments

4. Applicant's arguments filed have been fully considered but are moot based on the new grounds of rejection.

Claim Rejections - 35 USC § 103

5. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

6. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

7.
Claims 1, 5, 7-8, 12, 14-15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Sun et al. ("FawAI ASR System for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge") in view of Hofer et al. (US 2016/0093292), in further view of Nagao (US 2015/0179177).

Regarding claim 1, Sun teaches:

A method comprising: receiving an audio input from a user (abstract; 2.1: contents of the speech…user's command; the commands involve navigation to a point of interest, making a phone call, controlling the air conditioner, and playing music; contact names);

determining, using a first trained model, a plurality of candidate commands (3.4: during decoding, n-best candidates are achieved from the CTC WFST beam search decoder), comprising; wherein the first trained model comprises a weighted finite state transducer ("WFST") (abstract; 3.4: WFST);

determining, using a second trained model, a recognized command from the plurality of candidate commands (3.4: the n-best candidates are rescored by the Attention Rescoring module to find the best candidate); and

identifying a corresponding valid command in a set of valid commands based on the recognized command (2.1: contents of the speech…user's command; contact names; abstract: automatic speech recognition systems; connectionist temporal classification/attention-based encoder-decoder architecture…based on a weighted finite state transducer; 4 Experiments: performance).
Sun does not specifically teach, but Hofer (US 2016/0093292) teaches, executing each available path within the first trained model [substantially in parallel] (abstract: method in a computing device for decoding a weighted finite state transducer for automatic speech recognition; 0001-0002: weighted score assigned to each transition/arc; fig. 4: state transitions; fig. 6; 0029; 0035). Hofer further explains at 0004:

"An HMM is a FSM with state transition probabilities and emissive (or observation) probabilities. A state transition probability of one state to another state represents the probability of transition from the one state to the other. An emissive probability for an observation is the probability that a state will 'emit,' or generate, a particular observation. These probability values may be discovered for a particular system by a training process that uses training data. This training data includes observations along with the known states that generated these observations. After training, a decoding process, using a set of new observations, may traverse through the HMM to discover the most likely set of states that generated these observations. For example, after an HMM modeling the acoustic features to phones of a language has been trained, a decoding process may be used on a new set of audio (i.e., spoken words/sounds) to discover the most likely states and transitions that generated these observations. These states and transitions are associated with various phones in the language. If using a WFST, the input labels of this HMM WFST would be the acoustic features (the observations), the output labels would be the phones, and the weights of each transition would be the state transition probabilities."
Hofer also teaches obtaining scores from the available paths (0001-0002: weighted score assigned to each transition/arc; fig. 4: state transition probabilities; fig. 6); comparing the scores to a predetermined threshold (0010: threshold; 0029; 39); and determining the plurality of candidate commands based on the scores satisfying the predetermined threshold (abstract: exceeds a score threshold; 0035: "After processing an utterance or section of speech, the token that remains and has the best score is the token that represents the path through the WFST with the most likely hypothesis of what was spoken.").

It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Hofer to enable the performance of the WFST (of Sun) for speech recognition. Sun already teaches the use of the WFST, and one would look to Hofer to implement it in this particular fashion, presenting a reasonable expectation of success of completing the recognition with the WFST, and for the benefit that computational complexity is further reduced, reducing decoding time and power consumption (Hofer 0030).

Hofer does not explicitly teach, but Nagao teaches, executing each available path within the first trained model substantially in parallel (0041: "In the process of searching a WFST, since a plurality of paths is searched in parallel, a plurality of tokens is managed at the same time. Moreover, a token holds the accumulation score of the path. Furthermore, a token holds a string of output symbols assigned in the paths which have been passed."). It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate Nagao for an improved system allowing for quicker response times of the WFST, and ultimately of the recognition overall.

Regarding claim 5, Sun teaches the method of claim 1, wherein the second trained model comprises an attention-based decoder (abstract: attention-based encoder-decoder architecture; 3.4: Decoding; Attention Rescoring module).
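The claimed flow the rejection assembles from Sun, Hofer, and Nagao (score candidate paths, prune against a predetermined threshold, rescore the survivors) can be sketched as a toy stand-in; the commands, scores, and trivial "models" below are invented for illustration and come from none of the cited references:

```python
# Toy stand-in for the claimed pipeline: a "first model" scores candidate
# commands (standing in for parallel WFST path scores), a predetermined
# threshold prunes them to an n-best list, and a "second model" rescores
# the survivors. All commands, scores, and models are invented.

def first_model(audio: str) -> dict[str, float]:
    # Stand-in for scoring every available path in parallel.
    return {"call mom": 0.91, "call tom": 0.74, "play music": 0.12}

def second_model(candidates: list[str]) -> str:
    # Stand-in for attention-based rescoring of the n-best list.
    rescore = {"call mom": 0.95, "call tom": 0.60, "play music": 0.30}
    return max(candidates, key=lambda c: rescore.get(c, 0.0))

THRESHOLD = 0.5
scores = first_model("<audio frames>")
candidates = [cmd for cmd, s in scores.items() if s >= THRESHOLD]
recognized = second_model(candidates)
print(candidates, "->", recognized)  # ['call mom', 'call tom'] -> call mom
```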
Regarding claim 7, Sun teaches the method of claim 1, further comprising executing the corresponding valid command (abstract; 2.1: contents of the speech…user's command; the commands involve navigation to a point of interest, making a phone call, controlling the air conditioner, and playing music; contact names).

Regarding claim 8, Sun, Hofer, and Nagao teach a system comprising: a non-transitory computer-readable medium; and one or more processors communicatively coupled to the non-transitory computer-readable medium, the one or more processors configured to execute instructions stored in the non-transitory computer-readable medium to: receive an audio input; execute, by a first trained model, each available path within the first trained model substantially in parallel, wherein the first trained model comprises a weighted finite state transducer ("WFST"); obtain scores from the available paths; compare the scores to a predetermined threshold; determine a plurality of candidate commands based on the scores satisfying the predetermined threshold; determine, using a second trained model, a recognized command from the plurality of candidate commands; and identify a corresponding valid command in a set of valid commands based on the recognized command. Claim 8 is rejected for similar rationale and reasoning as claim 1. While the teachings of Sun are inherently performed using computer-based processing (Sun 3.3: train the model on a GPU server), to advance prosecution, Hofer more explicitly teaches a system comprising a non-transitory computer-readable medium and one or more processors communicatively coupled to the non-transitory computer-readable medium, the one or more processors configured to execute instructions stored in the non-transitory computer-readable medium (figs. 5-6; 0075; 0082-0083).
It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the system, presenting a reasonable expectation of success of allowing the models of Sun to be utilized for the execution of the speech recognition.

Claim 12 recites limitations similar to claim 5 and is rejected for similar rationale and reasoning. Claim 14 recites limitations similar to claim 7 and is rejected for similar rationale and reasoning.

Regarding claim 15, Sun, Hofer, and Nagao teach a non-transitory computer-readable medium comprising processor-executable instructions configured to cause one or more processors to: receive an audio input; execute, by a first trained model, each available path within the first trained model substantially in parallel, wherein the first trained model comprises a weighted finite state transducer ("WFST"); obtain scores from the available paths; compare the scores to a predetermined threshold; determine a plurality of candidate commands based on the scores satisfying the predetermined threshold; determine, using a second trained model, a recognized command from the plurality of candidate commands; and identify a corresponding valid command in a set of valid commands based on the recognized command. Claim 15 is rejected for similar rationale and reasoning as claim 8. Claim 19 recites limitations similar to claim 5 and is rejected for similar rationale and reasoning.

8. Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Sun in view of Hofer et al. (US 2016/0093292), in further view of Nagao, in further view of Mukherjee et al. (US 2021/0241354).
Regarding claim 6, Sun does not specifically teach, but Mukherjee teaches, the method of claim 1, wherein identifying the corresponding valid command comprises performing fuzzy matching using the recognized command and the set of valid commands (0050: "the fuzzy matching algorithm can include matching the recipe descriptor (e.g., recipe name) from the voice command against recipe names (e.g., recipe titles) in a database."). It would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate fuzzy matching to obtain the most similar (valid) command for improved speech recognition and several technological improvements (Mukherjee 0097). Claim 13 recites limitations similar to claim 6 and is rejected for similar rationale and reasoning. Claim 20 recites limitations similar to claim 6 and is rejected for similar rationale and reasoning.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAUN A ROBERTS, whose telephone number is (571) 270-7541. The examiner can normally be reached Monday-Friday, 9-5 EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Andrew Flanders, can be reached at 571-272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/SHAUN ROBERTS/
Primary Examiner, Art Unit 2655

Prosecution Timeline

Jul 31, 2023: Application Filed
May 23, 2025: Non-Final Rejection (§103)
Aug 28, 2025: Response Filed
Sep 05, 2025: Final Rejection (§103)
Jan 08, 2026: Request for Continued Examination
Jan 23, 2026: Response after Non-Final Action
Feb 03, 2026: Non-Final Rejection (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586599 (granted Mar 24, 2026; 2y 5m to grant)
AUDIO SIGNAL PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM WITH MACHINE LEARNING AND FOR MICROPHONE MUTE STATE FEATURES IN A MULTI PERSON VOICE CALL

Patent 12586568 (granted Mar 24, 2026; 2y 5m to grant)
SYNTHETICALLY GENERATING INNER SPEECH TRAINING DATA

Patent 12573376 (granted Mar 10, 2026; 2y 5m to grant)
Dynamic Language and Command Recognition

Patent 12562157 (granted Feb 24, 2026; 2y 5m to grant)
GENERATING TOPIC-SPECIFIC LANGUAGE MODELS

Patent 12555562 (granted Feb 17, 2026; 2y 5m to grant)
VOICE SYNTHESIS FROM DIFFUSION GENERATED SPECTROGRAMS FOR ACCESSIBILITY
Based on this examiner's 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 76%
With Interview: 86% (+10.3%)
Median Time to Grant: 2y 10m
PTA Risk: High

Based on 647 resolved cases by this examiner. Grant probability derived from career allow rate.
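The "With Interview" projection looks like the base grant probability plus the interview lift in percentage points; a minimal sketch of that assumption (not a documented formula of this tool):

```python
# Sketch: "with interview" probability as base grant probability plus the
# interview lift in percentage points. This combination is assumed.
base_grant_prob = 76.0   # career allow rate, in %
interview_lift = 10.3    # percentage points, among interviewed cases

with_interview = base_grant_prob + interview_lift
print(f"With interview: {with_interview:.0f}%")  # With interview: 86%
```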
