Prosecution Insights
Last updated: April 19, 2026
Application No. 18/798,633

CONTEXTUAL SPEECH RECOGNITION OF VIRTUAL MEETINGS

Status: Non-Final OA (§102, §103)
Filed: Aug 08, 2024
Examiner: TRAN, QUOC DUC
Art Unit: 2691
Tech Center: 2600 — Communications
Assignee: Google LLC
OA Round: 1 (Non-Final)
Grant Probability: 86% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 7m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 86%, above average (720 granted / 841 resolved; +23.6% vs TC avg)
Interview Lift: +4.8%, a minimal lift (based on resolved cases with interview)
Avg Prosecution: 2y 7m typical timeline; 17 applications currently pending
Total Applications: 858 across all art units (career history)
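The headline figures above follow from simple arithmetic on the raw counts. A quick sketch, assuming the cards round to the nearest whole percent and the "+23.6%" delta is in percentage points:

```python
# Recompute the examiner-card metrics from the raw counts shown above.
granted, resolved = 720, 841

allow_rate = 100 * granted / resolved   # 85.61...%, displayed as 86%
tc_average = allow_rate - 23.6          # implied Tech Center average, ~62.0%

print(round(allow_rate))  # 86
print(round(tc_average))  # 62
```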

Statute-Specific Performance

§101: 5.0% (-35.0% vs TC avg)
§103: 43.3% (+3.3% vs TC avg)
§102: 30.5% (-9.5% vs TC avg)
§112: 5.3% (-34.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 841 resolved cases.

Office Action

Rejections under §102 and §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 4-11 and 13-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Martin et al (2025/0259627).

Consider claims 1, 10 and 19: Martin et al teach a method, system and non-transitory computer-readable storage medium comprising instructions that, when executed by a processing device, cause the processing device to perform operations comprising: receiving audio data of a virtual meeting (par. 0029; 0041; "the automatic speech recognition module 520 can send and/or receive audio 527 and/or transcription 525 from a meeting 526"); identifying, within a plurality of content items related to the virtual meeting, content not previously recognized by a speech recognition system designated to convert the audio data of the virtual meeting into text (par. 0038; "identifying, by the device, a subset of the unrecognized words for boosting and applying, by the device, a context-specific boosting to the subset of the unrecognized words within an automated speech recognition model when criteria, identified based on contextual data associated with the subset of the unrecognized words, are met within the audio from the communication session"; par. 0059; 0062; "a device detects unrecognized words within an automated transcript of audio from a communication session. In some implementations, detecting the unrecognized words may include identifying a portion of words in the automated transcript that do not exist within a precompiled large general lexicon as the unrecognized words"); causing the speech recognition system to be modified based on the previously unrecognized content (par. 0063-0064; "the device applies a context-specific boosting to the subset of the unrecognized words within an automated speech recognition model when criteria, identified based on contextual data associated with the subset of the unrecognized words, are met within the audio from the communication session"); and causing the audio data of the virtual meeting to be converted into the text using the modified speech recognition system, wherein the text comprises at least part of the previously unrecognized content (par. 0063-0064; "Applying the context-specific boosting to the subset of the unrecognized words can include causing the subset of the unrecognized words to appear more accurately in the automated transcript of the communication session based on communication session identification characteristics meeting the criteria").

Consider claims 2, 11 and 20: Martin et al teach wherein the plurality of content items comprises at least one of: names of participants of the virtual meeting; documents related to the virtual meeting; text shared between participants of the virtual meeting; or documents associated with an organization of a participant of the virtual meeting (par. 0003; 0035; 0037; 0070; "the systems of the present disclosure may be able to generate more accurate text from speech, particularly with respect to company names, company jargon, industry-specific terms, slang terms, and/or technical terms, as well as handle accents of various speakers with various dialects and/or enunciations").

Consider claims 4 and 13: Martin et al teach wherein causing the speech recognition system to be modified based on the previously unrecognized content comprises: generating one or more possible pronunciations for the previously unrecognized content; and adding the one or more possible pronunciations to a lexicon of the speech recognition system (par. 0003; 0035; "the recognition of personalized content, such as company names, company jargon, industry-specific terms, slang terms, etc., continues to be problematic for automatic speech recognition (ASR) systems, especially for end-to-end models. Incorporating contextual information into ASR systems may help to better recognize and transcribe spoken words, especially in cases where ambiguity or variability exist in the pronunciation of certain words or phrases. This can improve the overall accuracy and usability of speech recognition technology in various applications, such as virtual assistants, speech-to-text transcription, and automated call centers, among other applications").

Consider claims 5 and 14: Martin et al teach wherein the speech recognition system comprises one or more machine learning models trained to convert speech data to corresponding text data (par. 0055; "For training an End-To-End Speech recognition model, a technique called sub-word tokenization can be employed to tokenize any words. Sub-word tokenization can be used in natural language processing (NLP) to break down words into smaller units called sub-word tokens").

Consider claims 6 and 15: Martin et al teach wherein causing the speech recognition system to be modified based on the previously unrecognized content comprises: generating one or more possible pronunciations for the previously unrecognized content; generating training data comprising the one or more possible pronunciations as inputs and the previously unrecognized content as target output; and retraining a first machine learning model of the one or more machine learning models using the generated training data (par. 0055; "For training an End-To-End Speech recognition model"; par. 0037; "system that includes a lexicon-free ASR module, a general lexicon module, a proposer module and a context enhancer module that are capable of learning subset of words specific to each context of use to improve the overall accuracy and usability of speech recognition technology across various applications. As described in further detail, herein, such a system can be regularly updated according to changes in the linguistic patterns of the users. As a result, the systems of the present disclosure may be able to generate more accurate text from speech, particularly with respect to company names, company jargon, industry-specific terms, slang terms, and/or technical terms, as well as handle accents of various speakers with various dialects and/or enunciations").

Consider claims 7 and 16: Martin et al teach wherein causing the speech recognition system to be modified based on the previously unrecognized content comprises: generating one or more possible pronunciations for the previously unrecognized content; generating training data comprising the one or more possible pronunciations as inputs and the previously unrecognized content as target output; training a new machine learning model using the generated training data to recognize the previously unrecognized content; and adding the new machine learning model to the one or more machine learning models of the speech recognition system (par. 0037; "system that includes a lexicon-free ASR module, a general lexicon module, a proposer module and a context enhancer module that are capable of learning subset of words specific to each context of use to improve the overall accuracy and usability of speech recognition technology across various applications. As described in further detail, herein, such a system can be regularly updated according to changes in the linguistic patterns of the users. As a result, the systems of the present disclosure may be able to generate more accurate text from speech, particularly with respect to company names, company jargon, industry-specific terms, slang terms, and/or technical terms, as well as handle accents of various speakers with various dialects and/or enunciations"; par. 0046; "the production ASR module 524 can generate live transcripts, for example of a meeting or other voice activity. In addition, a secondary ASR module, e.g., the lexicon-free ASR 540 is also provided in the system 500"; par. 0055; "For training an End-To-End Speech recognition model").

Consider claims 8 and 17: Martin et al teach wherein causing the speech recognition system to be modified based on the previously unrecognized content comprises providing a representation of the previously unrecognized content to at least a first machine learning model of the one or more machine learning models (par. 0035-0038; "the recognition of personalized content, such as company names, company jargon, industry-specific terms, slang terms, etc., continues to be problematic for automatic speech recognition (ASR) systems, especially for end-to-end models"; "automatic personalization for speech recognition systems is provided by a method that includes detecting, by a device, unrecognized words within an automated transcript of audio from a communication session and associating, by the device, the unrecognized words with corresponding contextual data. The method further includes identifying, by the device, a subset of the unrecognized words for boosting and applying, by the device, a context-specific boosting to the subset of the unrecognized words within an automated speech recognition model when criteria, identified based on contextual data associated with the subset of the unrecognized words, are met within the audio from the communication session").

Consider claims 9 and 18: Martin et al teach wherein the text from causing the audio data of the virtual meeting to be converted using the modified speech recognition system is at least one of: live captions visible during the virtual meeting; a transcription of the virtual meeting; or a summary of the virtual meeting generated using one or more machine learning models (par. 0045; "The automatic speech recognition module 520 can be responsible for the automatic transcription of live meetings or other scenarios in which voice activity is being recorded or monitored").

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Martin et al (2025/0259627) in view of Liang et al (12,518,748).

Consider claims 3 and 12: Martin et al suggest sharing a presentation, but do not explicitly suggest wherein an image is shared during the virtual meeting, and wherein the plurality of content items comprises text derived from processing the image using optical character recognition. In the same field of endeavor, Liang et al suggest as much (col. 4 lines 54-67; "system performs content extraction (e.g., optical character recognition (OCR)) to the captured screens (e.g., meeting content) for ASR (automated speech recognition) improvement. For example, the media service system uses OCR to identify one or more terms and uses the one or more identified terms to improve speech recognition and meeting transcriptions"). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the teaching of Liang et al into Martin et al to improve speech recognition and meeting transcriptions.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Any response to this action should be mailed to:
Mail Stop ____ (explanation, e.g., Amendment or After-final, etc.)
Commissioner for Patents
P.O. Box 1450
Alexandria, VA 22313-1450

Facsimile responses should be faxed to: (571) 273-8300

Hand-delivered responses should be brought to:
Customer Service Window
Randolph Building
401 Dulany Street
Alexandria, VA 22314

Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUOC DUC TRAN, whose telephone number is (571) 272-7511. The examiner can normally be reached Monday-Friday, 8:30am - 5pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Duc Nguyen, can be reached on (571) 272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Quoc D Tran/
Primary Examiner, Art Unit 2691
March 18, 2026
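The "context-specific boosting" the anticipation rejection leans on (Martin, par. 0063-0064) amounts to biasing an ASR system toward terms found in meeting context (participant names, shared documents). A toy sketch of that general idea follows; every name, phrase, and score below is hypothetical, and this is not the reference's actual implementation:

```python
# Toy contextual biasing: re-rank n-best ASR hypotheses by boosting
# those containing terms drawn from meeting-related content items.

def build_context_lexicon(content_items):
    """Collect lowercase terms from meeting-related content items."""
    terms = set()
    for item in content_items:
        for word in item.split():
            terms.add(word.strip(".,:").lower())
    return terms

def rescore(nbest, context_terms, boost=0.5):
    """Add a fixed bonus per in-context word, then re-rank by score."""
    rescored = []
    for text, score in nbest:
        bonus = sum(boost for w in text.lower().split() if w in context_terms)
        rescored.append((text, score + bonus))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Hypothetical meeting context and n-best list (scores are log-likelihoods).
context = build_context_lexicon(["Acme Q3 roadmap", "presenter: Priya Kulkarni"])
nbest = [("the acne roadmap", -1.2), ("the acme roadmap", -1.4)]

best_text, _ = rescore(nbest, context)[0]
print(best_text)  # the acme roadmap
```

After boosting, the in-context spelling outranks the acoustically favored one, which is the effect the quoted passages describe at a high level.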

Prosecution Timeline

Aug 08, 2024
Application Filed
Mar 18, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

- Patent 12598268: STAGE USER REPLACEMENT TECHNIQUES FOR ONLINE VIDEO CONFERENCES (granted Apr 07, 2026; 2y 5m to grant)
- Patent 12598251: PREVENTING DEEP FAKE VOICEMAIL SCAMS (granted Apr 07, 2026; 2y 5m to grant)
- Patent 12592989: DETECTING A SPOOFED CALL (granted Mar 31, 2026; 2y 5m to grant)
- Patent 12593011: APPARATUS AND METHODS FOR VISUAL SUMMARIZATION OF VIDEOS (granted Mar 31, 2026; 2y 5m to grant)
- Patent 12581033: ENFORCING A LIVENESS REQUIREMENT ON AN ENCRYPTED VIDEOCONFERENCE (granted Mar 17, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 86% (90% with interview, +4.8%)
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 841 resolved cases by this examiner. Grant probability derived from career allow rate.
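The 90% with-interview figure reconciles with the 86% headline only if the +4.8% lift is applied to the unrounded career allow rate. A quick check, assuming the lift is additive in percentage points:

```python
# Reconcile the projection cards: 86% and 90% both fall out of the
# unrounded career allow rate (720 granted of 841 resolved).
granted, resolved = 720, 841

base = 100 * granted / resolved  # 85.61...% unrounded career allow rate
with_interview = base + 4.8      # additive-lift assumption: 90.41...%

print(round(base), round(with_interview))  # 86 90
```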

Free tier: 3 strategy analyses per month