Prosecution Insights
Last updated: April 19, 2026
Application No. 18/798,633

CONTEXTUAL SPEECH RECOGNITION OF VIRTUAL MEETINGS

Status: Non-Final OA (§102, §103)
Filed: Aug 08, 2024
Examiner: TRAN, QUOC DUC
Art Unit: 2691
Tech Center: 2600 — Communications
Assignee: Google LLC
OA Round: 1 (Non-Final)
Grant Probability: 86% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 7m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 86%, above average (720 granted / 841 resolved; +23.6% vs TC avg)
Interview Lift: +4.8%, a minimal lift (based on resolved cases with interview)
Avg Prosecution: 2y 7m typical timeline; 17 applications currently pending
Total Applications: 858 across all art units (career history)
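The headline figures above follow from simple arithmetic on the raw counts. A quick sketch, assuming the cards round to the nearest whole percent and the "+23.6%" delta is in percentage points:

```python
# Recompute the examiner-card metrics from the raw counts shown above.
granted, resolved = 720, 841

allow_rate = 100 * granted / resolved   # 85.61...%, displayed as 86%
tc_average = allow_rate - 23.6          # implied Tech Center average, ~62.0%

print(round(allow_rate))  # 86
print(round(tc_average))  # 62
```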

Statute-Specific Performance

§101: 5.0% (-35.0% vs TC avg)
§103: 43.3% (+3.3% vs TC avg)
§102: 30.5% (-9.5% vs TC avg)
§112: 5.3% (-34.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 841 resolved cases.

Office Action

Rejections under §102 and §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 4-11 and 13-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Martin et al (2025/0259627).

Consider claims 1, 10 and 19: Martin et al teach a method, system and non-transitory computer-readable storage medium comprising instructions that, when executed by a processing device, cause the processing device to perform operations comprising: receiving audio data of a virtual meeting (par. 0029; 0041; "the automatic speech recognition module 520 can send and/or receive audio 527 and/or transcription 525 from a meeting 526"); identifying, within a plurality of content items related to the virtual meeting, content not previously recognized by a speech recognition system designated to convert the audio data of the virtual meeting into text (par. 0038; "identifying, by the device, a subset of the unrecognized words for boosting and applying, by the device, a context-specific boosting to the subset of the unrecognized words within an automated speech recognition model when criteria, identified based on contextual data associated with the subset of the unrecognized words, are met within the audio from the communication session"; par. 0059; 0062; "a device detects unrecognized words within an automated transcript of audio from a communication session. In some implementations, detecting the unrecognized words may include identifying a portion of words in the automated transcript that do not exist within a precompiled large general lexicon as the unrecognized words"); causing the speech recognition system to be modified based on the previously unrecognized content (par. 0063-0064; "the device applies a context-specific boosting to the subset of the unrecognized words within an automated speech recognition model when criteria, identified based on contextual data associated with the subset of the unrecognized words, are met within the audio from the communication session"); and causing the audio data of the virtual meeting to be converted into the text using the modified speech recognition system, wherein the text comprises at least part of the previously unrecognized content (par. 0063-0064; "Applying the context-specific boosting to the subset of the unrecognized words can include causing the subset of the unrecognized words to appear more accurately in the automated transcript of the communication session based on communication session identification characteristics meeting the criteria").

Consider claims 2, 11 and 20: Martin et al teach wherein the plurality of content items comprises at least one of: names of participants of the virtual meeting; documents related to the virtual meeting; text shared between participants of the virtual meeting; or documents associated with an organization of a participant of the virtual meeting (par. 0003; 0035; 0037; 0070; "the systems of the present disclosure may be able to generate more accurate text from speech, particularly with respect to company names, company jargon, industry-specific terms, slang terms, and/or technical terms, as well as handle accents of various speakers with various dialects and/or enunciations").

Consider claims 4 and 13: Martin et al teach wherein causing the speech recognition system to be modified based on the previously unrecognized content comprises: generating one or more possible pronunciations for the previously unrecognized content; and adding the one or more possible pronunciations to a lexicon of the speech recognition system (par. 0003; 0035; "the recognition of personalized content, such as company names, company jargon, industry-specific terms, slang terms, etc., continues to be problematic for automatic speech recognition (ASR) systems, especially for end-to-end models. Incorporating contextual information into ASR systems may help to better recognize and transcribe spoken words, especially in cases where ambiguity or variability exist in the pronunciation of certain words or phrases. This can improve the overall accuracy and usability of speech recognition technology in various applications, such as virtual assistants, speech-to-text transcription, and automated call centers, among other applications").

Consider claims 5 and 14: Martin et al teach wherein the speech recognition system comprises one or more machine learning models trained to convert speech data to corresponding text data (par. 0055; "For training an End-To-End Speech recognition model, a technique called sub-word tokenization can be employed to tokenize any words. Sub-word tokenization can be used in natural language processing (NLP) to break down words into smaller units called sub-word tokens").

Consider claims 6 and 15: Martin et al teach wherein causing the speech recognition system to be modified based on the previously unrecognized content comprises: generating one or more possible pronunciations for the previously unrecognized content; generating training data comprising the one or more possible pronunciations as inputs and the previously unrecognized content as target output; and retraining a first machine learning model of the one or more machine learning models using the generated training data (par. 0055; "For training an End-To-End Speech recognition model"; par. 0037; "system that includes a lexicon-free ASR module, a general lexicon module, a proposer module and a context enhancer module that are capable of learning subset of words specific to each context of use to improve the overall accuracy and usability of speech recognition technology across various applications. As described in further detail, herein, such a system can be regularly updated according to changes in the linguistic patterns of the users. As a result, the systems of the present disclosure may be able to generate more accurate text from speech, particularly with respect to company names, company jargon, industry-specific terms, slang terms, and/or technical terms, as well as handle accents of various speakers with various dialects and/or enunciations").

Consider claims 7 and 16: Martin et al teach wherein causing the speech recognition system to be modified based on the previously unrecognized content comprises: generating one or more possible pronunciations for the previously unrecognized content; generating training data comprising the one or more possible pronunciations as inputs and the previously unrecognized content as target output; training a new machine learning model using the generated training data to recognize the previously unrecognized content; and adding the new machine learning model to the one or more machine learning models of the speech recognition system (par. 0037; "system that includes a lexicon-free ASR module, a general lexicon module, a proposer module and a context enhancer module that are capable of learning subset of words specific to each context of use to improve the overall accuracy and usability of speech recognition technology across various applications. As described in further detail, herein, such a system can be regularly updated according to changes in the linguistic patterns of the users. As a result, the systems of the present disclosure may be able to generate more accurate text from speech, particularly with respect to company names, company jargon, industry-specific terms, slang terms, and/or technical terms, as well as handle accents of various speakers with various dialects and/or enunciations"; par. 0046; "the production ASR module 524 can generate live transcripts, for example of a meeting or other voice activity. In addition, a secondary ASR module, e.g., the lexicon-free ASR 540 is also provided in the system 500"; par. 0055; "For training an End-To-End Speech recognition model").

Consider claims 8 and 17: Martin et al teach wherein causing the speech recognition system to be modified based on the previously unrecognized content comprises providing a representation of the previously unrecognized content to at least a first machine learning model of the one or more machine learning models (par. 0035-0038; "the recognition of personalized content, such as company names, company jargon, industry-specific terms, slang terms, etc., continues to be problematic for automatic speech recognition (ASR) systems, especially for end-to-end models"; "automatic personalization for speech recognition systems is provided by a method that includes detecting, by a device, unrecognized words within an automated transcript of audio from a communication session and associating, by the device, the unrecognized words with corresponding contextual data. The method further includes identifying, by the device, a subset of the unrecognized words for boosting and applying, by the device, a context-specific boosting to the subset of the unrecognized words within an automated speech recognition model when criteria, identified based on contextual data associated with the subset of the unrecognized words, are met within the audio from the communication session").

Consider claims 9 and 18: Martin et al teach wherein the text from causing the audio data of the virtual meeting to be converted using the modified speech recognition system is at least one of: live captions visible during the virtual meeting; a transcription of the virtual meeting; or a summary of the virtual meeting generated using one or more machine learning models (par. 0045; "The automatic speech recognition module 520 can be responsible for the automatic transcription of live meetings or other scenarios in which voice activity is being recorded or monitored").

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Martin et al (2025/0259627) in view of Liang et al (12,518,748).

Consider claims 3 and 12: Martin et al suggest sharing a presentation, but do not explicitly suggest wherein an image is shared during the virtual meeting, and wherein the plurality of content items comprises text derived from processing the image using optical character recognition. In the same field of endeavor, Liang et al suggest as much (col. 4 lines 54-67; "system performs content extraction (e.g., optical character recognition (OCR)) to the captured screens (e.g., meeting content) for ASR (automated speech recognition) improvement. For example, the media service system uses OCR to identify one or more terms and uses the one or more identified terms to improve speech recognition and meeting transcriptions"). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the teaching of Liang et al into Martin et al to improve speech recognition and meeting transcriptions.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Any response to this action should be mailed to:
Mail Stop ____ (explanation, e.g., Amendment or After-final, etc.)
Commissioner for Patents
P.O. Box 1450
Alexandria, VA 22313-1450

Facsimile responses should be faxed to: (571) 273-8300

Hand-delivered responses should be brought to:
Customer Service Window
Randolph Building
401 Dulany Street
Alexandria, VA 22314

Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUOC DUC TRAN, whose telephone number is (571) 272-7511. The examiner can normally be reached Monday-Friday, 8:30am - 5pm. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Duc Nguyen, can be reached on (571) 272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Quoc D Tran/
Primary Examiner, Art Unit 2691
March 18, 2026
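The "context-specific boosting" the anticipation rejection leans on (Martin, par. 0063-0064) amounts to biasing an ASR system toward terms found in meeting context (participant names, shared documents). A toy sketch of that general idea follows; every name, phrase, and score below is hypothetical, and this is not the reference's actual implementation:

```python
# Toy contextual biasing: re-rank n-best ASR hypotheses by boosting
# those containing terms drawn from meeting-related content items.

def build_context_lexicon(content_items):
    """Collect lowercase terms from meeting-related content items."""
    terms = set()
    for item in content_items:
        for word in item.split():
            terms.add(word.strip(".,:").lower())
    return terms

def rescore(nbest, context_terms, boost=0.5):
    """Add a fixed bonus per in-context word, then re-rank by score."""
    rescored = []
    for text, score in nbest:
        bonus = sum(boost for w in text.lower().split() if w in context_terms)
        rescored.append((text, score + bonus))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Hypothetical meeting context and n-best list (scores are log-likelihoods).
context = build_context_lexicon(["Acme Q3 roadmap", "presenter: Priya Kulkarni"])
nbest = [("the acne roadmap", -1.2), ("the acme roadmap", -1.4)]

best_text, _ = rescore(nbest, context)[0]
print(best_text)  # the acme roadmap
```

After boosting, the in-context spelling outranks the acoustically favored one, which is the effect the quoted passages describe at a high level.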

Prosecution Timeline

Aug 08, 2024
Application Filed
Mar 18, 2026
Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

- Patent 12598268: STAGE USER REPLACEMENT TECHNIQUES FOR ONLINE VIDEO CONFERENCES (granted Apr 07, 2026; 2y 5m to grant)
- Patent 12598251: PREVENTING DEEP FAKE VOICEMAIL SCAMS (granted Apr 07, 2026; 2y 5m to grant)
- Patent 12592989: DETECTING A SPOOFED CALL (granted Mar 31, 2026; 2y 5m to grant)
- Patent 12593011: APPARATUS AND METHODS FOR VISUAL SUMMARIZATION OF VIDEOS (granted Mar 31, 2026; 2y 5m to grant)
- Patent 12581033: ENFORCING A LIVENESS REQUIREMENT ON AN ENCRYPTED VIDEOCONFERENCE (granted Mar 17, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 86% (90% with interview, +4.8%)
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 841 resolved cases by this examiner. Grant probability derived from career allow rate.
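The 90% with-interview figure reconciles with the 86% headline only if the +4.8% lift is applied to the unrounded career allow rate. A quick check, assuming the lift is additive in percentage points:

```python
# Reconcile the projection cards: 86% and 90% both fall out of the
# unrounded career allow rate (720 granted of 841 resolved).
granted, resolved = 720, 841

base = 100 * granted / resolved  # 85.61...% unrounded career allow rate
with_interview = base + 4.8      # additive-lift assumption: 90.41...%

print(round(base), round(with_interview))  # 86 90
```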

Free tier: 3 strategy analyses per month