DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/17/2025 has been entered.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This communication is responsive to the amendment dated 12/17/2025. The applicant amended claims 1, 7, and 15.
Response to Arguments
Applicant argues with respect to 35 U.S.C. 103 (Remarks, pg. 7, line 20 – pg. 8, line 30) that McQuiston in view of Ackerman does not disclose or render obvious "transmit, to a first client device, the segment of the transcript and only a snippet of a first audio stream corresponding to the segment, the first audio stream from the subset of the one or more audio streams". Applicant's arguments with respect to claims 1, 7, and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The respective dependent claims 2-4, 6, 8-14, and 16-20 also remain rejected.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4, 6-7, 15-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over McQuiston et al. (US 11514914 B2, hereinafter McQuiston) in view of Poddar (US 20220319505 A1, hereinafter Poddar).
Regarding claim 1, McQuiston discloses a system comprising: a non-transitory computer-readable medium; a communications interface; and a processor communicatively coupled to the non-transitory computer-readable medium and the communications interface, the processor configured to execute processor- executable instructions stored in the non-transitory computer-readable medium to ([Column 9, line 19-40] “the embodiments may illustratively be embodied in the form of a processing machine, including a computer or computer system, for example, that includes at least one memory. It is to be appreciated that the set of instructions, i.e., the software for example, that enables the computer operating system to perform the operations described…”):
establish a virtual meeting having a plurality of participants, each participant of the plurality of participants exchanging one or more audio streams via the virtual meeting ([Column 2, line 10-23] “the virtual assistant receiving at least an audio feed and a video feed of an electronic meeting in real-time”, examiner interprets the virtual assistant as the computer system that interfaces with the user’s devices during the meeting; [Column 3, line 17-18] FIG. 1 “system 100 may include one or more communication devices 110 associated with one or more users 105”);
generate a transcript of at least a subset of the one or more audio streams exchanged during the virtual meeting using a speech recognition system ([Column 2, line 10-23] “the virtual assistant transcribing the audio feed using a speech-recognition algorithm”);
determine a segment of the transcript for correction ([0057] “attendees receiving the live transcription may make corrections in the transcript. The virtual assistant may apply that correction in real-time. The virtual assistant may update the transcription algorithms based on the corrections.”);
receive, from the first client device, a correction to one or more transcribed words within the segment of the transcript ([Column 2, line 10-23] “the virtual assistant receiving an edited transcription”); and
update the speech recognition system based on the correction to the one or more transcribed words within the transcript ([Column 2, line 10-23] “the virtual assistant updating the speech recognition algorithm based on the edited transcription”).
McQuiston fails to teach transmit, to a first client device, the segment of the transcript and only a snippet of a first audio stream corresponding to the segment, the first audio stream from the subset of the one or more audio streams;
However, Poddar teaches transmit, to a first client device, the segment of the transcript and only a snippet of a first audio stream corresponding to the segment, the first audio stream from the subset of the one or more audio streams (FIG. 1A, Block 15-16, [0025] “In block 15, a feedback report with the categorized segment containing the corrected transcript snippet and call recording snippet is sent to a human teacher. In block 16, the corrected transcript snippet, incorrect transcript snippet, and the recording snippet are transmitted to a custom transcription model for retraining using the new data”);
McQuiston and Poddar are considered to be analogous art to the claimed invention because both are in the same field of conversation analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the techniques of McQuiston, in which a virtual assistant transcribes, records, and analyzes discussions during a meeting, with the technique of transmitting a portion of a transcript along with the corresponding audio portion taught by Poddar, in order to improve a virtual speech agent's natural language understanding (see Poddar [0002]).
Regarding Claim 4, McQuiston in view of Poddar teaches all of the limitations of claim 1, upon which claim 4 depends.
Additionally, McQuiston teaches wherein the processor is configured to execute further processor-executable instructions stored in the non-transitory computer-readable medium to ([Column 9, line 19-40] “the embodiments may illustratively be embodied in the form of a processing machine, including a computer or computer system, for example, that includes at least one memory. It is to be appreciated that the set of instructions, i.e., the software for example, that enables the computer operating system to perform the operations described…”):
determine the segment of the transcript containing transcribed words corresponding to an audio stream associated with the first client device ([Column 5, line 38-41] “In step 215, the virtual assistant may transcribe the audio of the meeting. The transcript may occur simultaneously with the meeting or may take place after the meeting is done, based on the audio recording”);
Additionally, Poddar teaches determine the snippet of the first audio stream corresponding to the segment of the transcript (FIG. 3, [0022] “We then generate a feedback report that contains just the segments (e.g., snippets of the recording file and transcript rather than full conversation) of the conversation where we noticed improvement opportunities”).
Regarding Claim 6, McQuiston in view of Poddar teaches all of the limitations of claim 1, upon which claim 6 depends.
Additionally, McQuiston teaches wherein the processor is configured to execute further processor-executable instructions stored in the non-transitory computer-readable medium to ([Column 9, line 19-40] “the embodiments may illustratively be embodied in the form of a processing machine, including a computer or computer system, for example, that includes at least one memory. It is to be appreciated that the set of instructions, i.e., the software for example, that enables the computer operating system to perform the operations described…”):
update the transcript of at least the subset of the one or more audio streams using the correction to the one or more transcribed words ([Column 6, line 4-5] “The virtual assistant may update the transcription algorithms based on the corrections”).
Regarding Claim 7, McQuiston discloses a method comprising: establishing, by a video conference provider, a virtual meeting having a plurality of participants, each participant of the plurality of participants exchanging one or more audio streams via the virtual meeting ([Column 2, line 10-23] “the virtual assistant receiving at least an audio feed and a video feed of an electronic meeting in real-time”; [Column 3, line 17-18] FIG. 1 “system 100 may include one or more communication devices 110 associated with one or more users 105”);
generating, by the video conference provider, a transcript of at least a subset of the one or more audio streams exchanged during the virtual meeting using a speech recognition system ([Column 2, line 10-23] "the virtual assistant transcribing the audio feed using a speech-recognition algorithm");
determining a segment of the transcript for correction ([0057] "attendees receiving the live transcription may make corrections in the transcript. The virtual assistant may apply that correction in real-time. The virtual assistant may update the transcription algorithms based on the corrections.");
receiving, from the first client device, a correction to one or more transcribed words within the segment of the transcript ([Column 2, line 10-23] “the virtual assistant receiving an edited transcription”);
and updating, by the video conference provider, the speech recognition system based on the correction to the one or more transcribed words within the transcript ([Column 2, line 10-23] “the virtual assistant updating the speech recognition algorithm based on the edited transcription”).
McQuiston fails to teach transmitting, to a first client device, the segment of the transcript and only a snippet of a first audio stream corresponding to the segment, the first audio stream from the subset of the one or more audio streams;
However, Poddar teaches transmitting, to a first client device, the segment of the transcript and only a snippet of a first audio stream corresponding to the segment, the first audio stream from the subset of the one or more audio streams (FIG. 1A, Block 15-16, [0025] “In block 15, a feedback report with the categorized segment containing the corrected transcript snippet and call recording snippet is sent to a human teacher. In block 16, the corrected transcript snippet, incorrect transcript snippet, and the recording snippet are transmitted to a custom transcription model for retraining using the new data”);
McQuiston and Poddar are considered to be analogous art to the claimed invention because both are in the same field of conversation analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the techniques of McQuiston, in which a virtual assistant transcribes, records, and analyzes discussions during a meeting, with the technique of transmitting a portion of a transcript along with the corresponding audio portion taught by Poddar, in order to improve a virtual speech agent's natural language understanding (see Poddar [0002]).
Regarding Claim 15, McQuiston discloses a non-transitory computer-readable medium comprising processor-executable instructions configured to cause one or more processors to ([Column 9, line 19-40] “the embodiments may illustratively be embodied in the form of a processing machine, including a computer or computer system, for example, that includes at least one memory. It is to be appreciated that the set of instructions, i.e., the software for example, that enables the computer operating system to perform the operations described…”):
establish a virtual meeting having a plurality of participants, each participant of the plurality of participants exchanging one or more audio streams via the virtual meeting ([Column 2, line 10-23] “the virtual assistant receiving at least an audio feed and a video feed of an electronic meeting in real-time”; [Column 3, line 17-18] FIG. 1 “system 100 may include one or more communication devices 110 associated with one or more users 105”);
generate a transcript of at least a subset of the one or more audio streams exchanged during the virtual meeting using a speech recognition system ([Column 2, line 10-23] “the virtual assistant transcribing the audio feed using a speech-recognition algorithm”);
determine a segment of the transcript for correction ([0057] “attendees receiving the live transcription may make corrections in the transcript. The virtual assistant may apply that correction in real-time. The virtual assistant may update the transcription algorithms based on the corrections.”);
receive, from the first client device, a correction to one or more transcribed words within the segment of the transcript ([Column 2, line 10-23] "the virtual assistant receiving an edited transcription");
and update the speech recognition system based on the correction to the one or more transcribed words within the transcript ([Column 2, line 10-23] “the virtual assistant updating the speech recognition algorithm based on the edited transcription”).
McQuiston fails to teach transmit, to a first client device, the segment of the transcript and only a snippet of a first audio stream corresponding to the segment, the first audio stream from the subset of the one or more audio streams;
However, Poddar teaches transmit, to a first client device, the segment of the transcript and only a snippet of a first audio stream corresponding to the segment, the first audio stream from the subset of the one or more audio streams (FIG. 1A, Block 15-16, [0025] “In block 15, a feedback report with the categorized segment containing the corrected transcript snippet and call recording snippet is sent to a human teacher. In block 16, the corrected transcript snippet, incorrect transcript snippet, and the recording snippet are transmitted to a custom transcription model for retraining using the new data”);
McQuiston and Poddar are considered to be analogous art to the claimed invention because both are in the same field of conversation analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the techniques of McQuiston, in which a virtual assistant transcribes, records, and analyzes discussions during a meeting, with the technique of transmitting a portion of a transcript along with the corresponding audio portion taught by Poddar, in order to improve a virtual speech agent's natural language understanding (see Poddar [0002]).
Regarding Claim 16, McQuiston in view of Poddar teaches all of the limitations of claim 15, upon which claim 16 depends.
Additionally, McQuiston teaches the non-transitory computer-readable medium of claim 15, wherein the processor-executable instructions to generate the transcript of at least the subset of the one or more audio streams cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to ([Column 9, line 19-40] “the embodiments may illustratively be embodied in the form of a processing machine, including a computer or computer system, for example, that includes at least one memory. It is to be appreciated that the set of instructions, i.e., the software for example, that enables the computer operating system to perform the operations described…”):
generate the transcript of at least the subset of the one or more audio streams during the virtual meeting as the audio streams are being exchanged between the plurality of participants ([Column 5, line 63-65] “the virtual assistant may transcribe the audio and stream the transcription to the attendees in real-time”).
Regarding Claim 17, McQuiston in view of Poddar teaches all of the limitations of claim 15, upon which claim 17 depends.
Additionally, McQuiston teaches the non-transitory computer-readable medium of claim 15, wherein the processor-executable instructions to generate the transcript of at least the subset of the one or more audio streams cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to ([Column 9, line 19-40] “the embodiments may illustratively be embodied in the form of a processing machine, including a computer or computer system, for example, that includes at least one memory. It is to be appreciated that the set of instructions, i.e., the software for example, that enables the computer operating system to perform the operations described…”; [Column 2, line 10-23] “the virtual assistant transcribing the audio feed using a speech-recognition algorithm”):
generate the transcript after the virtual meeting is terminated ([Column 5, line 39-41] "the transcript may occur simultaneously with the meeting or may take place after the meeting is done, based on the audio recording.").
Regarding Claim 19, McQuiston in view of Poddar teaches all of the limitations of claim 15, upon which claim 19 depends.
Additionally, McQuiston teaches wherein the segment of the transcript correspond to an audio stream associated with the first client device ([Column 5, line 38-41] “In step 215, the virtual assistant may transcribe the audio of the meeting. The transcript may occur simultaneously with the meeting or may take place after the meeting is done, based on the audio recording”).
Claims 2-3, 8-14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over McQuiston in view of Poddar, as shown above in claim 1, and further in view of Lewis (US 20210074277 A1, hereinafter Lewis).
Regarding claim 2, McQuiston in view of Poddar teaches all of the limitations of claim 1, upon which claim 2 depends.
Additionally, McQuiston teaches wherein the instructions to generate the transcript further cause the processor to execute further processor-executable instructions stored in the non- transitory computer-readable medium to ([Column 9, line 19-40] “the embodiments may illustratively be embodied in the form of a processing machine, including a computer or computer system, for example, that includes at least one memory. It is to be appreciated that the set of instructions, i.e., the software for example, that enables the computer operating system to perform the operations described…”):
transcribe a plurality of spoken words into a plurality of transcribed words ([Column 2, line 10-23] “the virtual assistant transcribing the audio feed using a speech-recognition algorithm”);
McQuiston in view of Poddar fails to teach generate a confidence score for each of the plurality of transcribed words; and determine one or more transcribed words within the plurality of transcribed words having a low confidence score.
However, Lewis teaches generate a confidence score for each of the plurality of transcribed words; and determine one or more transcribed words within the plurality of transcribed words having a low confidence score ([0031] "An overall confidence value may be assessed based on an alignment of a candidate word (e.g., a single word or a longer phrase) with a candidate audio segment of the audio signal. The confidence value may indicate a quality of the alignment, e.g., based on statistical features and confidence values output by statistical models, neural networks, and/or acoustical models included in the language model", the confidence value (score) provides a correlation between the audio stream and transcribed word).
McQuiston, Poddar, and Lewis are considered to be analogous art to the claimed invention because all are in the same field of automatic speech recognition. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech analysis method of McQuiston in view of Poddar with the scoring performed by language models to detect errors taught by Lewis in order to allow for efficient revising of a transcription output of an automatic speech recognition system (see Lewis [0009]).
Regarding claim 3, McQuiston in view of Poddar in view of Lewis teaches all of the limitations of claim 2, upon which claim 3 depends.
Additionally, Lewis teaches provide, to the first client device, an indication of the one or more transcribed words having the low confidence score ([0040] "the system identifies a portion of the transcription that has a likelihood of transcription error based on the output of one or more models used in determining the transcription… the system graphically indicates the identified portion of the transcription, for example, by displaying a box around the text in the identified portion, or by displaying the text in the portion in a different color, font, style (e.g., underlining or italics), size, emphasis, etc.");
and request, from the first client device, feedback on the one or more transcribed words having the low confidence score ([0019] “the program 108a may be used to identify and graphically indicate the portion of the transcription eligible for the correction to the user, so the user can review the graphically indicated portion”).
Regarding claim 8, McQuiston in view of Poddar teaches all of the limitations of claim 7, upon which claim 8 depends.
McQuiston in view of Poddar fails to teach receiving an indication that the one or more transcribed words are not a correct transcript of corresponding spoken words within at least the subset of the one or more audio streams; and receiving one or more corrected words corresponding to the one or more transcribed words.
However, Lewis teaches receiving an indication that the one or more transcribed words are not a correct transcript of corresponding spoken words within at least the subset of the one or more audio streams ([0019] “the proofreading program 108a may identify the portion of the transcription that should be corrected either prior to, concurrently with, or after the user inputs text input for the correction”);
and receiving one or more corrected words corresponding to the one or more transcribed words ([0042] FIG 2. “At step 214, the system receives a text input from the user indicating a revision to the transcription”).
McQuiston, Poddar, and Lewis are considered to be analogous art to the claimed invention because all are in the same field of automatic speech recognition. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech analysis systems of McQuiston in view of Poddar with the method of indicating the portion of the transcript to be corrected taught by Lewis in order to allow for efficient revising of a transcription output of an automatic speech recognition system (see Lewis [0009]).
Regarding claim 9, McQuiston in view of Poddar teaches all of the limitations of claim 7, upon which claim 9 depends.
McQuiston in view of Poddar fails to teach determining a profile associated with the first client device; determining a confidence score for the correction received from the first client device based on the profile associated with the first client device.
However, Lewis teaches determining a profile associated with the first client device ([0058] "Computing system 100 from FIG. 1 is a computer system configured to provide any to all of the compute functionality described herein", examiner interprets this communication as occurring with respect to the profile of the first client device);
determining a confidence score for the correction received from the first client device based on the profile associated with the first client device ([0054] “the speech recognition system 100 may also be configured to recognize a selection of one of the plurality of text candidates by the user 11”, user 11 (profile) associated with device 100; [0017] “Input subsystem 104 may include any suitable input devices to allow user 11 to supply corrections and otherwise interact with speech recognition system 100”, modifying confidence values for text candidates/corrections approved by the user 11 (profile)).
McQuiston, Poddar, and Lewis are considered to be analogous art to the claimed invention because all are in the same field of automatic speech recognition. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech analysis systems of McQuiston in view of Poddar with the scoring performed by language models to detect errors taught by Lewis in order to allow for efficient revising of a transcription output of an automatic speech recognition system (see Lewis [0009]).
Regarding claim 10, McQuiston in view of Poddar in view of Lewis teaches all of the limitations of claim 9, upon which claim 10 depends.
Additionally, Lewis teaches determining, based on the profile, a number of corrections received from the first client device ([0053] “The speech recognition system 100 then receives an input from the user 11 selecting one of the plurality of text candidates to replace the portion of the transcription that has the likelihood of transcription error”, the user indicates the number of corrections needed on the transcript from their client device);
and determining, based on the number of corrections received from the first client device, the confidence score for the correction received from the first client device ([0054] "the speech recognition system 100 may also be configured to recognize a selection of one of the plurality of text candidates by the user 11 as signals of approval, recognize a non-selection of one of the plurality of text candidates by the user 11 as a signals of disapproval, and incorporate human feedback into the reinforcement learning process for teaching the acoustical model 125", speech recognition system 100 can provide correction to the confidence score).
Regarding claim 11, McQuiston in view of Poddar teaches all of the limitations of claim 7, upon which claim 11 depends.
Additionally, McQuiston teaches identifying a plurality of spoken words in at least the subset of the one or more audio streams ([Column 2, line 10-23] “the virtual assistant receiving at least an audio feed and a video feed of an electronic meeting in real-time”);
transcribing the plurality of spoken words into a plurality of transcribed words ([Column 2, line 10-23] “the virtual assistant transcribing the audio feed using a speech-recognition algorithm”);
McQuiston in view of Poddar fails to teach scoring each of the plurality of transcribed words using a confidence level.
However, Lewis teaches scoring each of the plurality of transcribed words using a confidence level ([0040] "the system identifies a portion of the transcription that has a likelihood of transcription error based on the output of one or more models used in determining the transcription…", examiner interprets determining likelihood of error as scoring).
McQuiston, Poddar, and Lewis are considered to be analogous art to the claimed invention because all are in the same field of automatic speech recognition. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech analysis system of McQuiston in view of Poddar with the scoring performed by language models to detect errors taught by Lewis in order to allow for efficient revising of a transcription output of an automatic speech recognition system (see Lewis [0009]).
Regarding claim 12, McQuiston in view of Poddar in view of Lewis teaches all of the limitations of claim 11, upon which claim 12 depends.
Additionally, Lewis teaches determining one or more transcribed words within the plurality of transcribed words having a low confidence level; ([0031] “An overall confidence value may be assessed based on an alignment of a candidate word (e.g., a single word or a longer phrase) with a candidate audio segment of the audio signal. The confidence value may indicate a quality of the alignment, e.g., based on statistical features and confidence values output by statistical models, neural networks, and/or acoustical models included in the language model”)
transmitting, to the first client device, the one or more transcribed words having the low confidence level; and requesting feedback on the one or more transcribed words having the low confidence level ([0019] “the program 108a may be used to identify and graphically indicate the portion of the transcription eligible for the correction to the user, so the user can review the graphically indicated portion”).
Regarding claim 13, McQuiston in view of Poddar in view of Lewis teaches all of the limitations of claim 8, upon which claim 13 depends.
Additionally, McQuiston teaches wherein receiving, from the first client device, the correction to one or more transcribed words within the segment of the transcript occurs during the virtual meeting, and the method further comprises ([Column 2, line 10-23] “the virtual assistant receiving an edited transcription”):
transcribing at least the subset of the one or more audio streams for a remainder of the virtual meeting using the correction to the one or more transcribed words ([Column 3, line 58-63] “virtual assistant 135 may record the audio and/or video from a meeting (received as a stream), identify individuals in the stream, transcribe the stream, identify action items in the stream, summarize the discussion in the stream, analyze the sentiment in the stream, and/or publish a recording and/or analysis of the stream to the attendees of the meeting”, the virtual assistant is able to transcribe the remainder of the meeting through a recording and implement changes the user has provided).
Regarding claim 14, McQuiston in view of Poddar in view of Lewis teaches all of the limitations of claim 8, upon which claim 14 depends.
Additionally, McQuiston teaches wherein the segment of the transcript transmitted to the first client device correspond to an audio stream corresponding to the first client device ([Column 2, line 10-23] "the virtual assistant providing the transcription to at least one of the plurality of attendees", the transmitted transcript corresponds to the user's audio stream that needs correction; [Column 5, line 38-41] "In step 215, the virtual assistant may transcribe the audio of the meeting. The transcript may occur simultaneously with the meeting or may take place after the meeting is done, based on the audio recording").
Regarding claim 20, McQuiston in view of Poddar teaches all of the limitations of claim 15, upon which claim 20 depends.
McQuiston in view of Poddar fails to teach provide, to the first client device, a correction metric based in part on the correction to the one or more transcribed words.
However, Lewis teaches provide, to the first client device, a correction metric based in part on the correction to the one or more transcribed words ([0019] “the program 108a may be used to identify and graphically indicate the portion of the transcription eligible for the correction to the user, so the user can review the graphically indicated portion”, examiner interprets a graphical representation indicating correction to be a correction metric).
McQuiston, Poddar, and Lewis are considered to be analogous art to the claimed invention because all are in the same field of automatic speech recognition. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech analysis system of McQuiston in view of Poddar with the method of providing correction metrics to the user taught by Lewis in order to allow for efficient revising of a transcription output of an automatic speech recognition system (see Lewis [0009]).
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over McQuiston in view of Poddar, as shown in claim 15 above, and further in view of White et al. (US 8352264 B2, hereinafter White).
Regarding claim 18, McQuiston in view of Poddar teaches all of the limitations of claim 15, upon which claim 18 depends.
Additionally, McQuiston teaches wherein the processor is configured to execute further processor-executable instructions stored in the non-transitory computer-readable medium to ([Column 9, line 19-40] “the embodiments may illustratively be embodied in the form of a processing machine, including a computer or computer system, for example, that includes at least one memory. It is to be appreciated that the set of instructions, i.e., the software for example, that enables the computer operating system to perform the operations described…”):
McQuiston in view of Poddar fails to teach receive, from a second client device, an indication to transcribe a second set of audio streams corresponding to a second virtual meeting; and transcribe the second set of audio streams using the correction to the one or more transcribed words.
However, White teaches receive, from a second client device, an indication to transcribe a second set of audio streams corresponding to a second virtual meeting ([Column 3, line 13-18] "As users interact with said mobile clients and web interfaces, they will see the results of their audio input returned as transcribed text. Any words that could have statistically been similar to other possibilities for that spoken word will appear highlighted and contain an n-best drop down list, for example", a second client device (FIG. 1) shows highlighted (indication) words that could require another audio stream to produce a more accurate transcription);
and transcribe the second set of audio streams using the correction to the one or more transcribed words ([Column 3, line 19-29] "The user can then correct the phrase by choosing the intended word in the drop down list, manually editing and replacing the n-best results with the actual word, or speaking the correct form of the word again and performing another speech recognition query to generate the correct form of the word. Even without an n-best result, the user could still update or revise a given word. Once the original transcribed message is corrected, the user can then send the message for delivery to the intended recipient. In the use cases of a transcribed memo or voicemail, the corrected result is fed back into the speech recognition platform to modify core, application, or user centric models", the model is modified based on the correction for future use; [Column 3, line 37-43] "Once the corrected result is returned to the user, the corrected result is paired with the original outgoing result, and the LM is updated giving the correction a higher statistical probability than initially generated so that future queries will have a higher likelihood of generating the correct textual representation of the spoken word"; [Column 17, line 20-24] FIG. 7 "Importantly, however, at step 725 information regarding the edits and corrections made by the user is provided to the LMs used in the transcription process, or in some cases to other LMs as well, and at step 730 the information is used to update the LMs for further use, represented by the loop back to step 705", transcription in a different session based on an updated model).
McQuiston, Poddar, and White are considered to be analogous art to the claimed invention because all are in the same field of automatic speech recognition. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the speech analysis systems of McQuiston in view of Poddar with the technique of transcribing a different session based on an updated model taught by White in order to produce a transcript that accurately represents the spoken words (see White [Column 17, line 20-24]).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Daredia et al. (US 20200403817 A1) teaches methods, systems, and non-transitory computer-readable storage media for generating meeting insights based on media data and device input data. For example, in one or more embodiments, the disclosed system analyzes media data including audio data or video data and inputs to client devices associated with a meeting to determine a portion of the meeting (e.g., a portion of the media data) that is relevant for a user. In response to determining a relevant portion of the meeting, the system generates an electronic message including content related to the relevant portion of the meeting. The system then provides the electronic message to a client device of the user. For instance, in one or more embodiments, the system generates a meeting summary, meeting highlights, or action items related to the media data to provide to the client device of the user. In one or more embodiments, the system also uses the summary, highlights, or action items to train a machine-learning model for use with future meetings.
Arrowood et al. (US 20100332225 A1) teaches systems and methods for media processing, including a method for aligning a multimedia recording with a transcript. A group of search terms is formed from the transcript, with each search term being associated with a location within the transcript. Putative locations of the search terms are determined in a time interval of the multimedia recording. For each search term, zero or more putative locations are determined and, for at least some of the search terms, multiple putative locations are determined in the time interval of the multimedia recording. According to a first sequencing constraint, a first representation of a group of sequences, each of a subset of the putative locations of the search terms, is formed. A second representation of a group of sequences, each of a subset of the search terms, is formed. Using the first and the second representations, the time interval of the multimedia recording is partially aligned with the transcript.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZEESHAN SHAIKH whose telephone number is (703)756-1730. The examiner can normally be reached Monday-Friday 7:30AM-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ZEESHAN MAHMOOD SHAIKH/Examiner, Art Unit 2658
/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658