Last updated: May 29, 2026
Application No. 17/923,620
INCREMENTAL POST-EDITING AND LEARNING IN SPEECH TRANSCRIPTION AND TRANSLATION SERVICES

Final Rejection §103
Filed: Nov 07, 2022
Priority: May 08, 2020 (provisional 63/022,025, +1 more)
Examiner: CHAVEZ, RODRIGO A
Art Unit: 2658
Tech Center: 2600 (Communications)
Assignee: Zoom Video Communications, Inc.
OA Round: 2 (Final)
This examiner grants 51% of cases after interview (+38.6% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 233 resolved cases, 2023–2026
Examiner Intelligence

CHAVEZ, RODRIGO A
Career Allowance Rate: grants 51% of resolved cases (119 granted / 233 resolved); -10.9% vs TC avg
Interview Lift: +38.6% (strong; resolved cases with interview vs. without)
Typical timeline: 3y 3m avg prosecution; 15 currently pending
Career history: 252 total applications across all art units
Statute-Specific Performance

§101: 3.8% (-36.2% vs TC avg)
§103: 84.7% (+44.7% vs TC avg)
§102: 9.6% (-30.4% vs TC avg)
§112: 0.7% (-39.3% vs TC avg)
Tech Center average is an estimate. Based on career data from 233 resolved cases.
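The figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes that each "vs TC avg" value is a simple percentage-point difference (examiner figure minus Tech Center average); the page does not state how the difference is computed, so this is an assumption, and the variable names are illustrative only.

```python
# Assumption: "vs TC avg" = examiner figure minus Tech Center average,
# both expressed in percentage points. The page does not confirm this.
granted, resolved = 119, 233
allowance_rate = round(100 * granted / resolved, 1)

# statute -> (examiner figure in %, stated difference vs TC avg in points)
statute_stats = {
    "101": (3.8, -36.2),
    "103": (84.7, 44.7),
    "102": (9.6, -30.4),
    "112": (0.7, -39.3),
}

# Back out the Tech Center average implied by each row.
implied_tc_avg = {
    statute: round(rate - diff, 1)
    for statute, (rate, diff) in statute_stats.items()
}

print(allowance_rate)   # career allowance rate in percent
print(implied_tc_avg)   # implied Tech Center average per statute
```

Under that assumption, every statute row implies the same Tech Center average, which is consistent with the page describing the Tech Center average as an estimate rather than a per-statute measurement.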
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/26/2025 was filed.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments
Applicant's arguments filed 09/26/2025 have been fully considered but they are not persuasive.
Regarding the rejections of claims 32-35, 37 and 40-50 under 35 U.S.C. 102(a)(1) and claims 36, 38, 39 and 51 under 35 U.S.C. 103, the applicant argues:
	“…The Office Action concedes that McFarland fails to disclose this limitation, and instead, the Office Action relies on Waibel. Id. But Waibel does not disclose this limitation.
	Instead, and as the Office Action states, Waibel merely discloses that ‘The user may either identify and log an error, or, if he/she wishes, correct an error in the speech recognition or translation output.’ Id. at 22 (emphasis added). Waibel does not disclose that the ‘language translation module is configured to, during the audio session: identify a high-risk word.’ Thus, Waibel does not disclose the limitations of claim 32, as amended.”

Regarding applicant’s arguments, the examiner respectfully disagrees. The examiner contends that the recitation in previous claim 39, and now in claims 32 and 47, fails to place any limit on how a “high-risk word” should be defined. The examiner contends that no description is provided as to what makes a word “high-risk” or what specific threshold should be met to distinguish a high-risk word from a word that is not “high-risk”. The instant application’s specification, as noted by the applicant, in p. 0036 does provide examples of what may constitute a “high-risk word”:
	Spec. at [0036] “…vulgar language, insults, sexist language, racist language, hate speech, politically or socially charged concepts and words.”

These examples merely provide a suggestive and non-exhaustive list as to what the word may represent, but fail to specifically produce any meaningful specificity as to how a word may be constituted as “high-risk”. One example is that, in languages that are spoken in different regions/countries of the world, one may find that a single word may be considered “vulgar” or “hate speech” in one region/country, while in a different region/country, the same word may not be considered “vulgar” or “hate speech”. Thus, the examiner contends that neither the recited elements in the claim nor the cited portions of the specification provide a standardized method or limit on how to determine what constitutes a “high-risk word”. To this effect, the examiner contends that, under broadest reasonable interpretation, the teaching of Waibel does teach a “high-risk word” because Waibel provides for flagging errors in speech recognition for correction, which the examiner contends that an error in speech recognition may be regarded as “high-risk” because one of ordinary skill may find that an error in automatic speech recognition, especially when conducted in real-time public hearings (as in McFarland p. 0005), may pose a threat in the intelligibility of such public hearing, such that anyone spectating such event, who relies on these mechanisms, may become undermined, or even, in extreme cases, insulted. Thus, the examiner contends that McFarland in view of Waibel does teach the aforementioned language.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “an automatic speech recognition module,” “a language translation module,” and “a correction module” in claims 32-46.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
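The three-prong test quoted above from MPEP § 2181 can be pictured as a simple conjunction. The following is a minimal sketch only: the `Limitation` class, its field names, and the example flags are hypothetical, the flags encode findings a reader supplies (here, the findings stated in this Office action) rather than anything computed from claim text, and the rebuttable presumptions are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical record of findings about one claim limitation."""
    text: str
    uses_means_or_step: bool            # literal "means" or "step"
    uses_generic_placeholder: bool      # nonce term, e.g. "module"
    has_functional_language: bool       # e.g. "for ...", "configured to ..."
    recites_sufficient_structure: bool  # structure, material, or acts


def interpreted_under_112f(lim: Limitation) -> bool:
    """Return True only when prongs (A), (B), and (C) are all met."""
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.has_functional_language
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c


# Flags below mirror the findings stated in this Office action.
translation_module = Limitation(
    text="a language translation module for translating the transcribed text",
    uses_means_or_step=False,
    uses_generic_placeholder=True,
    has_functional_language=True,
    recites_sufficient_structure=False,
)
result_as_found = interpreted_under_112f(translation_module)

# Option (1) offered to applicant: amend to recite sufficient structure.
amended = Limitation(
    text=translation_module.text,
    uses_means_or_step=False,
    uses_generic_placeholder=True,
    has_functional_language=True,
    recites_sufficient_structure=True,
)
result_if_amended = interpreted_under_112f(amended)
```

With the findings as stated, the limitation falls under 35 U.S.C. 112(f); flipping only the structure finding, as either option offered to applicant would do, takes it out.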

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 32-38 and 40-51 are rejected under 35 U.S.C. 103 as being unpatentable over McFarland in view of Waibel (US PG Pub 20110307241).

	As per claims 32 and 47, McFarland discloses:	A system and method (McFarland; p. 0033 & p. 0072) comprising: 	one or more processors (McFarland; p. 0058 – microphone array processor 520 & speaker identification processor 530; p. 0060 - features extraction processor 535; p. 0063 - text language processor 592) configured to execute processor-executable instructions stored in a non-transitory computer-readable medium (McFarland; p. 0048-0049 - Software implementing the procedures, systems and methods described herein can be stored in the memory of any computer system as a set of executable instructions…), the processor-executable instructions comprising: 	an automatic speech recognition module to receive audible output in a first human language during an audio session (McFarland; p. 0034-0035 - Transcription software may execute in server 25 and takes audio input from at least one microphone 15 via noise filter 20… the software can determine who is speaking by the connection of the microphone and/or by the volume level of that microphone; see also p. 0045) and convert the audible output to transcribed text in the first human language  (McFarland; p. 0034-0035 - Multiple voice recognition profiles can be simultaneously executed in the server 25 while immediately translating the spoken word to text); and 	a language translation module for translating the transcribed text in the first human language to translation text in a second human language (McFarland; p. 0063 - according to language decision module 585 if a language conversion is needed, the original output is saved at step 590 and text to text language processor 592 converts a copy of the original output to the target language upon which text to speech converter 594 outputs to a selected output device 596; see also p. 0072 - The system 10 can accept language translation commands and will translate from one language to another as required. 
The audio/video and transcript are synchronized files stored on a hard disk of the computer processing the voice translation and also on the remote computers if the option is selected); 	a correction module in communication with one or more client devices (McFarland; p. 0043 - The edit mode 315 provides a command interface, inclusion of presets, templates, text, and a spell checker. Additionally, in the edit mode 315, text can be highlighted and the audio/video can be played back, A dictionary can be edited wherein words can be added. Speech converted to text can be formatted and printed; see also p. 0037, 0040, 0054-0055 & 0071), wherein the correction module: 	receives corrective inputs, wherein the corrective inputs comprise corrections to at least one of the transcribed text in the first language or the translated text in the second human language (McFarland; p. 0034 - A court reporter/computer operator is given the opportunity to edit the transcription at step 811, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0053 - The system 10 can transcribe dialogue in hearings, depositions, trials, and a plurality of other dialogue settings. During transcription, the system 10 accepts corrections of any unrecognized voice patterns in real time transmitted to it by a court reporter/computer operator. Once a particular pattern has been corrected in this manner, the software will automatically correctly transcribe the pattern for all subsequent occurrences; see also p. 0066 & p. 
0070); and 	updates at least one of the automatic speech recognition module or the language translation module based on the received corrected inputs, such that the automatic speech recognition module or the language translation module uses the corrective inputs in generating the transcribed text in the first human language or translating the transcribed text to the second human language for a remainder of the audio session (McFarland; p. 0037 - A court reporter/computer operator is given the opportunity to edit the transcription, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0053 - The system 10 can transcribe dialogue in hearings, depositions, trials, and a plurality of other dialogue settings. During transcription, the system 10 accepts corrections of any unrecognized voice patterns in real time transmitted to it by a court reporter/computer operator. Once a particular pattern has been corrected in this manner, the software will automatically correctly transcribe the pattern for all subsequent occurrences; see also p. 0056 - The system allows the user to see the text, and edit the same in real time and the user is able to define unrecognized voice, which will be used for subsequent translation; see also p. 0066 - the system accepts operator input of corrections to the translation, responsive to the operator inputted corrections, the system updates the selected lexicon with the new definition; see also p. 0070 - While real time is taking place, a court reporter/computer operator, can make corrections and define incorrect voice translations and have those corrections apply to all future translations. The corrections only apply to the profile/dictionary that was opened at that particular point in the transcript. 
As corrections are made and parentheticals (unspoken text) are inserted by the court reporter/computer operator, the system 10 can refresh each connected computer accordingly. The system 10 has a list of all parentheticals, which can be selected for automatic insertion in the transcript).
	McFarland, however, fails to disclose wherein the language translation module is configured to, during the audio session: identify a high-risk word in the translated text in the second human language; flag the high-risk word in the display of the translated text; and update at least one of the automatic speech recognition module or the language translation module based on the received corrected inputs for the high-risk word.
	Waibel does teach wherein the language translation module is configured to, during the audio session: identify a high-risk word in the translated text in the second human language; flag the high-risk word in the display of the translated text; and update at least one of the automatic speech recognition module or the language translation module based on the received corrected inputs for the high-risk word (Waibel; p. 0061-0062 - To enable field customizable speech translation, the system permits error correction and later learning from these errors through the operation of the correction and repair module 11 in combination with a user field customization module 12… The Correction and Repair Module enables a user to intervene in the speech-to-speech translation process at any time. The user may either identify and log an error, or, if he/she wishes, correct an error in the speech recognition or translation output. Such user intervention is of considerable value, as it provides immediate correction in the human-human communication process, and opportunities for the system to adjust to user needs and interests and to learn from mistakes. If the user is dissatisfied with a translation of an utterance (i.e. an error occurs) the user can log the current input. The system will save audio of the current utterance as well as other information to a log file. This can be accessed and corrected by the user at a later time, or can be uploaded to a community database to allow expert users to identify and correct errors).
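For illustration only, the disputed limitation (identify a high-risk word in the translated text, flag it in the display, and apply corrective inputs for the remainder of the audio session) can be pictured with the minimal sketch below. Every name in it (`SessionTranslator`, `HIGH_RISK_WORDS`, the toy word-for-word lexicon) is hypothetical and is taken from neither McFarland, Waibel, nor the application; in particular, the fixed word list used to decide what is "high-risk" is an assumption made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical word list; an assumption made only for this example.
HIGH_RISK_WORDS = {"idiot"}


@dataclass
class SessionTranslator:
    # Toy word-for-word lexicon standing in for a real translation model.
    lexicon: dict[str, str]
    # Corrective inputs received so far in the current audio session.
    session_corrections: dict[str, str] = field(default_factory=dict)

    def translate(self, transcribed_text: str) -> list[tuple[str, bool]]:
        """Return (translated word, flagged as high-risk) pairs for display."""
        displayed = []
        for word in transcribed_text.lower().split():
            # A session correction takes priority over the base lexicon.
            target = self.session_corrections.get(
                word, self.lexicon.get(word, word)
            )
            displayed.append((target, target in HIGH_RISK_WORDS))
        return displayed

    def apply_correction(self, source_word: str, corrected_target: str) -> None:
        """Store a corrective input so later output in the session uses it."""
        self.session_corrections[source_word.lower()] = corrected_target


# Toy session: "klug" is deliberately mistranslated in the base lexicon.
translator = SessionTranslator(lexicon={"er": "he", "ist": "is", "klug": "idiot"})
first = translator.translate("er ist klug")   # mistranslation gets flagged
translator.apply_correction("klug", "smart")  # corrective input from a user
second = translator.translate("er ist klug")  # remainder of the session
```

In this toy run the first pass flags the mistranslated word, and after the corrective input the same source word is rendered with the corrected translation and no flag.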
Therefore, it would have been obvious to one of ordinary skill in the art to modify the system and method of McFarland in view of Leydon to include wherein the language translation module is configured to, during the audio session: identify a high-risk word in the translated text in the second human language; flag the high-risk word in the display of the translated text; and update at least one of the automatic speech recognition module or the language translation module based on the received corrected inputs for the high-risk word, as taught by Waibel, in order to enable users to add new vocabulary items and to improve and modify the content and usage of their system in the field, without requiring linguistic or technical knowledge or expertise for field maintenance (Waibel; p. 0003).
As per claims 33 and 48, McFarland in view of Waibel discloses: The system and method of claims 32 and 47, wherein: the audio session comprises a live audio session by the speaker (McFarland; p. 0007 - The real-time voice transcription system provides a speech recognition system and method that includes use of speech and spatial-temporal acoustic data to enhance speech recognition probabilities while simultaneously identifying the speaker), and generating the transcribed text in the first human language and to translate the transcribed text to the translation text in the second human language during the live audio session (McFarland; p. 0035 - The system 10 is capable of transcribing a single voice for captioning for deaf students and television news broadcasts as well as inputs from multiple voices. Real time translation and editing of the real time text for immediate delivery of transcription is provided by the system 10; see also p. 0054-0055 - The system 10 can transcribe multiple voices even when spoken concurrently at different microphones 15 and identify each speaker separately as the voices are buffered within the computer 25. 
Multiple channels may be used for this feature. Another option can be to select that all participants are translated and displayed on the screen with a space between each participant when more then one speak at the same time. When one participant stops speaking, the blank space between speakers automatically disappears. The text is in different colors for each speaker making it immediately apparent who is speaking… The system 10 translates in real time and displays the text in an interface that allows for a court reporter/computer operator to edit the translation as it is taking place). 	As per claims 34 and 49, McFarland in view of Waibel discloses:	The system and method of claims 33 and 48, wherein: the one or more client devices are configured to, during the live audio session, display the translated text and accept the corrective inputs; and the speech recognition and translation computer system is configured to, during the live audio session, receive the corrective inputs and update the language translation module (McFarland; p. 0034 - A court reporter/computer operator is given the opportunity to edit the transcription at step 811, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0055 - The system 10 translates in real time and displays the text in an interface that allows for a court reporter/computer operator to edit the translation as it is taking place; see also p. 0066 - the system accepts operator input of corrections to the translation. At step 627, responsive to the operator inputted corrections, the system updates the selected lexicon with the new definition; see also p. 0070-0071 - While real time is taking place, a court reporter/computer operator, can make corrections and define incorrect voice translations and have those corrections apply to all future translations. 
The corrections only apply to the profile/dictionary that was opened at that particular point in the transcript. As corrections are made and parentheticals (unspoken text) are inserted by the court reporter/computer operator, the system 10 can refresh each connected computer accordingly… Each connected computer, such as computers 30 or computers 40 and 45 has the option of receiving a signal from the translating computer 25 or viewing the translated text on the computer processing the voice translations). 	As per claims 35 and 50, McFarland in view of Waibel discloses:	The system and method of claims 34 and 47, wherein the language translation module is configured to, after receiving the corrective inputs from the users of the one or more client devices, update the translated text displayed on the user interface session to include, in a presentation mode, the corrective inputs (McFarland; p. 0037 - A court reporter/computer operator is given the opportunity to edit the transcription at step 811, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0053 - During transcription, the system 10 accepts corrections of any unrecognized voice patterns in real time transmitted to it by a court reporter/computer operator, Once a particular pattern has been corrected in this manner, the software will automatically correctly transcribe the pattern for all subsequent occurrences; see also p. 0070). 	As per claim 37, McFarland in view of Waibel discloses:	The system of claim 32, wherein the language translation module is configured to, after the audio session, transfer the corrective inputs to a long term memory for the language translation module (McFarland; p. 0037 - if edits are received, the system 410 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 
0055 - The system 10 translates in real time and displays the text in an interface that allows for a court reporter/computer operator to edit the translation as it is taking place. When a new text is defined for a mistranslated or un-translated voice, this data is stored in a default rules or user selected rules file and, going forward, the translation will use the new definition; see also p. 0070 - While real time is taking place, a court reporter/computer operator, can make corrections and define incorrect voice translations and have those corrections apply to all future translations).
As per claim 40, McFarland in view of Waibel discloses: The system of claim 32, wherein the audio session comprises an audible voice dialog between the speaker with a second speaker (McFarland; p. 0053 - The system 10 can transcribe dialogue in hearings, depositions, trials, and a plurality of other dialogue settings).
As per claim 41, McFarland in view of Waibel discloses: The system of claim 32, wherein the audio session comprises a recording of audible output by the speaker (McFarland; p. 0073 - The transcription to be executed in a computer, PDA or other processing device with voice recording capability and transferred via hard wired/wireless network to a back office computer where it will be validated by a transcriber).
As per claim 42, McFarland in view of Waibel discloses: The system of claim 32, wherein the recording comprises a multimedia recording (McFarland; p. 0039 - The system 10 is capable of broadcasting over the Internet 35 or using the Internet 35 to send audio and video to a remote site 40 or alternative remote site 45 for remote translation and/or editing and for remote viewing and listening; see also p. 0043 - The edit mode 315 provides a command interface, inclusion of presets, templates, text, and a spell checker, in the edit mode 315, text can be highlighted and the audio/video can be played back).
As per claim 43, McFarland in view of Waibel discloses:	The system of claim 33, wherein: in the editor mode, the user interface of the one or more client devices is further configured to: display the transcribed text in the first human language during the audio session (McFarland; p. 0037 - the voice is converted to text by using at least one lexicon adapted to the professor's speech, punctuation and formatting logic is applied to the transcribed speech and broadcast to students. A court reporter/computer operator is given the opportunity to edit the transcription; see also p. 0040 - The basic functionality of the system 10 is a voice recognition transcription system that displays text in a user-friendly interface; see also p. 0054 - The system 10 can transcribe multiple voices even when spoken concurrently at different microphones 15 and identify each speaker separately as the voices are buffered within the computer 25. Multiple channels may be used for this feature. Another option can be to select that all participants are translated and displayed on the screen with a space between each participant when more then one speak at the same time. When one participant stops speaking, the blank space between speakers automatically disappears. The text is in different colors for each speaker making it immediately apparent who is speaking; see also p. 0055 - The system 10 translates in real time and displays the text in an interface that allows for a court reporter/computer operator to edit the translation as it is taking place; see also p. 0071); and accept transcribed-text corrective inputs to the displayed transcribed text from the user of each of the one or more client device during the audio session (McFarland; p. 
0034 - A court reporter/computer operator is given the opportunity to edit the transcription at step 811, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0053 - The system 10 can transcribe dialogue in hearings, depositions, trials, and a plurality of other dialogue settings. During transcription, the system 10 accepts corrections of any unrecognized voice patterns in real time transmitted to it by a court reporter/computer operator. Once a particular pattern has been corrected in this manner, the software will automatically correctly transcribe the pattern for all subsequent occurrences; see also p. 0066 - the system accepts operator input of corrections to the translation. At step 627, responsive to the operator inputted corrections, the system updates the selected lexicon with the new definition; see also p. 0070); and the speech recognition and translation computer system is further configured to: receive the transcribed-text corrective inputs from the users of the one or more client devices during the audio session (McFarland; p. 0034 - A court reporter/computer operator is given the opportunity to edit the transcription at step 811, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0053 - The system 10 can transcribe dialogue in hearings, depositions, trials, and a plurality of other dialogue settings. During transcription, the system 10 accepts corrections of any unrecognized voice patterns in real time transmitted to it by a court reporter/computer operator. Once a particular pattern has been corrected in this manner, the software will automatically correctly transcribe the pattern for all subsequent occurrences; see also p. 0066 - the system accepts operator input of corrections to the translation. 
At step 627, responsive to the operator inputted corrections, the system updates the selected lexicon with the new definition; see also p. 0070); and update the automatic speech recognition module based on the received transcribed-text corrected inputs during the audio session, such that the automatic speech recognition module uses the transcribed-text corrective inputs in recognizing the audible output by the speaker during the audio session (McFarland; p. 0037 - A court reporter/computer operator is given the opportunity to edit the transcription, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0053 - The system 10 can transcribe dialogue in hearings, depositions, trials, and a plurality of other dialogue settings. During transcription, the system 10 accepts corrections of any unrecognized voice patterns in real time transmitted to it by a court reporter/computer operator. Once a particular pattern has been corrected in this manner, the software will automatically correctly transcribe the pattern for all subsequent occurrences; see also p. 0056 - The system allows the user to see the text, and edit the same in real time and the user is able to define unrecognized voice, which will be used for subsequent translation; see also p. 0066 - the system accepts operator input of corrections to the translation, responsive to the operator inputted corrections, the system updates the selected lexicon with the new definition; see also p. 0070).
As per claim 44, McFarland in view of Waibel discloses:	The system of claim 43, wherein the speech recognition and translation computer system is further configured to, upon receiving a transcribed-text corrective input that is applicable to a portion of the transcribed text, re-translate the portion of the transcribed text to the second human language such that the user interfaces of the one or more client devices display the re-translated portion in the second human language (McFarland; p. 0070 - While real time is taking place, a court reporter/computer operator, can make corrections and define incorrect voice translations and have those corrections apply to all future translations. The corrections only apply to the profile/dictionary that was opened at that particular point in the transcript. As corrections are made and parentheticals (unspoken text) are inserted by the court reporter/computer operator, the system 10 can refresh each connected computer accordingly. The system 10 has a list of all parentheticals, which can be selected for automatic insertion in the transcript).	As per claim 45, McFarland in view of Waibel discloses:	The system of claim 32, further comprising: a storage for storing a recording of the audio session (McFarland; p. 0072 - The audio/video and transcript are synchronized files stored on a hard disk of the computer processing the voice translation and also on the remote computers if the option is selected); and audio output for audibly playing the recording of the audio session (McFarland; p. 0072 - The audio/video and transcript are synchronized files stored on a hard disk of the computer processing the voice translation and also on the remote computers if the option is selected. This makes it possible to select any portion of the text for playback when a participant in the proceedings asks for the record to be read back); wherein the system is configured to generate the transcribed text in the first human language (McFarland; p. 
0033-0037 - Multiple voice recognition profiles can be simultaneously executed in the server 25 while immediately translating the spoken word to text… the system 10 can accept speech input from a lecturing professor, the voice is converted to text by using at least one lexicon adapted to the professor's speech, punctuation and formatting logic is applied to the transcribed speech…; see also p. 0045 - Microphone voice input 55 can be accepted by a voice link function 65. The voice link function 65 is also capable of accepting PCM formatted voice input 57 and WAV file input 60, The voice link 65 provides an interface between the aforementioned speech input types and the speech recognition layer 70, as well as the general utilities and database components layer 75. A plurality of speech recognition engines such as first SR engine 50 a and second SR engine 50b can be in operable communication with the speech recognition layer 70) and to translate the transcribed text to the translation text in the second human language during a playing of the recorded audio session (McFarland; p. 0063 - according to language decision module 585 if a language conversion is needed, the original output is saved at step 590 and text to text language processor 592 converts a copy of the original output to the target language upon which text to speech converter 594 outputs to a selected output device 596; see also p. 0072 - The system 10 can accept language translation commands and will translate from one language to another as required. The audio/video and transcript are synchronized files stored on a hard disk of the computer processing the voice translation and also on the remote computers if the option is selected); during the playing of the recorded audio session, cause the translated text to be displayed (McFarland; p. 
0037 - the voice is converted to text by using at least one lexicon adapted to the professor's speech, punctuation and formatting logic is applied to the transcribed speech and broadcast to students, A court reporter/computer operator is given the opportunity to edit the transcription; see also p. 0040 - The basic functionality of the system 10 is a voice recognition transcription system that displays text in a user-friendly interface; see also p. 0054 - The system 10 can transcribe multiple voices even when spoken concurrently at different microphones 15 and identify each speaker separately as the voices are buffered within the computer 25. Multiple channels may be used for this feature. Another option can be to select that all participants are translated and displayed on the screen with a space between each participant when more then one speak at the same time. When one participant stops speaking, the blank space between speakers automatically disappears. The text is in different colors for each speaker making it immediately apparent who is speaking; see also p. 0055 & 0071) and accept the corrective inputs (McFarland; p. 0034 - A court reporter/computer operator is given the opportunity to edit the transcription at step 811, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0053 - The system 10 can transcribe dialogue in hearings, depositions, trials, and a plurality of other dialogue settings. During transcription, the system 10 accepts corrections of any unrecognized voice patterns in real time transmitted to it by a court reporter/computer operator. Once a particular pattern has been corrected in this manner, the software will automatically correctly transcribe the pattern for all subsequent occurrences; see also p. 0066 - the system accepts operator input of corrections to the translation. 
At step 627, responsive to the operator inputted corrections, the system updates the selected lexicon with the new definition; see also p. 0070); and during the playing of the recorded audio session, receive the corrective inputs and update the language translation module (McFarland; p. 0037 - A court reporter/computer operator is given the opportunity to edit the transcription, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0053 - The system 10 can transcribe dialogue in hearings, depositions, trials, and a plurality of other dialogue settings. During transcription, the system 10 accepts corrections of any unrecognized voice patterns in real time transmitted to it by a court reporter/computer operator. Once a particular pattern has been corrected in this manner, the software will automatically correctly transcribe the pattern for all subsequent occurrences; see also p. 0056 - The system allows the user to see the text, and edit the same in real time and the user is able to define unrecognized voice, which will be used for subsequent translation; see also p. 0066 & 0070).
	As per claim 46, McFarland in view of Waibel discloses:
	The system of claim 45, wherein the system is further configured to: display the transcribed text in the first human language during the playing of the recorded audio session (McFarland; p. 0037 - the voice is converted to text by using at least one lexicon adapted to the professor's speech, punctuation and formatting logic is applied to the transcribed speech and broadcast to students. A court reporter/computer operator is given the opportunity to edit the transcription; see also p. 0040 - The basic functionality of the system 10 is a voice recognition transcription system that displays text in a user-friendly interface; see also p.
0054 - The system 10 can transcribe multiple voices even when spoken concurrently at different microphones 15 and identify each speaker separately as the voices are buffered within the computer 25, Multiple channels may be used for this feature. Another option can be to select that all participants are translated and displayed on the screen with a space between each participant when more then one speak at the same time. When one participant stops speaking, the blank space between speakers automatically disappears. The text is in different colors for each speaker making it immediately apparent who is speaking; see also p. 0055 & 0071); and accept transcribed-text corrective inputs to the displayed transcribed text from the user of each of the one or more client device during the playing of the recorded audio session (McFarland; p. 0034 - A court reporter/computer operator is given the opportunity to edit the transcription at step 811, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0053 - The system 10 can transcribe dialogue in hearings, depositions, trials, and a plurality of other dialogue settings. During transcription, the system 10 accepts corrections of any unrecognized voice patterns in real time transmitted to it by a court reporter/computer operator. Once a particular pattern has been corrected in this manner, the software will automatically correctly transcribe the pattern for all subsequent occurrences; see also p. 0066 - the system accepts operator input of corrections to the translation. At step 627, responsive to the operator inputted corrections, the system updates the selected lexicon with the new definition; see also p. 0070); and receive the transcribed-text corrective inputs from one or more client devices during the playing of the audio session (McFarland; p. 
0034 - A court reporter/computer operator is given the opportunity to edit the transcription at step 811, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0053 - The system 10 can transcribe dialogue in hearings, depositions, trials, and a plurality of other dialogue settings. During transcription, the system 10 accepts corrections of any unrecognized voice patterns in real time transmitted to it by a court reporter/computer operator. Once a particular pattern has been corrected in this manner, the software will automatically correctly transcribe the pattern for all subsequent occurrences; see also p. 0066 - the system accepts operator input of corrections to the translation. At step 627, responsive to the operator inputted corrections, the system updates the selected lexicon with the new definition; see also p. 0070); update the automatic speech recognition module based on the received transcribed-text corrected inputs during the playing of the recorded audio session, such that the automatic speech recognition module uses the transcribed-text corrective inputs in recognizing the audible output by the speaker during the playing of the recorded audio session (McFarland; p. 0037 - A court reporter/computer operator is given the opportunity to edit the transcription, if edits are received, the system 10 saves the corrections to a rules file and the voice engine will use the corrections for future translations; see also p. 0053 - The system 10 can transcribe dialogue in hearings, depositions, trials, and a plurality of other dialogue settings. During transcription, the system 10 accepts corrections of any unrecognized voice patterns in real time transmitted to it by a court reporter/computer operator. 
Once a particular pattern has been corrected in this manner, the software will automatically correctly transcribe the pattern for all subsequent occurrences; see also p. 0056 - The system allows the user to see the text, and edit the same in real time and the user is able to define unrecognized voice, which will be used for subsequent translation; see also p. 0066 - the system accepts operator input of corrections to the translation, responsive to the operator inputted corrections, the system updates the selected lexicon with the new definition; see also p. 0070); and upon receiving a transcribed-text corrective input that is applicable to a portion of the transcribed text, re-translate the portion of the transcribed text to the second human language such that the user interfaces of the one or more client devices display the re-translated portion in the second human language (McFarland; p. 0070 - While real time is taking place, a court reporter/computer operator, can make corrections and define incorrect voice translations and have those corrections apply to all future translations. The corrections only apply to the profile/dictionary that was opened at that particular point in the transcript. As corrections are made and parentheticals (unspoken text) are inserted by the court reporter/computer operator, the system 10 can refresh each connected computer accordingly. The system 10 has a list of all parentheticals, which can be selected for automatic insertion in the transcript).
	As per claims 36 and 51, McFarland in view of Waibel discloses:
	The system and method of claims 35 and 50, upon which claims 36 and 51 depend.
	And further, Waibel teaches wherein, in the presentation mode, the user interface simultaneously displays the text in the first human language and the translated text in the second human language (Waibel; Fig. 18 illustrates the display of the text in the first and second language simultaneously; p. 0056 - a simultaneous translation mode will be present. In this mode no button push is required but the system continuously recognizes and translates all speech present on both microphone inputs. Continuously recognition and simultaneous translation is shown).
	Therefore, it would have been obvious to one of ordinary skill in the art to modify the system and method of McFarland to include wherein, in the presentation mode, the user interface simultaneously displays the text in the first human language and the translated text in the second human language, as taught by Waibel, in order to enable users to add new vocabulary items and to improve and modify the content and usage of their system in the field, without requiring linguistic or technical knowledge or expertise for field maintenance (Waibel; p. 0003).
	As per claim 38, McFarland in view of Waibel discloses:
	The system of claim 32, upon which claim 38 depends.
	And further, Waibel does teach wherein the language translation module is configured to, during the audio session: identify a low-confidence word in the translated text in the second human language where the language translation module has a confidence level for the low-confidence word below a threshold confidence level; flag the low-confidence word in the display of the translated text (Waibel; p. 0053 - To help the user determine if the translation output is adequate, the automatically generated translation is translated back into the input language via MT module 3 or 8 and displayed with parentheses under the original input. If the confidence of both speech recognition and translation are high as determined by the ASR model, 2 or 9, and the MT module, 3 or 8, spoken output is generated via loud speakers 5 or 6, via TTS modules 4 or 7. Otherwise, the system indicates that the translation may be wrong via the GUI, audio and/or tactical feedback. The specific TTS module used in step 33 is selected based on the output language); receive a corrective input from a user of one of the one or more client devices for the low-confidence word; and update a model of the language translation module to use the corrective input for the low-confidence word for the audio session (Waibel; p. 0055 - Thereafter, if the user is dissatisfied with the generated translation, the user may intervene during the speech-to-speech translation process in any of steps from 27 to 33 or after process has completed. This invokes the Correction and Repair Module 14 at. The Correction and Repair Module records and logs any corrections the user may make, which can be later used to update ASR modules 2 and 9 and MT modules 3 and 8 as described in detail further below in this document. If the correction contains a new vocabulary item, or if the user enters the field customization mode to explicitly add a new word to the system, or if a new word is automatically detected in the input audio using confidence measures or new word models, the User Field Customization Module (Module 12) is invoked).
	Therefore, it would have been obvious to one of ordinary skill in the art to modify the system and method of McFarland to include wherein the language translation module is configured to, during the audio session: identify a low-confidence word in the translated text in the second human language where the language translation module has a confidence level for the low-confidence word below a threshold confidence level; flag the low-confidence word in the display of the translated text; receive a corrective input from a user of one of the one or more client devices for the low-confidence word; and update a model of the language translation module to use the corrective input for the low-confidence word for the audio session, as taught by Waibel, in order to enable users to add new vocabulary items and to improve and modify the content and usage of their system in the field, without requiring linguistic or technical knowledge or expertise for field maintenance (Waibel; p. 0003).

Claims 52 and 53 are rejected under 35 U.S.C. 103 as being unpatentable over McFarland in view of Waibel and further in view of Leydon (US PG Pub 20110307241).
	As per claims 52 and 53, McFarland in view of Waibel discloses:
	The system and method of claims 32 and 47, upon which claims 52 and 53 depend.
	McFarland in view of Waibel, however, fails to teach wherein the high-risk word comprises one of vulgar language, insults, sexist language, racist language, hate speech, politically or socially charged concepts and words.
	Leydon does teach wherein the high-risk word comprises one of vulgar language, insults, sexist language, racist language, hate speech, politically or socially charged concepts and words (Leydon; p. 0077 - The profanity module 316 may be configured to identify one or more profane words or phrases (hereafter, referred to as a “profanity” (high-risk word)) in a chat message, and may be further configured to suggest replacement words or phrases (e.g., suitable substitute) corresponding to the profanity (e.g., a toned down euphemism). In some embodiments, the profanity module 316 may flag identified profanity to be skipped or otherwise ignored during a subsequent machine translation (e.g., by the translation module 116). Additionally, in some embodiments, identified profanity may be flagged for later review and disposition by a human operator (e.g., an administrator of the CTT system 114). In order to identify profanity and/or its corresponding word or phrase, some embodiments may utilize a dataset (e.g., stored on a data store) comprising profanity and/or mappings between abbreviations and their corresponding words and phrases. The dataset may be constructed by way of training or a learning system, may be proprietary (e.g., manually collected “in-house” by an administrator of the CTT system 114), may be commercially acquired, or may be derived from a publicly available Internet knowledgebase. The result from the profanity module 316 may comprise profanity flagged by the profanity module 316 to be ignored, a suggested replacement, or a word or phrase inserted into the message by the profanity module 316 (e.g., in place of the identified profanity). 
Depending on the embodiment, the message that results from the profanity module 316 may be provided to another transformation module (in the transformation module 208) for further processing or the suggested replacement may be provided to the CTT control module 202 to determine if the message transformed by the profanity module 316 is in the data store 210).
	Therefore, it would have been obvious to one of ordinary skill in the art to modify the system and method of McFarland and Waibel to include wherein the high-risk word comprises one of vulgar language, insults, sexist language, racist language, hate speech, politically or socially charged concepts and words, as taught by Leydon, in order to improve translation in a multi-lingual multi-user chat system associated with an online game (e.g., in-game chat system) (Leydon; p. 0007).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The prior art made of record and not relied upon includes:
	Nelson (US PG Pub 20190273767): an approach is provided for integrating mobile devices into electronic meetings conducted over computer networks using IWB appliances. The approach includes a user-friendly way for users to join electronic meetings using mobile devices. The approach also allows participants to command and control an electronic meeting using their mobile device, and to receive individualized output, such as meeting transcripts, real-time language translation, messages, prompts, meeting information, and personalized audio streams (Nelson; Abstract).
	Lee (US PG Pub 20200258504): an electronic apparatus configured to acquire information on a plurality of candidate texts corresponding to input speech of a user through a general speech recognition module, determine text corresponding to the input speech from among the plurality of candidate texts using a trained personal language model, and output the text as a result of speech recognition of the input speech (Lee; Abstract).
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rodrigo A Chavez whose telephone number is (571)270-0139. The examiner can normally be reached Monday - Friday 9-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RODRIGO A CHAVEZ/ Examiner, Art Unit 2658

/RICHEMOND DORVIL/ Supervisory Patent Examiner, Art Unit 2658
Prosecution Timeline

Nov 07, 2022: Application Filed
Jan 03, 2025: Non-Final Rejection mailed (§103)
Sep 26, 2025: Response Filed
Apr 08, 2026: Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/544,008 (Patent 12620044): SYSTEMS AND METHODS FOR TRACKING DISASTER FOOTPRINTS WITH SOCIAL STREAMING DATA. 4y 5m to grant; granted May 05, 2026.
18/175,355 (Patent 12597430): MULTI-CHANNEL SIGNAL GENERATOR, AUDIO ENCODER AND RELATED METHODS RELYING ON A MIXING NOISE SIGNAL. 3y 1m to grant; granted Apr 07, 2026.
17/579,750 (Patent 12579984): DATA AUGMENTATION SYSTEM AND METHOD FOR MULTI-MICROPHONE SYSTEMS. 4y 1m to grant; granted Mar 17, 2026.
17/513,419 (Patent 12541653): ENTERPRISE COGNITIVE SOLUTIONS LOCK-IN AVOIDANCE. 4y 3m to grant; granted Feb 03, 2026.
17/532,315 (Patent 12542136): DYNAMICALLY CONFIGURING A WARM WORD BUTTON WITH ASSISTANT COMMANDS. 4y 2m to grant; granted Feb 03, 2026.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 51%
With Interview (+38.6%): 90%
Median Time to Grant: 3y 3m (~0m remaining)
PTA Risk: Moderate
Based on 233 resolved cases by this examiner. Grant probability derived from career allowance rate.