Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claim Rejections - 35 USC § 112
Applicant’s amendments filed 11/7/25 suffice to obviate the rejection of claim 9 under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-5, and 11-22 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu (US 2021/0375289, of record) in view of Zhang, “SummIt: Iterative Text Summarization via ChatGPT” (of record; copy provided by Applicant; hereinafter Zha).
Regarding claim 1
Zhu teaches:
A user electronic device comprising:
one or more microphones configured to capture raw audio data (Zhu: ¶ 541-543; Fig 9: plural systems comprising one or more microphones for capturing audio data such as for transcription processing, generation of text thereby, etc.); and
one or more processors and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations (Zhu: ¶ 40; Fig 1: processors operative of memory borne instructions such as to operate a meeting assistant functionality in concert with microphone(s)) comprising:
receiving the raw audio data captured by the one or more microphones (Zhu: ¶ 541-543; Fig 5, 9: microphone(s) record speech of participants for use downstream such as in the form of generated text, etc.);
processing the raw audio data using a speech transcriber to generate a live transcription of the raw audio data that comprises a plurality of text tokens (Zhu: ¶ 72, 82-84, 454, 541-544; Fig 9: such as by generation of speech to text, tagging, labelling, tokenization, etc. in concert with a language model, and other modalities, such as to make the recorded audio pliant to downstream processing such as by summarization, etc.);
processing the raw audio data to generate a speaker identification output that identifies, for each of the text tokens, a respective speaker for each of the text tokens in the live transcription (Zhu: ¶ 72, 79-84, 443, 541-543; Fig 1, 9: extracted audio features, such as a voice profile used to determine identity of a speaker);
generating an input text by modifying the live transcription to insert text identifying the respective speakers for each of the text tokens in the live transcription (Zhu: ¶ 36, 55, 72, 79-84, 443, 541-543; Fig 5, 9: transcription modified to include speaker identification); and
processing the input text generated from the live transcription using a language model neural network to generate a modified transcription (Zhu: ¶ 36, 55, 72, 79-84, 459-465, 468-477, 541-543; Fig 3-5: transcription processed for summarization such as by a transformer model) such as based on an instruction to correct the live transcription and wherein the modified transcription is a corrected transcription that corrects transcription errors in the live transcription (Zhu: ¶ 57, 68, 454, 490-492, 548, etc.; Fig 3: post processing includes instructions to correct errors within the transcription, such as grammatical errors, errors introduced by automatic speech recognition, etc.; such as to make a transcript more readable, more correct, etc.).
Zhu does not explicitly teach the processing of a first input additionally comprising
(i) a first input prompt that comprises an instruction to correct the live transcription and (ii) the input text generated from the live transcription using a language model neural network to generate a modified transcription.
In a related field of endeavor, Zha teaches a system and method for curating text transcriptions by prompting a first large language model (LLM) (Zha: Abstract), wherein the first model, such as a first LLM module such as in ChatGPT, uses an input text stream to refine a summary of an input text (Zha: Abstract; Fig 1, 2), such as by iterative correction of the summary based on a prompt to the first LLM module (Zha: § 3.1; Fig 2); wherein a second LLM module (Zha: § 3.4; Appendix B; Fig 2: such as the evaluator module of the figure), in receipt of the summary generated by the first LLM module, returns a first prompt to the first LLM module which comprises an instruction to correct the live transcription (Zha: § 3.4; Appendix B; Fig 2: such as an instruction to revise a summary of an input text by providing a prompt to a first LLM module, said prompt comprising determined corrections of the input text to be executed by the first LLM module) by making determined modifications to the initial summarization of the input to the first LLM module (id.). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize an LLM evaluator module such as that taught or suggested by Zha to receive and post process an input transcription such as that taught or suggested by Zhu, and to correct errors therein by utilizing the Zhu-taught post processing of errors in concert with the Zha-taught evaluator to thereby prompt a first LLM module to improve a transcription, for at least the purpose of utilizing an LLM, modules thereof, etc. to provide improved human-readable transcriptions and/or summaries by correction of errors therein; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 3
Zhu in view of Zha teaches or suggests:
The user electronic device of claim 2, the operations further comprising: processing a second input comprising a second prompt for a text analysis task and context data comprising the corrected transcription and using the language model neural network to generate a text output for the text analysis task for the corrected transcription (Zha: § 3.4; Fig 2: system utilizes additional prompts to correct, edit, curate, etc. the summary). The claim is considered obvious over Zhu as modified by Zha as addressed in the base claim as it would have been obvious to apply the further teaching of Zhu and/or Zha to the modified device of Zhu and Zha; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 4
Zhu in view of Zha teaches or suggests:
The user electronic device of claim 3, wherein the second prompt comprises an instruction to identify action items for a particular speaker, wherein the action items comprise (1) questions for the speaker to answer, (2) tasks for the speaker to complete (Zhu: ¶ 72, 82-84, 454, 489, 541-544; Fig 5, 9: speakers, roles, etc. identified, and tasks ascribed thereto; in the case of the figure 5 illustration the transcriptions apply to product design and the summary is one for the industrial designer speaker, role, etc.), or both, and the text output comprises text derived from the corrected transcription that identifies one or more action items for the particular speaker (Zhu: ¶ 72, 82-84, 454, 489, 541-544; Fig 5, 9: the summarization represents a particular action item for the designer); (Zha: § 3.1, 3.4; Fig 1, 2: system automatically generates corrected transcriptions using LLM). The claim is considered obvious over Zhu as modified by Zha as addressed in the base claim as it would have been obvious to apply the further teaching of Zhu and/or Zha to the modified device of Zhu and Zha; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 5
Zhu in view of Zha teaches or suggests:
The user electronic device of claim 3, wherein the second prompt comprises an instruction to summarize the corrected transcription and the text output comprises text that summarizes the corrected transcription (Zha: § 3.1, 3.4; Fig 2: system prompts LLM to summarize input text and utilizes additional prompts to correct, edit, curate, etc. the summary). The claim is considered obvious over Zhu as modified by Zha as addressed in the base claim as it would have been obvious to apply the further teaching of Zhu and/or Zha to the modified device of Zhu and Zha; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 11
Zhu in view of Zha teaches or suggests:
The user electronic device of claim 1, wherein the plurality of text tokens comprise a set of speaker identifiers and a block of text associated with each speaker identifier (Zhu: ¶ 72, 82-84, 454, 489, 541-544; Fig 5, 9: system tokenizes inputs and ascribes same to particular identified speakers, roles thereof). The claim is considered obvious over Zhu as modified by Zha as addressed in the base claim as it would have been obvious to apply the further teaching of Zhu and/or Zha to the modified device of Zhu and Zha; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 12
Zhu in view of Zha teaches or suggests:
The user electronic device of claim 1, wherein the prompt comprises a query to correct the live transcription and one or more additional instructions (Zha: § 3.1, 3.4; Fig 1, 2: system automatically corrects text using an LLM; the system performs post processing upon the transcription, such as to perform grammar, syntactic, readability, etc. correction, such as to generate a transcript with corrected errors and improved readability based on an iterative series of prompts among an LLM summarization system comprising prompted summarizer and evaluator actions). The claim is considered obvious over Zhu as modified by Zha as addressed in the base claim as it would have been obvious to apply the further teaching of Zhu and/or Zha to the modified device of Zhu and Zha; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 13
Zhu in view of Zha teaches or suggests:
The user electronic device of claim 1, the operations further comprising outputting the modified transcription to a user of the user electronic device (Zhu: Fig 5); (Zha: Fig 2: a refined summary returned). The claim is considered obvious over Zhu as modified by Zha as addressed in the base claim as it would have been obvious to apply the further teaching of Zhu and/or Zha to the modified device of Zhu and Zha; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 14
Zhu in view of Zha teaches or suggests:
The user electronic device of claim 1, wherein the first input prompt further comprises an instruction to identify action items for a particular speaker (Zhu: Fig 5); (Zha: Fig 2; Table 7; Evaluation-Iter1; Appendix A: a refined summary returned based on rephrasing instructions). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to rephrase a summary of Zhu, such as using the Zha LLM text summary improvement system, to curate summaries specifically for providing directions to a designer, manager, etc., the rephrasing being in the form of deliverables, such as to reify requirements of said deliverables in terms of “must, should, may,” and for at least the purpose of generating concise directions resultant from a meeting and based on key points of said meeting; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 15
Zhu in view of Zha teaches or suggests:
The user electronic device of claim 1, wherein the first input prompt further comprises an instruction to summarize the corrected transcript (Zha: Abstract; § 3.1, 3.4; Fig 2; Table 7; Evaluation-Iter1; Appendix A: a refined summary iteratively returned to arrive at a best corrected, most improved, etc. state of a summary). The claim is considered obvious over Zhu as modified by Zha as addressed in the base claim as it would have been obvious to apply the further teaching of Zhu and/or Zha to the modified device of Zhu and Zha; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claims 16 and 19—the claims are considered to recite substantially similar subject matter to that of claim 1 and are similarly rejected.
Regarding claim 17—the claim is considered to recite substantially similar subject matter to that of claim 3 and is similarly rejected.
Regarding claim 18—the claim is considered to recite substantially similar subject matter to that of claim 4 and is similarly rejected.
Regarding claims 20-22
Zhu in view of Zha teaches or suggests:
The user electronic device of claims 1, 16, 19, wherein the speech transcriber and the language model neural network both run on the user electronic device (Zhu: ¶ 39, 451, 544: system 110 comprises modules operable to transcribe, post process, and correct data with respect to input content in concert with distributed services such as a speech service, and wherein embodiments include the modules, services, etc. embodied upon a speech-enabled smart device); (Zha: § 3.4; Appendix B; Fig 1, 2: a large language model operable of plural subordinate modules to process input textual data and to correct same). While Zhu in view of Zha does not explicitly discuss the modules of the device embodied upon a singular user device, Examiner considers such an embodiment obvious to try: before the effective filing date of the instant invention there existed the recognized problem of providing functionality of a system at a server, at a user device, or in a distributed manner at a combination of the two; as such, there existed a finite number of identified, predictable potential solutions to such an implementation. Further, one of ordinary skill in the art at the relevant time could be relied upon not only to recognize the utility of operating discrete functional modules in keeping with the finite solutions, but also to pursue potential solutions in this regard with a reasonable expectation of success, predictable results, etc., and without undue experimentation. As such, it would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to locate transcription module(s), correction module(s), LLM implementation(s) thereof, etc. at a user device or a device local to the user; one of ordinary skill in the art would have expected only predictable results therefrom.
Claims 7-10 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu (US 2021/0375289) in view of Zhang, “SummIt: Iterative Text Summarization via ChatGPT” (copy provided by Applicant; hereinafter Zha), as applied to claims 1, 3-5 supra, and further in view of Zhu (US 2024/0340193, hereinafter Zhu2).
Regarding claim 7
Zhu in view of Zha teaches or suggests:
The user electronic device of claim 3, wherein the large language model provides a summary of the audio data, but does not explicitly teach providing the summary at predetermined time intervals.
In a related field of endeavor, Zhu2 teaches a real-time summarization system wherein the system provides periodic summary updates over determined intervals, wherein transcripts, or portions thereof, are regularly processed over determined elapsed time intervals (Zhu2: ¶ 21, 41, 44, 49-52, 54, 62, 68, etc.: system determines various triggers for summarization including based on determined durations of time). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to combine the real-time system discussed by Zhu2 with the Zhu in view of Zha system and method for at least the purpose of providing regular updates of transcripts, or portions thereof, over determined intervals; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 8
Zhu in view of Zha teaches or suggests:
The user electronic device of claim 7, the operations further comprising:
determining that a specified time interval has elapsed since a prior live transcription of raw audio has been processed using the language model neural network (Zhu2: ¶ 21, 41, 44, 49-52, 54, 62, 68, etc.: system determines various triggers for summarization including based on determined durations of time); and
processing the first input using the language model neural network in response to determining that the specified time interval has elapsed, wherein the live transcription is a transcription of raw audio (Zhu: ¶ 72, 82-84, 454, 541-544; Fig 9: system processes live speech input(s), such as by generation of speech to text, tagging, labelling, tokenization, etc. in concert with a language model, and other modalities, such as to make the recorded audio pliant to downstream processing such as by summarization, etc.); (Zha: § 3.1, 3.4; fig 1, 2: system performs post processing upon transcription such as to perform grammar, syntactic, readability, etc. correction such as to generate a transcript with corrected errors and improved readability); captured during the specified time interval (Zhu2: ¶ 21, 41, 44, 49-52, 54, 62, 68, etc.: system determines various triggers for summarization including based on determined durations of time). The claim is considered obvious over Zhu as modified by Zha and Zhu2 as addressed in the base claim as it would have been obvious to apply the further teaching of Zhu, Zha, and/or Zhu2 to the modified device of Zhu, Zha, and Zhu2; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 9
Zhu in view of Zha in view of Zhu2 teaches or suggests:
The user electronic device of claim [ 3 ] 8, wherein either the first input prompt or the second prompt or both comprise an instruction to correct transcriptions generated from earlier live transcriptions of raw audio before the specified time interval (Zhu: ¶ 72, 82-84, 454, 541-544; Fig 9: system processes live speech input(s), such as by generation of speech to text, tagging, labelling, tokenization, etc. in concert with a language model, and other modalities, such as to make the recorded audio pliant to downstream processing such as by summarization, etc.); (Zha: § 3.1, 3.4; Fig 1, 2: system automatically corrects text using an LLM; the system performs post processing upon the transcription, such as grammar, syntactic, readability, etc. correction, such as to generate a transcript with corrected errors and improved readability); (Zhu2: ¶ 21, 41, 44, 49-52, 54, 62, 68, etc.: system determines various triggers for summarization including based on determined durations of time). The claim is considered obvious over Zhu as modified by Zha and Zhu2 as addressed in the base claim as it would have been obvious to apply the further teaching of Zhu, Zha, and/or Zhu2 to the modified device of Zhu, Zha, and Zhu2; one of ordinary skill in the art would have expected only predictable results therefrom.
Regarding claim 10
Zhu in view of Zha in view of Zhu2 teaches or suggests:
The user electronic device of claim 8, the operations further comprising: determining that transcribing has terminated; and, in response to determining that transcribing has terminated, processing a final input to generate text that summarizes the live transcription of raw audio data captured prior to, during, and after the specified time interval (Zhu: ¶ 72, 82-84, 454, 541-544; Fig 9: system processes live speech input(s), such as by generation of speech to text, tagging, labelling, tokenization, etc. in concert with a language model, and other modalities, such as to make the recorded audio pliant to downstream processing such as by summarization, etc.); (Zha: § 3.1, 3.4; Fig 1, 2: system automatically corrects text using an LLM; the system performs post processing upon the transcription, such as grammar, syntactic, readability, etc. correction, such as to generate a transcript with corrected errors and improved readability, which terminates upon a determined level of improvement); (Zhu2: ¶ 21, 41, 44, 49-52, 54, 62, 68, etc.: system determines various triggers for summarization including based on determined durations of time and provides summaries when determined intervals have terminated). The claim is considered obvious over Zhu as modified by Zha and Zhu2 as addressed in the base claim as it would have been obvious to apply the further teaching of Zhu, Zha, and/or Zhu2 to the modified device of Zhu, Zha, and Zhu2; one of ordinary skill in the art would have expected only predictable results therefrom.
Response to Arguments
Applicant's arguments filed 11/7/25 have been fully considered but they are not persuasive. Examiner has modified the rejection of independent claim 1 supra in response to Applicant’s amendments thereto—Examiner has not introduced any new grounds but has edited the rejection to address Applicant’s arguments. Examiner must point out that, rather than providing evidence of the manner in which the instant invention features over the prior art combination, Applicant has chosen to argue the references piecemeal; Examiner is rarely persuaded by such arguments. Specifically, Applicant argues that “Zhu cannot teach or suggest ‘processing a first input comprising (i) a first input prompt that comprises an instruction to correct the live transcription and (ii) the input text generated from the live transcription using a language model neural network to generate a modified transcription, wherein the modified transcription is a corrected transcription that corrects transcription errors in the live transcription’ as recited by the amended claim.” Examiner respectfully disagrees; the teachings of Zhu have been addressed supra, but essentially Zhu teaches correcting a transcription including errors of grammar, recognition, and clarity (Zhu: ¶ 57, 68, 454, 490-492, 548, etc.; Fig 3: post processing includes instructions to correct errors within the transcription, such as grammatical errors, errors introduced by automatic speech recognition, etc.; such as to make a transcript more readable, more correct, etc.). Examiner agrees that Zhu lacks discussion of prompting an LLM in this regard, and as such the rejection is based on Zhu in view of Zhang.
Examiner additionally agrees that Zhang teaches or “describes refining a "generated summary iteratively through self-evaluation and feedback" (Zhang at Abstract) using an "iterative text summarization framework based on large language models like ChatGPT" (Zhang at Abstract) and a language model that "generates a summary y autoregressively conditioning on [an] input source document x" (Zhang at Section 3.1).” It is additionally true that Zhang uses “"large language models like ChatGPT" to summarize documents,” and that Zhang does not teach using an LLM type “model to correct transcription errors.”
What is incorrect is Applicant’s assertion that “Zhang only teaches or suggests using ‘large language models like ChatGPT’ to summarize documents.” In fact, Zhang teaches a first LLM module to summarize and a second LLM module to correct and revise the input provided by issuing prompts to the first LLM module to act upon or otherwise edit the text generated thereby, said prompt comprising instructions to add, remove, or otherwise parse the text (Zha: § 3.4; Appendix B; Fig 2). Reliant on this assertion, Applicant states conclusively that “Zhang cannot teach or suggest ‘processing a first input comprising (i) a first input prompt that comprises an instruction to correct the live transcription and (ii) the input text generated from the live transcription using a language model neural network to generate a modified transcription, wherein the modified transcription is a corrected transcription that corrects transcription errors in the live transcription’ as recited by the amended claim.”
Examiner respectfully disagrees. Zhang teaches processing a first input in the form of the input to a first LLM module, wherein the first input is text, which a second LLM module corrects and curates by providing a prompt to the first LLM module (Zha: § 3.4; Appendix B; Fig 2: a second evaluator module provides a prompt instruction to revise a summary of an input text by providing the prompt instruction to a first LLM module, said prompt comprising determined corrections of the input text to be executed by the first LLM module). Zhu teaches utilizing a neural network to correct a transcription of an input text by post processing (Zhu: ¶ 57, 68, 454, 490-492, 548, etc.; Fig 3: post processing includes instructions to correct errors within the transcription, such as grammatical errors, errors introduced by automatic speech recognition, etc.; such as to make a transcript more readable, more correct, etc.). Zhu in view of Zhang teaches or suggests utilizing an LLM evaluator module such as that taught or suggested by Zha to receive and post process an input transcription, or text thereof, such as that taught or suggested by Zhu, and to correct errors therein by utilizing the Zhu-taught post processing of errors in concert with the Zha-taught evaluator to thereby prompt a first LLM module to improve a transcription, for at least the purpose of utilizing an LLM, modules thereof, etc. to provide improved human-readable transcriptions and/or summaries by correction of errors therein. As such, Applicant’s arguments are not considered persuasive and claims 1, 16, 19 are not considered allowable. Applicant’s arguments regarding the dependent claims depend materially from the arguments regarding claim 1 and are similarly not persuasive.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection and/or modification(s) to the rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL C MCCORD whose telephone number is (571)270-3701. The examiner can normally be reached 7:30-6:30 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CAROLYN EDWARDS can be reached at (571) 270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PAUL C MCCORD/Primary Examiner, Art Unit 2692