DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Non-final Office Action mailed 10/27/2025, Applicant filed an amendment on 1/27/2026. In this reply, Applicant has not elected to amend the independent claims to overcome the prior art of record and advance prosecution. Instead, Applicant argues that the prior art of record fails to teach the limitations regarding using an LLM to determine whether spoken words are intended for the automated agent, and that the prior art of record is directed towards a concept different from that of the invention, which relates to whether spoken language in the ambient environment is directed to an automated agent or constitutes ambient conversation not intended for the agent (Remarks, Pages 10-12). These arguments have been fully considered but are not found to be persuasive for the reasons noted in the Response to Arguments section below. Note that the rejections of claims 3-4 and 12-13 have been restructured to rely upon the same references/teachings and thrust of rejection in response to the amended claim dependency.
In response to the amended, more specific title of the invention (Remarks, Page 9), the objection directed towards a non-descriptive title is now moot and has been withdrawn.
Applicant argues that changing the dependency of claims 3, 4, 12, and 13 has resolved the antecedent basis issues under 35 U.S.C. 112(b) (Remarks, Page 9-10).
In response, the 35 U.S.C. 112(b) rejections of these claims are moot and have been withdrawn. Applicant, however, has neither argued nor filed amendments addressing the various indefiniteness issues in claims 5, 7, 8, 9, 14, 16, 17, and 18. Thus, those 35 U.S.C. 112(b) rejections have been maintained.
Response to Arguments
In response to the rejection of independent claim 1 under 35 U.S.C. 102(a)(1) as being anticipated by Baeuml, et al. (U.S. PG Publication: 2023/0074406 A1), Applicant argues that Baeuml has been mischaracterized in the rejection and specifically fails to teach "generating a structured prompt that includes at least the converted text and an instruction for a Large Language Model (LLM), wherein the instruction is configured to request the LLM to infer whether the spoken words are intended for the automated agent." In particular, Applicant argues that "Baeuml addresses an entirely different technical problem than that solved by Applicant's claimed invention" because Baeuml improves the conversational quality of automated assistant responses after the automated assistant has already been invoked and after it has "already been determined that spoken utterances are directed to the assistant." Applicant further contends that "Baeuml's entire technical framework presumes that spoken language has already been identified as being directed to the automated assistant before any LLM processing occurs" and that at "no point does Baeuml describe, suggest, or contemplate using an LLM to determine whether spoken words are intended for the automated agent in the first instance." Lastly, Applicant looks to the provided citations of Baeuml and argues that the finding that recognized text is processed along with additional data to determine whether the transcription relates to an automated assistant is fundamentally flawed, alleging that the position taken in the action conflates determining whether speech is directed towards an agent with determining what intent or action is expressed within speech already known to be directed to an agent (Remarks, Pages 10-12).
In response, Applicant is reminded that during patent examination, the pending claims must be "given their broadest reasonable interpretation consistent with the specification." See Phillips v. AWH Corp., 415 F.3d 1303, 1316, 75 USPQ2d 1321, 1329 (Fed. Cir. 2005) (en banc). Applicant takes the position that the limitation "generating a structured prompt that includes at least the converted text and an instruction for a Large Language Model (LLM), wherein the instruction is configured to request the LLM to infer whether the spoken words are intended for the automated agent" relates to determining "whether spoken language in the ambient environment is directed to the automated agent or constitutes ambient conversation not intended for the agent," and that Baeuml does not teach this limitation.
The claim limitation in question reads: "generating a structured prompt that includes at least the converted text and an instruction for a Large Language Model (LLM), wherein the instruction is configured to request the LLM to infer whether the spoken words are intended for the automated agent." Importantly, the claimed invention does not match Applicant's characterization because it does not recite structuring the generated prompt to detect "ambient conversation not intended for the agent." The claimed invention only recites that the prompt instruction broadly requests an inference of "whether the spoken words are intended for the agent." Importantly, the claim does not specify how, or in what sense, an input is intended for an agent. While Applicant's arguments characterize the detection of ambient background conversation, these specific details are absent from the claimed prompt instruction. Instead, the claimed invention only requires that the instruction broadly request that the LLM make some determination of whether the spoken words are intended for the automated agent. Under the BRI, the actual claim language is addressed by the teachings of Baeuml, contrary to Applicant's arguments, which are based upon a narrower claim construction than instant claim 1 requires.
Specifically, Baeuml discloses a "structured prompt" including context of the past/prior dialog session that allows an LLM to infer that the monitored speech input is intended for the LLM (Paragraphs 0059-0060). For example, by identifying a prior context, the LLM identifies that the spoken input pertains to the automated assistant since that speech input is in the context of a conversation that has been ongoing with that assistant and thus is pertinent. As another example, the LLM infers the intent of the utterance to determine if the input relates/pertains to something (action/response) that the assistant/agent can assist the user in solving (see the food example in Paragraph 0060). Furthermore, Applicant is also directed to consider Paragraph 0063, in which an LLM is instructed to generate parameters associated with certain agents in determining that a spoken input relates to one of those agents (e.g., a chef agent is identified as relating to a user indicating they are hungry). Thus, Baeuml features a multitude of considerations via prompting as to whether "the spoken words are intended for the automated agent": by past dialog context, by the input relating to a specific agent, or by the input pertaining to a responsive action. Given the above claim interpretation under the BRI and that Applicant's arguments are supported by features that are not clearly brought forth into the claimed invention, Applicant's arguments are not found to be persuasive.
Applicant is also reminded that additional prior art teaches the unclaimed, argued subject matter. For example, as previously cited in the conclusion of the Non-Final Action (Page 12), Xiao, et al. (U.S. PG Publication: 2025/0078812 A1) was noted to teach hotword-free automated assistant invocation using a "continued conversation engine" that predicts whether audio is directed towards an automated assistant or towards another user (Paragraph 0075).
The prior art rejections of the remaining independent and dependent claims have been traversed for reasons similar to Claim 1 (Remarks, Pages 12-13). In regards to such arguments, see the response directed towards claim 1.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 5, 7-9, 14, and 16-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
In Claim 5, Line 4, "an action" lacks a referential modifier/definite article (e.g., said or the) so it is unclear whether a new action is being introduced or if the action should refer back to the term as originally introduced in claim 1. For the purposes of claim interpretation in the interest of compact prosecution, "an action" will be interpreted as --the action--. Claim 14 features similar indefinite claim language and has likewise been rejected under 35 U.S.C. 112(b).
In Claim 7, Line 2, "the end of a spoken sentence" lacks antecedent basis and it is unclear what limitation is being referenced. For claim interpretation, "the end" will be interpreted as --an end--. Claim 16 features similar indefinite claim language and has likewise been rejected under 35 U.S.C. 112(b).
In Claim 8, Line 2, "a standardized format" is recited and lacks a referential modifier/definite article (e.g., said or the) so it is unclear whether a new format is being introduced or if the format should refer back to the term as originally introduced in claim 1. For the purposes of claim interpretation in the interest of compact prosecution, "a standardized format" will be construed as --the standardized format--. Claim 17 features similar indefinite claim language and has likewise been rejected under 35 U.S.C. 112(b).
The remaining dependent claims fail to resolve and inherit the indefinite claim language of their parent claims, and thus, have also been rejected under 35 U.S.C. 112(b) by virtue of their dependency.
Regarding Claims 9 and 18, the phrase "such as" renders the claim indefinite because it is unclear whether the limitations following the phrase are part of the claimed invention. See MPEP § 2173.05(d). For claim interpretation these limitations will be interpreted as being optional.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 6, 9-10, 15, and 18-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Baeuml, et al. (U.S. PG Publication: 2023/0074406 A1).
With respect to Claim 1, Baeuml discloses:
A system for processing spoken language to determine user intent for interaction with an automated agent, the system comprising:
at least one processor (“one or more processors,” Paragraph 0040);
at least one memory component storing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising (“one or more memories,” Paragraph 0040):
continuously monitoring ambient audio via a microphone integrated with a device ("monitor a stream of audio data generated by one or more microphones of the client device," Paragraph 0068; see also continuous monitoring at the loop in Element 362 of Fig. 3);
converting captured spoken words from the ambient audio into text using a speech-to-text conversion process ("system can process the stream of audio data (e.g., the stream of audio data 201 of FIG. 2) using the ASR engine" to "produce recognized text that corresponds to the spoken utterance" and "monitor for one or more particular words or phrases included in the stream of audio data" Paragraphs 0045 and 0068-0069);
generating a structured prompt that includes at least the converted text and an instruction for a Large Language Model (LLM), wherein the instruction is configured to request the LLM to infer whether the spoken words are intended for the automated agent (recognized text is processed by an LLM along with additional data (e.g., the context of the dialog) to constitute a “structured prompt” to determine whether the transcription relates to an “automated assistant”, Paragraphs 0053, 0056 (describing LLM prompting in the form of "textual input corresponding to the assistant query"), 0059, 0060, 0063, and 0089);
transmitting the structured prompt to the LLM (transmission of the LLM input to an LLM remotely located from the client device, Paragraphs 0022, 0030, and 0059);
receiving, from the LLM, a structured output in a standardized format, wherein the structured output includes an inference result indicating whether the spoken words are intended for the automated agent and, if so, identifying the intent of the user ("LLMs can determine an intent associated with the given assistant query (e.g., based on the stream of NLU output 204 generated using the NLU engine)" wherein intents detected pertain to voice assistant queries, LLM inference occurs after the target words that are being monitored are detected, Paragraphs 0053, 0059 (discussing example structuring of the LLM output), 0060-0061, 0073, and 0100); and
executing an action by the automated agent based on the identified intent of the user when the inference result indicates that the spoken words are intended for the automated agent (“automated assistant” is caused to take an action such as generating an output based upon the LLM processing including intent identification, Paragraphs 0001, 0035, 0060, 0073, and 0088).
With respect to Claim 6, Baeuml further discloses:
The system of claim 1, wherein the automated agent is integrated into an augmented reality (AR) device (client device in the form of an "augmented reality computing device," Paragraph 0031), and the action executed by the automated agent includes displaying relevant information via an application executing within an AR environment of the AR device ("assistant outputs 207...to be visually rendered by a display of the client device," Paragraph 0051).
With respect to Claim 9, Baeuml further discloses:
The system of claim 1, wherein the automated agent is integrated into an automobile’s infotainment system, and the action executed by the automated agent includes receiving spoken commands related to vehicle control functions, such as adjusting climate settings, setting navigation destinations, or activating windshield wipers, and wherein the LLM is fine-tuned to recognize and process commands specific to automotive operations (client device is part of an in-vehicle system such that inputs causing actions are related to operation of an "in-vehicle navigation system" or "in-vehicle entertainment system," Paragraphs 0031, 0035, and 0066).
Claim 10 recites limitations corresponding to the functionality performed by the system of claim 1, and thus, is rejected under similar rationale.
Claim 15 contains subject matter similar to Claim 6, and thus, is rejected under similar rationale.
Claim 18 contains subject matter similar to Claim 9, and thus, is rejected under similar rationale.
Claim 19 is directed towards an embodiment of the functionality performed by the system of claim 1 realized as a non-transitory computer-readable storage medium storing processor executable instructions for carrying out that functionality, and thus is rejected under similar rationale. Moreover, claim 19 includes some narrower subject matter related to "domain-specific" automated agents and the LLM being fine-tuned for prompts related to a domain of the domain-specific automated agent. These limitations are addressed by Baeuml (see- automated assistant with trained/re-trained LLMs pertaining to different domains such as restaurant suggestion or weather, Paragraphs 0060, 0066, and 0100). Lastly, Baeuml teaches method implementation as a non-transitory computer-readable storage medium storing program instructions (Paragraph 0136).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2, 11, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Baeuml, et al. in view of Bose, et al. (U.S. PG Publication: 2024/0379096 A1).
With respect to Claim 2, Baeuml discloses the system for monitoring audio for specific words for prompting an LLM to generate a voice assistant reply as applied to Claim 1. Baeuml also discloses that the automated agent is a domain-specific automated agent (automated assistant pertaining to different domains such as restaurant suggestion or weather, Paragraphs 0060 and 0100). Baeuml, however, does not teach LLM prompting related to multi-shot/few-shot training as set forth in claim 2. Bose, however, discloses:
providing a system prompt distinct from the structured prompt as input to the LLM, the system prompt providing multi-shot fine-tuning examples to the LLM for a domain of the domain-specific automated agent (LLM is for a voice user assistant, Paragraph 0059; where domain specific utterance examples are provided by Baeuml as noted above), each example comprising a sample of spoken words and a corresponding structured output that indicates either a specific intent or absence of intent ("few-shot examples are pairs of an utterance with its corresponding intent derived from known data" to fine tune the LLM, Paragraphs 0015-0016, 0020, 0022, 0027-0028, and 0035 (discussing a task domain)).
Baeuml and Bose are analogous art because they are from a similar field of endeavor in voice assistants using LLMs. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the few-shot intent examples taught by Bose in the prompting of Baeuml to provide a predictable result of dynamically guiding the model at inference to generalize on unseen data using a few labeled examples (Bose, Paragraph 0016).
Claims 11 and 20 contain subject matter similar to Claim 2, and thus, are rejected under similar rationale.
Claims 3-4 and 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Baeuml, et al. in view of Walters, et al. (U.S. PG Publication: 2025/0232766 A1).
With respect to Claim 3, Baeuml discloses the system for monitoring audio for specific words for prompting an LLM to generate a voice assistant reply as applied to Claim 1. Baeuml further discloses:
providing a system prompt distinct from the structured prompt as input to the LLM, the system prompt including an instruction for the LLM to analyze a stream of text converted from spoken words to determine if the spoken words within the stream are directed towards the domain-specific automated agent or constitute ambient conversation, to extract the intent of the user from the spoken words intended for the domain-specific automated agent (prompting an LLM using different sets of parameters for a domain (e.g., restaurants) and the NLU result for "contextual scenarios" pulling from different data structures to identify related intents (e.g., user profiles/preferences), Paragraphs 0060-0063 and 0068-0069).
Baeuml fails to teach the generation of a response in valid JavaScript Object Notation (JSON) format that indicates the extracted intent of the user without answering any questions posed within the stream of text. Walters, however, discloses:
generate a response in valid JavaScript Object Notation (JSON) format that indicates the extracted intent of the user without answering any questions posed within the stream of text (LLM generation in a "format...such as JSON" responsive to a prompt that includes an indication of intent prior to rendering via interactive voice response, Paragraphs 0022 and 0029-0030; Fig. 1, LLM 106 prior to IVR 120).
Baeuml and Walters are analogous art because they are from a similar field of endeavor in voice assistants utilizing LLMs. Thus, it would have been obvious to one of ordinary skill before the effective filing date to use the JSON format for LLM outputs as taught by Walters in the LLM output generation including intent taught by Baeuml to provide a predictable result of implementing an output format that can readily be ingested by other processing modules (Walters, Paragraph 0017).
With respect to Claim 4, Walters further discloses:
The system of claim 3, wherein the structured output is in JSON format, and the structured output includes a field for the identified intent of the user that is populated when the inference result is positive (LLM generation in a "format...such as JSON” that includes an indication of intent that is positively detected, Paragraphs 0029-0030).
Claims 12-13 contain subject matter respectively similar to Claims 3-4, and thus, are rejected under similar rationale.
Claims 5, 8, 14, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Baeuml, et al. in view of Pandita, et al. (U.S. PG Publication: 2024/0143289 A1).
With respect to Claim 5, Baeuml discloses the system for monitoring audio for specific words for prompting an LLM to generate an automated agent reply as applied to Claim 1. Baeuml does not teach that contextual information from previous interactions is used in an instruction for the LLM to correct errors, with a corrected text being received from the LLM before executing an action. Pandita, however, discloses previous context information included with a prompt to an LLM to correct speech-to-text errors, where the LLM returns the corrected transcription prior to executing an action (Paragraphs 0073-0080).
Baeuml and Pandita are analogous art because they are from a similar field of endeavor in speech interfaces using LLMs. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the prompting related to speech-to-text correction based upon context information taught by Pandita with the LLM prompts for voice assistant prompting taught by Baeuml to provide a predictable result of making better predictions by an LLM by resolving speech-to-text errors (Pandita, Paragraph 0079).
With respect to Claim 8, Baeuml (in combination with Pandita) further discloses:
The system of claim 1, wherein the LLM is configured to utilize a function calling capability that ensures the structured output is provided in a standardized format, the function calling capability enabling the LLM to execute predefined functions within the structured prompt that correspond to specific tasks including correction of transcription errors resulting from the conversion of the captured spoken words from the ambient audio into text using the speech-to-text conversion process, and the inference of user intent from the text corresponding with the captured spoken words (the LLM calls various functions to perform specific tasks including the generation of intent based upon transcribed speech, the consideration of context information and other data such as probability information, Paragraphs 0045 and 0059-0060 wherein the added functionality of transcription error correction is taught by Pandita as applied to Claim 5).
Claim 14 contains subject matter similar to Claim 5, and thus, is rejected under similar rationale.
Claim 17 contains subject matter similar to Claim 8, and thus, is rejected under similar rationale.
Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Baeuml, et al. in view of Bonar, et al. (U.S. PG Publication: 2024/0169974 A1).
With respect to Claim 7, Baeuml discloses the system for monitoring audio for specific words for prompting an LLM to generate a voice assistant reply as applied to Claim 1. Baeuml does not teach pause detection that identifies the end of a spoken sentence and triggers transmission of the structured prompt to the LLM. Bonar, however, discloses that speech from a user is continually gathered until a "user pauses," which triggers transcription and LLM prompting (Paragraph 0020; Fig. 1, Element 112 (showing audio input transcription in the form of a sentence prior to prompting 116 that is triggered by the pause detection)).
Baeuml and Bonar are analogous art because they are from a similar field of endeavor in voice assistants using LLMs. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date, to utilize the pause detection taught by Bonar in the LLM prompting taught by Baeuml to provide a predictable result of better ensuring that all user audio that completes a thought is gathered prior to prompting the LLM.
Claim 16 contains subject matter similar to Claim 7, and thus, is rejected under similar rationale.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAMES S WOZNIAK whose telephone number is (571)272-7632. The examiner can normally be reached 7-3, off alternate Fridays.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant may use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JAMES S. WOZNIAK
Primary Examiner
Art Unit 2655
/JAMES S WOZNIAK/ Primary Examiner, Art Unit 2655