DETAILED ACTION
Introduction
1. This office action is in response to Applicant's submission filed on 05/30/2024. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are currently pending and examined below.
Drawings
2. The drawings filed on 05/30/2024 have been accepted and considered by the Examiner.
Information Disclosure Statement
3. The Information Disclosure Statement (IDS) filed on 05/30/2024 has been considered and is in compliance with the provisions of 37 CFR 1.97.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
4. Claims 1-20 are rejected under 35 U.S.C. 101 as being directed to an abstract idea without significantly more. As an example, regarding claim 1, the limitations of receiving a first natural language text representing a first user utterance; receiving a past natural language text representing one or more past user utterances belonging to a task; generating a classifier prompt, wherein the classifier prompt comprises the first natural language text, the past natural language text, a classifier instruction template, and one or more classifier examples; receiving a classifier result indicating whether the first natural language text builds upon the task associated with the past natural language text; generating a rephraser prompt based on the classifier result, wherein the rephraser prompt comprises the first natural language text, a past text input, a rephraser instruction template, and one or more rephraser examples; and receiving a rephrased natural language text in response to the rephraser prompt, wherein the rephrased natural language text combines the first natural language text and the past text input, are all activities that a human being can accomplish using their mind and, at most, pen and paper. Hence, all these steps fall under the category of mental processes. These steps are drafted at a high level of generality without tying them to a specific technological improvement, and the large language model (LLM) recited herein can be a general-purpose LLM. Accordingly, this claim recites an abstract idea.
This judicial exception is not integrated into a practical application because the recitation of a device, a system, a processor, a computer-readable medium, and/or a general-purpose LLM merely recites generalized computer components, based upon the claim interpretation wherein the structure is interpreted using the specification. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using generalized computer components to perform the recited receiving and generating steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is therefore not patent eligible.
Claims 2-16 merely provide certain details of the mental processes outlined above, such as storing the rephrased text, creating a resolver prompt containing a second text, ascertaining if the second text contains a question, determining user intent and tracked entities, etc. These are all steps that can themselves be accomplished by a human being with, at most, the aid of pen and paper, and hence they also do not amount to significantly more than the judicial exception. Claims 17-18 are computer-readable medium (CRM) claims corresponding to method claims 1-16 and are therefore also rejected under 35 U.S.C. 101 for at least the reasons outlined above. Similarly, claims 19-20 are system claims corresponding to method claims 1-16 and are therefore also rejected under 35 U.S.C. 101 for at least the reasons outlined above.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(2) The claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
5. Claims 1-3 and 17-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Sharifi (U.S. Patent Application Publication # 2023/0230578 A1).
With regards to claim 1, Sharifi teaches a method, comprising receiving a first natural language text representing a first user utterance (Para 35 and figures 1-4, teach an audio data stream that captures a spoken utterance spoken by a user of a client device);
receiving, from a past dialogue state manager, a past natural language text representing one or more past user utterances belonging to a task (Para 39, teaches a personalized end pointing model that can process the text representation of the spoken utterance or a portion of the text representation of the spoken utterance immediately preceding the candidate endpoint, to generate a personalized end pointing measure);
generating, by a classifier prompt generator, a classifier prompt, wherein the classifier prompt comprises the first natural language text, the past natural language text, a classifier instruction template, and one or more classifier examples (Abstract and para 35, teach generation of candidate endpoints is based on a present user utterance and its textual representation, user-specific measure which in turn in based on the text representation immediately preceding the candidate endpoint and one or more historical interactions with the user. Each of the historical interactions are specific to the text representation and the user and indicate whether a previous instance of the text representation was a previous endpoint for the user);
receiving a classifier result generated by a classifier large language model in response to the classifier prompt, wherein the classifier result indicates whether the first natural language text builds upon the task associated with the past natural language text (Paragraphs 39 and 44, teach use of ASR model in conjunction with an NLU model. Para 56, teaches that each historical interaction can be based on a previous time James spoke “ok umm” immediately preceding a candidate endpoint wherein each historical interaction can include an indication of whether the previous candidate endpoint was a true endpoint. Examples in para 57, show how the system builds upon previous tasks);
generating, by a rephraser prompt generator, a rephraser prompt based on the classifier result, wherein the rephraser prompt comprises the first natural language text, a past text input, a rephraser instruction template, and one or more rephraser examples (Para 73, teaches a dialog manager configured to map a current dialog state, e.g., provided by dialog state tracker, to one or more "responsive actions" of a plurality of candidate responsive actions that are then performed by automated assistant. Responsive actions may come in a variety of forms, depending on the current dialog state. Initial and midstream dialog states that correspond to turns of a dialog session that occur prior to a last turn, e.g., when the ultimate user-desired task is performed, may be mapped to various responsive actions that include automated assistant outputting additional natural language dialog. This responsive dialog may include requests that the user provide parameters for some action, i.e., fill slots, that dialog state tracker believes the user intends to perform. Responsive actions may include actions such as "request", e.g., seek parameters for slot filling; "offer", e.g., suggest an action or course of action for the user; "select"; "inform", e.g., provide the user with requested information; "no match", e.g., notify the user that the user's last input is not understood; a command to a peripheral device, e.g., to turn off a light bulb; etc.);
and receiving a rephrased natural language text generated by a rephraser large language model in response to the rephraser prompt, wherein the rephrased natural language text combines the first natural language text and the past text input (Paragraphs 39 and 44, teach use of ASR model in conjunction with an NLU model. Paragraphs 69, 85 and figure 7, teach causing the client device to render output based on the content or carrying out actions based on the spoken utterance of the user once an endpoint is detected. Cloud-based TTS module can also convert textual data, e.g., natural language responses formulated by automated assistant, into computer-generated speech output).
With regards to claim 17, this is a CRM claim for the corresponding method claim 1. These two claims are related as method and CRM of using the same, with each claimed CRM element's function corresponding to the claimed method step. Accordingly, claim 17 is similarly rejected under the same rationale as applied above with respect to method claim 1.
With regards to claim 19, this is a system claim for the corresponding method claim 1. These two claims are related as method and system of using the same, with each claimed system element's function corresponding to the claimed method step. Accordingly, claim 19 is similarly rejected under the same rationale as applied above with respect to method claim 1.
With regards to claim 2, Sharifi teaches the method of claim 1, further comprising storing the rephrased natural language text in the past dialogue state manager (Para 68, teaches a cloud-based automated assistant components including a cloud-based TTS module, a cloud-based STT module, a natural language processor, a dialog state tracker and a dialog manager. Cloud-based STT module can convert audio data into text, which may then be provided to natural language processor).
With regards to claim 3, Sharifi teaches the method of claim 1, further comprising generating, by a resolver prompt generator, a resolver prompt, wherein the resolver prompt comprises a second natural language text produced by an automatic speech recognition system, a resolver instruction template, and one or more resolver examples (Para 71, teaches a natural language processor that includes a coreference resolver configured to group, or “cluster,” references to the same entity based on one or more contextual cues. The coreference resolver may be utilized to resolve the term “there” to “Hypothetical Café” in the natural language input “I liked Hypothetical Café last time we ate there.”);
and generating, by a resolver large language model, the first natural language text in response to the resolver prompt (Para 71, further teaches that the natural language processor is configured to identify and annotate various types of grammatical information in natural language input. The natural language processor may additionally and/or alternatively include an entity tagger configured to annotate entity references in one or more segments, such as references to people including, for instance, literary characters, celebrities, public figures, etc., organizations, locations real and imaginary, and so forth. The natural language processor may rely on annotations from one or more other components of the natural language processor. In processing a particular natural language input, one or more components of the natural language processor may use related prior input and/or other related data outside of the particular natural language input to determine one or more annotations).
With regards to claim 18, Sharifi teaches the one or more non-transitory computer-readable media of claim 17, wherein the instructions further cause the one or more processors to process the rephrased natural language text by a natural language understanding system to produce a tracked intent and one or more tracked entities (Para 71, further teaches that the natural language processor is configured to identify and annotate various types of grammatical information in natural language input. The natural language processor may additionally and/or alternatively include an entity tagger configured to annotate entity references in one or more segments, such as references to people including, for instance, literary characters, celebrities, public figures, etc., organizations, locations real and imaginary, etc. Para 72, teaches that the dialog state tracker is configured to keep track of a "dialog state" that includes, for instance, a belief state of one or more users' goals or "intents" over the course of a human-to-computer dialog session and/or across multiple dialog sessions).
With regards to claim 20, Sharifi teaches the system of claim 19, wherein the dialogue state tracking system is further to provide the rephrased natural language text as input to the natural language understanding system (Para 44, teaches transmitting the text representation of the spoken utterance to a natural language understanding or NLU model).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6. Claims 4-16 are rejected under 35 U.S.C. 103 as being unpatentable over Sharifi in view of Mielke (U.S. Patent Application Publication # 2023/0135179 A1).
With regards to claim 4, Sharifi teaches the method of claim 3, wherein generating the resolver prompt comprises obtaining the one or more resolver examples (Paragraphs 30-34, 48-51 and 59, teach assembling and retrieving user specific historical examples and phrase variants from past interactions for personalization and resolution);
inserting the second natural language text (Paragraphs 51-59, teach producing ASR transcripts and using them as input to resolution/NLU pipelines);
Sharifi may not explicitly detail obtaining the resolver instruction template. This is taught by Mielke (Paragraphs 169-176 and figure 5, teach instruction/prompt templates used to structure LLM inputs for resolver/classifier tasks);
Mielke also teaches inserting the one or more resolver examples into the resolver instruction template (Paragraphs 169-176 and figure 12, teach insertion of few-shot examples into instruction prompts to guide LLM behavior);
A person of ordinary skill in the art would have been motivated to use Mielke's well-known instruction template/few-shot prompt pattern to structure Sharifi's ASR text and retrieved resolver examples into an LLM resolver prompt to obtain a resolver output, as shown above, because Mielke shows that such templates reliably elicit task-specific behavior from LLMs.
With regards to claim 5, Sharifi teaches the method of claim 3, wherein the resolver instruction template comprises: in response to determining that the second natural language text does not include the internal question, outputting the second natural language text as the first natural language text (Paragraphs 51 and 69-76, teach that the pipeline passes ASR/NLU output through when no additional resolution is needed);
Sharifi may not explicitly detail determining whether the second natural language text includes an internal question. This is taught by Mielke (Paragraphs 169-176 and figure 5 show how to prompt LLMs to perform classification/detection tasks using instruction templates and examples; this supports detection of features such as an "internal question");
Mielke also teaches in response to determining that the second natural language text includes the internal question, updating the second natural language text to include an answer to the internal question and outputting an updated version of the second natural language text as the first natural language text (Paragraphs 172-176 and figure 5, teach a controlled generation pipeline enabling an LLM to produce an updated utterance with an inline answer when prompted);
A person of ordinary skill in the art would have combined these teachings to implement the claimed conditional resolver template because it is a straightforward application of Mielke’s prompting paradigm to Sharifi’s resolver inputs.
With regards to claim 6, Sharifi teaches the method of claim 4, wherein obtaining the one or more resolver examples comprises determining one or more contextual factors associated with one or more of a user who made the first user utterance, a user device, and time (See paragraphs 31-34, 46-52 and 59);
and retrieving the one or more resolver examples using the one or more contextual factors (See paragraphs 31-34 and 48-51).
With regards to claim 7, Sharifi teaches the method of claim 4 wherein obtaining the one or more resolver examples comprises determining one or more semantic features associated with the second natural language text (Paragraphs 29-30, teach that the system can determine whether one or more phrases are equivalent based on a similarity between the text of the phrases, based on a similarity between an embedding representation of the phrases, based on the similarity between one or more additional representations of the phrases, and/or combinations thereof);
and retrieving the one or more resolver examples using the one or more semantic features (Paragraphs 58-59, teach prior examples retrieval).
With regards to claim 8, Sharifi teaches the method of claim 1, wherein generating the classifier prompt comprises obtaining the first (ASR) natural language text and the past natural language (dialogue) text, and retrieving examples from historical interactions (Paragraphs 51, 69-76, 30-34 and 48-52);
However, Sharifi may not explicitly detail obtaining the classifier instruction template, inserting the first natural language text and the past natural language text into the classifier instruction template, and inserting the one or more classifier examples into the classifier instruction template. This is taught by Mielke (Paragraphs 169-176 along with figures 5 and 12, teach instruction templates and insertion of input text with past context alongside few-shot examples into classifier prompts for LLMs);
As shown above, Sharifi provides the textual inputs and example repository while Mielke provides how to structure them into a classifier prompt to obtain an LLM classification output. Given Mielke’s demonstrations of few-shot prompt classification (Paragraphs 169-176), combining these teachings to form the claimed classifier prompt is a routine design choice a person of ordinary skill in the art would make.
With regards to claim 9, while Sharifi generally teaches classifiers and end-pointing (Paragraphs 30-34, 51 and 59), it may not explicitly detail the limitation wherein the classifier instruction template comprises an explanation of classifier role and classifier task, one or more refinement keywords, and a classifier output format. This is taught by Mielke (Paragraphs 169-176 along with figures 5 and 12, which teach role and task descriptions in instruction templates, the use of control tokens and refinement keywords to influence generation and classification, and guidance for specifying an output format to ensure machine-parseable output);
A person of ordinary skill in the art would adopt Mielke’s template features (role/task specification, refinement tokens, output format constraints) when constructing a classifier prompt for Sharifi’s dialog system to produce predictable, structured classifier outputs (Mielke, paragraphs 169-176). Using such template elements is a conventional engineering step to integrate LLM outputs with downstream components.
With regards to claim 10, Sharifi teaches the method of claim 8, wherein obtaining the one or more classifier examples comprises determining one or more contextual factors associated with one or more of a user who made the first user utterance, a user device, and time and retrieving the one or more classifier examples using the one or more contextual factors (Paragraphs 31-34, 46-52 and 59, teach determination of contextual factors such as user/device/time and retrieval of examples based on context).
With regards to claim 11, Sharifi teaches the method of claim 8, wherein obtaining the one or more classifier examples comprises determining one or more semantic features associated with one or more of the first natural language text and the past natural language text; and retrieving the one or more classifier examples using the one or more semantic features (Paragraphs 29-30, teach that the system can determine whether one or more phrases are equivalent based on a similarity between the text of the phrases, based on a similarity between an embedding representation of the phrases, based on the similarity between one or more additional representations of the phrases, and/or combinations thereof. Paragraphs 58-59, teach prior examples retrieval).
With regards to claim 12, Sharifi teaches the method of claim 1, wherein generating the rephraser prompt comprises obtaining current/past texts and historical examples for rephrasing/personalization (Paragraphs 51, 69-76, 30-34 and 48-52);
Sharifi may not explicitly detail obtaining the rephraser instruction template, inserting the first natural language text and the past text input into the rephraser instruction template, and inserting the one or more rephraser examples into the rephraser instruction template. This is taught by Mielke (Paragraphs 169-176 along with figures 5 and 12, teach instruction templates and using few-shot examples to prompt LLMs for controlled rephrasing and conditional rewriting);
Employing Mielke’s template and example insertion technique to cause an LLM to produce a context-aware rephrased utterance from Sharifi’s inputs is a predictable application of Mielke’s teaching, and a person of ordinary skill in the art would combine them to obtain controlled rephrasing behavior.
With regards to claim 13, while Sharifi teaches intents/slot/entity handling in NLU (Paragraphs 91-104), it may not explicitly detail the limitation wherein the rephraser instruction template comprises an explanation of rephraser role and rephraser task, one or more supported intents, one or more supported entities, and a rephraser output format. This is taught by Mielke (Paragraphs 169-176 along with figures 5 and 12, teach role/task description, control attributes and output format constraints in instruction templates to control LLM outputs, along with controlled generation);
A person of ordinary skill in the art would use Mielke’s instruction template features to constrain rephraser outputs to supported intents/entities and to specify an output format to integrate the LLM output into Sharifi’s downstream systems (Mielke, paragraphs 169-176).
With regards to claim 14, Sharifi teaches the method of claim 12, wherein obtaining the one or more rephraser examples comprises determining one or more contextual factors associated with one or more of a user who made the first user utterance, a user device, and time and retrieving the one or more rephraser examples using the one or more contextual factors (Paragraphs 31-34, 46-52 and 59, teach context determination and context-based retrieval of examples).
With regards to claim 15, Sharifi teaches the method of claim 12, wherein obtaining the one or more rephraser examples comprises determining one or more semantic features associated with one or more of the first natural language text and the past text input and retrieving the one or more rephraser examples using the one or more semantic features (Paragraphs 29-30, teach that the system can determine whether one or more phrases are equivalent based on a similarity between the text of the phrases, based on a similarity between an embedding representation of the phrases, based on the similarity between one or more additional representations of the phrases, and/or combinations thereof. Paragraphs 58-59, teach prior examples retrieval).
With regards to claim 16, Sharifi teaches the method of claim 1, wherein generating the rephraser prompt comprises using the past natural language text as the past text input (Paragraphs 31-34, teach maintaining and using past dialog context and past utterances for continuity and personalization, thus describing historical interactions and using prior text. Paragraphs 48-51, describe retrieving and using past interaction examples);
Sharifi also teaches decision logic to start a new task, i.e., to operate without prior context (Paragraphs 69-76, teach decision points and task switching where the system treats an utterance as a new task and does not use prior dialog context for subsequent processing, thus describing the decision logic for switching/continuing tasks and using dialog state accordingly);
Sharifi further teaches use of stored past text in the dialog state manager/task tracker (Paragraphs 31-34 and 59, teach storing and accessing past natural language text and dialog state for use when continuing tasks, especially storing and using historical interactions);
Sharifi may not explicitly detail “in response to the classifier result” as a prompt control signal, i.e., Sharifi does not expressly disclose using a prompt-assembled classifier result from an LLM to govern rephraser prompt composition (to include past text or NULL in the rephraser prompt). This is taught by Mielke (Paragraphs 169-176 along with figures 5 and 12, teach assembling prompts by inserting current inputs, prior context, and selected examples into a template, and show use of prior model outputs to condition downstream prompt composition, as outlined by the sampling/prompt assembly examples. The pipeline supports using the classifier LLM output as a control signal to either include the past natural language text in the rephraser prompt when the classifier indicates continuation, or to include NULL/omit the past text when the classifier indicates a new/different task);
Sharifi also may not explicitly detail explicit LLM prompt composition behavior tied to the classifier result, i.e., Sharifi does not explicitly teach a rephraser prompt generator that (a) includes the past text when a classifier indicates continuation, or (b) inserts NULL as the past text input when a classifier indicates a new/different task, as a controlled prompt mechanism. This aspect is also taught by Mielke (Paragraphs 169-176 along with figures 5 and 12, which teach making a classifier/correctness prediction via an LLM prompted with an instruction template and examples, and using that prediction to control subsequent generation, as outlined in the calibrator/classifier-controlled generation pipeline);
A person of ordinary skill in the art would have been motivated to combine these teachings to operationalize Sharifi’s continuation/new-task decision inside an LLM-based rephraser prompt generator: when the LLM classifier indicates the new utterance builds on the prior task, include the past natural language text in the rephraser prompt; when the classifier indicates a different/new task, pass NULL instead. Mielke’s template/control technique implements this decision with a reasonable expectation of success (Mielke, paragraphs 169-176, figures 5 and 12).
Conclusion
7. The following prior art, made of record but not relied upon, is considered pertinent to applicant's disclosure: Bui (U.S. Patent Application Publication # 2017/0228366 A1), Weisz (U.S. Patent Application Publication # 2025/0252271 A1). These references are also included in the PTO-892 form attached with this office action.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. If you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). In case you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NEERAJ SHARMA whose contact information is given below. The examiner can normally be reached on Monday to Friday 8 am to 5 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Louis-Desir can be reached on 571-272-7799 (Direct Phone). The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
/NEERAJ SHARMA/
Primary Examiner, Art Unit 2659
571-270-5487 (Direct Phone)
571-270-6487 (Direct Fax)
neeraj.sharma@uspto.gov (Direct Email)