DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Applicant's claim for the benefit of a prior-filed application is acknowledged: the present application is a U.S. National Stage Application filed under 35 U.S.C. § 371 of International Patent Application No. PCT/JP2020/035622, filed on 18 September 2020.
Drawings
The drawings were received on 06/16/2023. These drawings are acceptable.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on the following dates: 10/29/2025, 09/09/2025, 08/20/2025, 07/23/2025, 05/05/2025, 04/02/2025, 03/19/2025, 02/11/2025, 12/12/2024, 11/05/2024, 09/09/2024, 07/09/2024, and 03/27/2023 have been considered by the examiner.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Agarwal et al. (US 20200184956, hereinafter ‘Agar’) in view of AbdelHady et al. (US 12175968, hereinafter ‘Hady’).
Regarding independent claim 1, Agar teaches a system comprising: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the system to perform a set of operations, the set of operations comprising: (in [0020] As a preliminary matter, the term “hardware logic circuitry” corresponds to one or more hardware processors (e.g., CPUs, GPUs, etc.) that execute machine-readable instructions stored in a memory, and/or one or more other hardware logic components (e.g., FPGAs) that perform operations using a task-specific collection of fixed and/or programmable logic gates. Section C provides additional information regarding one implementation of the hardware logic circuitry. The term “component” or “engine” refers to a part of the hardware logic circuitry that performs a particular function... )
obtaining user input from a user; ([0027] FIG. 1 shows an illustrative computing environment 102 that involves a digital assistant 104 and a user response prediction system (“prediction system”) 106. One or more computing devices implement the digital assistant 104 and the prediction system 106. A digital assistant 104 refers to a software agent configured to respond to questions posed by a user in a multi-turn dialogue in a natural language (e.g., English, French, Japanese, etc.). Any natural language expression made by the prediction system 106 is referred to herein as a “system prompt.” Any natural language expression issued by the user [obtaining user input from a user] is referred to herein as a “response,” “command,” “query,” or “input expression.”)
generating, based on the user input, a skill chain that includes a set of skills with which to process the user input; (As depicted in FIG. 1 and in [0027]: FIG. 1 shows an illustrative computing environment 102 that involves a digital assistant 104 and a user response prediction system (“prediction system”) 106. One or more computing devices implement the digital assistant 104 and the prediction system 106 [generating, based on the user input, a skill chain that includes a set of skills with which to process the user input]. A digital assistant 104 refers to a software agent configured to respond to questions posed by a user in a multi-turn dialogue [generating, based on the user input, a skill chain that includes a set of skills with which to process the user input] in a natural language (e.g., English, French, Japanese, etc.)… [0030] One reason for showing the prediction system 106 and digital assistant 104 as separate components in FIG. 1 is to stress that the prediction system 106 works with the digital assistant 104 mainly on the “outside” of the digital assistant 104. That is, the prediction system 106 operates by governing the responses that are fed into the digital assistant 104 as input signals, rather than primarily changing the logic used by the digital assistant 104. For example, the digital assistant 104 can include one or more skill components (108, 110, . . . , 112). Each skill component refers to a logic module that is designed to handle a domain of questions posed by a user [generating, based on the user input, a skill chain that includes a set of skills with which to process the user input]. The prediction system 106 performs its function without requiring changes to the logic used by these skill components (108, 110, . . . , 112).
This characteristic facilitates the introduction of the prediction system 106 to an already-designed digital assistant 104, and enables the wide applicability of the prediction system 106 to different types of digital assistants)
for a first model skill of the skill chain: generating, based on a first prompt template associated with the first model skill, a first prompt that includes at least a part of the obtained user input; ([0001] A digital assistant refers to a virtual agent for answering a user's queries, typically via a multi-turn dialogue. In a common case, the user begins by entering an initial command, such as, “Make a hotel reservation in New York.” The digital assistant then identifies the time, place, and other particulars of the reservation through a series of system prompts [based on a first prompt template associated with the first model skill]. To perform this task, the digital assistant typically relies on one or more skills components. Each skill component is configured to handle a particular task [for a first model skill of the skill chain: generating, based on a first prompt template associated with the first model skill, a first prompt that includes at least a part of the obtained user input], such as making a restaurant reservation, controlling a media system, retrieving news, etc.)
and processing, using a first machine learning model associated with the first model skill, the first prompt to obtain intermediate output; for a second model skill of the skill chain: generating, based on a second prompt template associated with the second model skill, a second prompt that includes at least a part of the intermediate output as input for the second model skill; (As shown in FIGS. 2-5 and in [0039]: FIGS. 2-5 show four respective examples of the operation of the computing environment 102 of FIG. 1. Each such figure shows a short dialogue between the user and the digital assistant (DA) 104 without the use of the prediction system 106. It then shows how the dialogue would change upon the introduction of the prediction system 106… [0058] A dialogue manager component 606 coordinates with the set of skill components (108, 110, . . . , 112) to provide an answer to the user's input expression [processing, using a first machine learning model associated with the first model skill, the first prompt to obtain intermediate output]. To do this, the dialogue manager component 606 identifies the skill component(s) that should be invoked, and then forwards the interpreted input expression provided by the NLU component 604 to the appropriate skill component(s) [processing, using a first machine learning model associated with the first model skill, the first prompt to obtain intermediate output … a second prompt template associated with the second model skill, claimed template as the extracted appropriate skill components in a plurality of components associated with an input expression]. The dialogue manager component 606 can perform this task by consulting a set of rules which map the domain(s) and intent(s) identified by the NLU component 604 to one or more appropriate skill components that can handle those domain(s) and intent(s). [0059] Each skill component itself can be implemented by any machine-learned model(s) and/or any rules-based engines, etc.
In one case, a skill component can use a machine-learned sequence-to-sequence model to map the user's input expression to an output response [generating, based on a second prompt template associated with the second model skill, a second prompt that includes at least a part of the intermediate output as input for the second model skill]. In another case, a skill component can respond to a user's input expression based on one or more pre-stored scripts. Each skill component and/or the dialogue manager component 606 also maintains information regarding the state of a dialogue in progress [the first prompt to obtain intermediate output], e.g., by identifying the questions that have already been asked, and the answers that have already been given, with respect to the task that the user is attempting to complete [generating, based on a second prompt template associated with the second model skill, a second prompt that includes at least a part of the intermediate output as input for the second model skill].)
and processing, using a second machine learning model associated with the second model skill, the second prompt to obtain model output; ([0060] A natural language generation (NLG) component 608 maps each answer given by a skill component into an output expression in a natural language, to provide the final system prompt given to the user. More specifically, a skill component may output its answer in parametric form. For instance, in the context of making a flight reservation, a skill component can provide an answer that specifies a flight number, a flight time, a flight status, and a message type. The message type identifies purpose of the message; here, the purpose of the message is to convey the flight status of a flight. The NLG component 608 converts this answer into a natural language expression, constituting the system prompt. It can do this using a lookup table, one or more machine-learned models, one or more rules-based engines, and so on. An optional voice synthesizer (not shown) can convert a text-based system prompt into a spoken system prompt.)… [0064] In addition, the feature generation component 612 can generate features associated with the contextual circumstances in which the current system prompt was generated. These features can include, but are not limited to: the time at which the user submitted whatever input expression triggered the generation of the system prompt; the location from which the user submitted the input expression; the input expression itself (which can be converted into a feature vector in the same manner described above); an identity of a skill component which generated the system prompt; an identity of a skill component (if any) that was used just prior to the current skill component in the current dialogue [and processing, using a second machine learning model associated with the second model skill, the second prompt to obtain model output], and so on.)
and providing an indication of the model output for display to the user. ([0070] An update component 618 updates the data store 114 each time the user responds to a system prompt. It does this by adding a new record entry to the data store 114 which describes this event. To perform this task, the update component 618 receives input signals from various sources. For instance, the update component 618 receives input signals from the digital assistant 104 which describe a current system prompt [and providing an indication of the model output for display to the user], one or more previous system prompts, a user input expression, a current skill component, one or more previous skill components, etc. The update component 618 can receive other input signals that describe the current time (received from a time-keeping mechanism), current location (received from a position-determining mechanism, such as a GPS component), etc…. [0080] As another refinement, the prediction system 106 can adjust its operation based on setting signals 626 sent by the skill components (108, 110, . . . , 112) [an indication of the model output]. Each setting signal provided by a skill component notifies the prediction system 106 of the extent to which the skill component authorizes the use of predicted responses. For example, a skill component may specify that the prediction system 106 is prohibited from generating a predicted response for any system prompt which derives from an answer given by that skill component. Or a skill component may specify how the prediction system 106 is to perform its operation, e.g., by specifying the data store(s), model(s), features, etc. used by the prediction system 106 [and providing an indication of the model output for display to the user]. The skill component may also specify an extent to which the prediction system 106 uses exploitation and exploration in generating its predicted responses.)
The examiner interprets a skill as any software processing component that processes natural language data in a dialog system to generate actions in response to a user input, and interprets an ordered set of such skills as the claimed skill chain.
Hady expressly teaches a skill as any software processing component for processing natural language data in a dialog system, including the production system component for generating system prompts. (2:43-50: As used herein, a “skill” may refer to software, that may be placed on a machine or a virtual machine (e.g., software that may be launched in a virtual instance when called), configured to process NLU output data and perform one or more actions in response thereto [generating, based on the user input, a skill chain that includes a set of skills with which to process the user input]. What is described herein as a skill may be referred to using different terms, such as a processing component, an application, an action, a bot, or the like…)
Additionally, Hady teaches the use of templates in processing natural language data. (10:48-61: The NLG component 275 may generate natural language data based on one or more response templates. For example, the NLG component 275 may select a template in response to the question [a … prompt template associated with the … model skill], “what is the weather currently like” of the form: “the weather currently is $weather_information$.” The NLG component 275 may analyze the logical form of the template to produce one or more natural language responses including markups and annotations to familiarize the response that is generated. In some embodiments, the NLG component 275 may determine which response is the most appropriate response to be selected. The selection may be based on past natural language responses, past natural language input, a level of formality, and/or any other feature, and combinations thereof.)
Agar and Hady are analogous art because both involve developing natural language processing machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the speech recognition and natural language understanding processing techniques for performing tasks based on spoken input, as disclosed by Hady, with the method of implementing a digital assistant that answers a user's queries via multi-turn dialogue information retrieval and processing techniques, as disclosed by Agar.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Hady and Agar as noted above. Doing so allows for developing and implementing software processing components as user-enabled skills that are executed with respect to the user's natural language inputs (Hady, Abstract and 11:39-62).
Regarding claim 2, the rejection of claim 1 is incorporated and Agar in combination with Hady teaches the system of claim 1, wherein generating the skill chain comprises: generating a skill listing corresponding to a set of skills of a skill library, wherein the skill listing includes a description for each skill of the set of skills; providing, to a machine learning service, an indication of the user input and the skill listing; and receiving, from the machine learning service, the skill chain corresponding to the user input. (in [0030] One reason for showing the prediction system 106 and digital assistant 104 as separate components in FIG. 1 is to stress that the prediction system 106 works with the digital assistant 104 mainly on the “outside” of the digital assistant 104. That is, the prediction system 106 operates by governing the responses that are fed into the digital assistant 104 as input signals, rather than primarily changing the logic used by the digital assistant 104. For example, the digital assistant 104 can include one or more skill components [wherein generating the skill chain comprises: generating a skill listing corresponding to a set of skills of a skill library,] (108, 110, . . . , 112). Each skill component refers to a logic module that is designed to handle a domain of questions posed by a user [wherein the skill listing includes a description for each skill of the set of skills]. The prediction system 106 performs its function without requiring changes to the logic used by these skill components (108, 110, . . . , 112). .. [0058] A dialogue manager component 606 coordinates with the set of skill components (108, 110, . . . , 112) [wherein generating the skill chain comprises: generating a skill listing corresponding to a set of skills of a skill library, wherein the skill listing includes a description for each skill of the set of skills]to provide an answer to the user's input expression. 
To do this, the dialogue manager component 606 identifies the skill component(s) that should be invoked [providing, to a machine learning service, an indication of the user input and the skill listing], and then forwards the interpreted input expression provided by the NLU component 604 to the appropriate skill component(s) [receiving, from the machine learning service, the skill chain corresponding to the user input]. The dialogue manager component 606 can perform this task by consulting a set of rules which map the domain(s) and intent(s) identified by the NLU component 604 to one or more appropriate skill components that can handle those domain(s) and intent(s)[ providing, to a machine learning service, an indication of the user input and the skill listing; and receiving, from the machine learning service, the skill chain corresponding to the user input]. [0059] Each skill component itself can be implemented by any machine-learned model(s) and/or any rules-based engines, etc. In one case, a skill component can use a machine-learned sequence-to-sequence model to map the user's input expression to an output response. In another case, a skill component can respond to a user's input expression based on one or more pre-stored scripts. Each skill component and/or the dialogue manager component 606 also maintains information regarding the state of a dialogue in progress, e.g., by identifying the questions that have already been asked, and the answers that have already been given, with respect to the task that the user is attempting to complete)
Regarding claim 3, the rejection of claim 1 is incorporated and Agar in combination with Hady teaches the system of claim 1, wherein generating the skill chain comprises: generating, for the user input, an input embedding that encodes an intent of the user input; determining, from a skill library, a set of skills that each have an associated semantic embedding that matches the generated input embedding; and generating the skill chain based on the determined set of skills. ([0058] A dialogue manager component 606 coordinates with the set of skill components (108, 110, . . . , 112) to provide an answer to the user's input expression. To do this, the dialogue manager component 606 identifies the skill component(s) that should be invoked, and then forwards the interpreted input expression provided by the NLU component 604 to the appropriate skill component(s). The dialogue manager component 606 can perform this task by consulting a set of rules which map the domain(s) [determining, from a skill library, a set of skills that each have an associated semantic embedding that matches the generated input embedding; and generating the skill chain based on the determined set of skills] and intent(s) identified by the NLU component 604 to one or more appropriate skill components that can handle those domain(s) and intent(s)[ wherein generating the skill chain comprises: generating, for the user input, an input embedding that encodes an intent of the user input]… [0119] According to a twentieth aspect, a computer-readable storage medium is described for storing computer-readable instructions. The computer-readable instructions, when executed by one or more hardware processors, perform a method that includes receiving a system prompt generated by a digital assistant, the digital assistant generating the system prompt in response to an input command provided by a user via an input device, both the system prompt and the input command being expressed in a natural language. 
The digital assistant includes: a natural language understanding (NLU) component for interpreting the user input command, to provide an interpreted user command; a dialogue manager for coordinating with one or more skill components [determining, from a skill library, a set of skills that each have an associated semantic embedding that matches the generated input embedding; and generating the skill chain based on the determined set of skills] to provide an answer to the interpreted user command; and a natural language generator (NGU) component for generating the system prompt, in the natural language, in reply to the answer…)
Regarding claim 4, the rejection of claim 3 is incorporated and Agar in combination with Hady teaches the system of claim 3, wherein it is determined that a semantic embedding matches the input embedding based on an algorithmic similarity metric between the semantic embedding and the input embedding. (in [0075] A user response-filtering component (URFC) 624 either passes the user's original user response through without modification, or modifies a user's response such that it is consistent with the input expectations of the digital assistant 104. For example, assume that a confirmation prompt asks the user to confirm that he or she wishes to attend the Lincoln Square Cinemas in Bellevue, Wash. The user will respond by saying either “yes” or “no.” The URFC 624 will substitute the actual user response with the response that the digital assistant 104 is expecting, namely “Lincoln Square Cinemas.” Assume, instead, that the user says “no” in response to the confirmation prompt. The URFC 624 and SRFC 622 can respond to this event using different environment-specific strategies. In one approach, the URFC 624 instructs the SRFC 622 to issue the original system prompt, rather than the confirmation prompt. The original prompt reads, “At which theater should I book the tickets?” In another approach, the SRFC 622 can offer another confirmation prompt to the user based on another predicted response provided by the predictor component 610 (e.g., which may correspond to the record entry having the second-best matching score, the user having already rejected the record entry having the best matching score) [wherein it is determined that a semantic embedding matches the input embedding based on an algorithmic similarity metric between the semantic embedding and the input embedding.]…)
Additionally Hady teaches wherein it is determined that a semantic embedding matches the input embedding based on an algorithmic similarity metric between the semantic embedding and the input embedding. (20:63-21:20: The skill candidate component 420 may be configured to perform lexical similarity processing and/or semantic similarity processing [wherein it is determined that a semantic embedding matches the input embedding based on an algorithmic similarity metric between the semantic embedding and the input embedding] with respect to each skill metadata received by the skill candidates components 420. Lexical similarity processing may refer to processing performed to measure a degree to which words of the natural language input and sample natural language inputs of skills are similar. For example, a lexical similarity of 1 (or 100%) may mean a total overlap between vocabularies, whereas 0 may mean there are no common words. Semantic similarity processing may refer to processing to determine a metric defined over a set of documents or terms, where the idea of distance between items is based on the likeness of their meaning or semantic content as opposed to lexicographical similarity. In the example of FIG. 6, where the skill candidates component 420 receives first language skill metadata 545a (corresponding to a first language, or locale and first language) and second language skill metadata 545b (corresponding to a second language, or locale and second language), the skill candidates component 420 may perform lexical similarity processing 610a with respect to the first language skill metadata 545a to determine one of more skill candidates 425a and/or semantic similarity processing 620a with respect to the first language skill metadata 545a to determine one or more skill candidates 425b…)
Regarding claim 5, the rejection of claim 1 is incorporated and Agar in combination with Hady teaches the system of claim 1, wherein processing the first prompt to obtain the intermediate output comprises: providing, to a machine learning service, a request to process the first prompt using the first machine learning model; and receiving, from the machine learning service, a response that includes the intermediate output. (in [0058] A dialogue manager component 606 coordinates with the set of skill components (108, 110, . . . , 112) to provide an answer to the user's input expression. To do this, the dialogue manager component 606 identifies the skill component(s) that should be invoked [wherein processing the first prompt to obtain the intermediate output comprises: providing, to a machine learning service, a request to process the first prompt using the first machine learning model], and then forwards the interpreted input expression provided by the NLU component 604 to the appropriate skill component(s). The dialogue manager component 606 can perform this task by consulting a set of rules which map the domain(s) and intent(s) identified by the NLU component 604 to one or more appropriate skill components that can handle those domain(s) and intent(s) [and receiving, from the machine learning service, a response that includes the intermediate output]. [0059] Each skill component itself can be implemented by any machine-learned model(s) and/or any rules-based engines, etc. In one case, a skill component can use a machine-learned sequence-to-sequence model to map the user's input expression to an output response. In another case, a skill component can respond to a user's input expression based on one or more pre-stored scripts. 
Each skill component and/or the dialogue manager component 606 also maintains information regarding the state of a dialogue in progress [and receiving, from the machine learning service, a response that includes the intermediate output], e.g., by identifying the questions that have already been asked, and the answers that have already been given, with respect to the task that the user is attempting to complete. [0060] A natural language generation (NLG) component 608 maps each answer given by a skill component into an output expression in a natural language, to provide the final system prompt given to the user. More specifically, a skill component may output its answer in parametric form. For instance, in the context of making a flight reservation, a skill component can provide an answer that specifies a flight number, a flight time, a flight status, and a message type. The message type identifies purpose of the message; here, the purpose of the message is to convey the flight status of a flight. The NLG component 608 converts this answer into a natural language expression, constituting the system prompt. It can do this using a lookup table, one or more machine-learned models, one or more rules-based engines, and so on… )
Regarding claim 6, the rejection of claim 1 is incorporated and Agar in combination with Hady teaches the system of claim 1, wherein the intermediate output of the first model skill includes structured output. (in[0060] A natural language generation (NLG) component 608 maps each answer given by a skill component into an output expression in a natural language, to provide the final system prompt given to the user. More specifically, a skill component may output its answer in parametric form [wherein the intermediate output of the first model skill includes structured output]. For instance, in the context of making a flight reservation, a skill component can provide an answer that specifies a flight number, a flight time, a flight status, and a message type. The message type identifies purpose of the message; here, the purpose of the message is to convey the flight status of a flight. The NLG component 608 converts this answer into a natural language expression, constituting the system prompt. It can do this using a lookup table, one or more machine-learned models, one or more rules-based engines, and so on. An optional voice synthesizer (not shown) can convert a text-based system prompt [wherein the intermediate output of the first model skill includes structured output] into a spoken system prompt… [0064] In addition, the feature generation component 612 can generate features associated with the contextual circumstances in which the current system prompt was generated. 
These features can include, but are not limited to: the time at which the user submitted whatever input expression triggered the generation of the system prompt; the location from which the user submitted the input expression; the input expression itself (which can be converted into a feature vector in the same manner described above); an identity of a skill component which generated the system prompt; an identity of a skill component (if any) that was used just prior to the current skill component in the current dialogue, and so on [wherein the intermediate output of the first model skill includes structured output]... )
Regarding claim 7, the rejection of claim 6 is incorporated and Agar in combination with Hady teaches the system of claim 6, wherein at least a part the first prompt corresponds to the structured output. (in[0060] A natural language generation (NLG) component 608 maps each answer given by a skill component into an output expression in a natural language, to provide the final system prompt given to the user. More specifically, a skill component may output its answer in parametric form [wherein at least a part the first prompt corresponds to the structured output]. For instance, in the context of making a flight reservation, a skill component can provide an answer that specifies a flight number, a flight time, a flight status, and a message type. The message type identifies purpose of the message; here, the purpose of the message is to convey the flight status of a flight. The NLG component 608 converts this answer into a natural language expression, constituting the system prompt. It can do this using a lookup table, one or more machine-learned models, one or more rules-based engines, and so on. An optional voice synthesizer (not shown) can convert a text-based system prompt [wherein at least a part the first prompt corresponds to the structured output] into a spoken system prompt… [0064] In addition, the feature generation component 612 can generate features associated with the contextual circumstances in which the current system prompt was generated. 
These features can include, but are not limited to: the time at which the user submitted whatever input expression triggered the generation of the system prompt; the location from which the user submitted the input expression; the input expression itself (which can be converted into a feature vector in the same manner described above); an identity of a skill component which generated the system prompt; an identity of a skill component (if any) that was used just prior to the current skill component in the current dialogue, and so on [wherein at least a part of the first prompt corresponds to the structured output]... )
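The parametric-answer-to-prompt mapping that Agar describes at [0060] can be sketched as a simple lookup-table NLG step. The following is an illustrative sketch only, not material from the record; the template text, field names, and function name are hypothetical.

```python
# Hypothetical sketch of a lookup-table NLG step: a skill component's
# parametric answer (flight number, flight time, flight status, message
# type) is mapped to a natural-language system prompt. Template wording
# and field names are illustrative assumptions, not from the reference.
TEMPLATES = {
    "flight_status": "Flight {flight_number} at {flight_time} is {flight_status}.",
}

def render_prompt(answer):
    """Select a template by the answer's message type and fill in its fields."""
    # str.format ignores unused keyword arguments, so the whole answer
    # dictionary can be passed directly.
    return TEMPLATES[answer["message_type"]].format(**answer)

answer = {
    "message_type": "flight_status",
    "flight_number": "UA123",
    "flight_time": "3:15 PM",
    "flight_status": "delayed",
}
prompt = render_prompt(answer)
```

As in the quoted passage, the message type serves only to select how the parametric answer is verbalized; the remaining fields populate the chosen template.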
Regarding independent claim 8, Agar in combination with Hady teaches a method, comprising: … processing, by the computing device, at least a part of the model output to affect operation of the computing device. (in [0001] A digital assistant refers to a virtual agent for answering a user's queries, typically via a multi-turn dialogue [processing, by the computing device, at least a part of the model output to affect operation of the computing device]. In a common case, the user begins by entering an initial command, such as, “Make a hotel reservation in New York.” The digital assistant then identifies the time, place, and other particulars of the reservation through a series of system prompts. To perform this task, the digital assistant typically relies on one or more skills components. Each skill component is configured to handle a particular task, such as making a restaurant reservation, controlling a media system, retrieving news, etc [processing, by the computing device, at least a part of the model output to affect operation of the computing device]. And in [0080] As another refinement, the prediction system 106 can adjust its operation based on setting signals 626 sent by the skill components (108, 110, . . . , 112) [processing, by the computing device, at least a part of the model output to affect operation of the computing device]. Each setting signal provided by a skill component notifies the prediction system 106 of the extent to which the skill component authorizes the use of predicted responses. For example, a skill component may specify that the prediction system 106 is prohibited from generating a predicted response for any system prompt which derives from an answer given by that skill component. Or a skill component may specify how the prediction system 106 is to perform its operation, e.g., by specifying the data store(s), model(s), features, etc. used by the prediction system 106. 
The skill component may also specify an extent to which the prediction system 106 uses exploitation and exploration in generating its predicted responses [processing, by the computing device, at least a part of the model output to affect operation of the computing device].)
The remaining limitations are similar to the limitations of claim 1 and are rejected under the same rationale.
Regarding claim 9, the rejection of claim 8 is incorporated and Agar in combination with Hady teaches the method of claim 8, wherein the first machine learning model is the second machine learning model. (in [0080] As another refinement, the prediction system 106 can adjust its operation based on setting signals 626 sent by the skill components (108, 110, . . . , 112). Each setting signal provided by a skill component notifies the prediction system 106 of the extent to which the skill component authorizes the use of predicted responses [wherein the first machine learning model is the second machine learning model]. For example, a skill component may specify that the prediction system 106 is prohibited from generating a predicted response for any system prompt which derives from an answer given by that skill component [wherein the first machine learning model is the second machine learning model]. Or a skill component may specify how the prediction system 106 is to perform its operation, e.g., by specifying the data store(s), model(s), features, etc. used by the prediction system 106. The skill component may also specify an extent to which the prediction system 106 uses exploitation and exploration in generating its predicted responses. Alternatively, in [0059] Each skill component itself can be implemented by any machine-learned model(s) and/or any rules-based engines, etc. In one case, a skill component can use a machine-learned sequence-to-sequence model [wherein the first machine learning model is the second machine learning model] to map the user's input expression to an output response. In another case, a skill component can respond to a user's input expression based on one or more pre-stored scripts. 
Each skill component and/or the dialogue manager component 606 also maintains information regarding the state of a dialogue in progress, e.g., by identifying the questions that have already been asked, and the answers that have already been given, with respect to the task that the user is attempting to complete... [0068] As noted above, in still another case, the forecaster component 614 can use a machine-learned generative model of any type to map the current system prompt and its contextual factors into the predicted response, without directly using a previously-encountered user response verbatim. For example, the forecaster component 614 can use a sequence-to-sequence model [wherein the first machine learning model is the second machine learning model] to generate a predicted response based on various items of input information, including, but not limited to: the current system prompt, contextual features, record entries in the data store 114, etc. Such a model can be implemented, for instance, by a Recurrent Neural Network (RNN) composed of LSTM units.)
Regarding claim 10, the rejection of claim 8 is incorporated and Agar in combination with Hady teaches the method of claim 8, wherein: the skill chain further comprises a programmatic skill that is performed by the computing device; and output of the programmatic skill is processed as input for the second model skill. (in [0059] Each skill component itself can be implemented by any machine-learned model(s) and/or any rules-based engines, etc. In one case, a skill component can use a machine-learned sequence-to-sequence model to map the user's input expression to an output response. In another case, a skill component can respond to a user's input expression based on one or more pre-stored scripts [wherein: the skill chain further comprises a programmatic skill that is performed by the computing device; and output of the programmatic skill is processed as input for the second model skill]. Each skill component and/or the dialogue manager component 606 also maintains information regarding the state of a dialogue in progress [and output of the programmatic skill is processed as input for the second model skill], e.g., by identifying the questions that have already been asked, and the answers that have already been given, with respect to the task that the user is attempting to complete... [0063] The predictor component 610 includes a feature generation component 612 that generates a set of features [wherein: the skill chain further comprises a programmatic skill that is performed by the computing device], including, in part, features that describe the current system prompt. More specifically, the feature generation component 612 can convert the current system prompt into one or more feature vectors using any kind of encoder… As a further process, the feature generation component 612 can optionally use any machine-learned model (such as a neural network) to convert a one-hot or n-gram feature vector into a higher-level form. 
[0064] In addition, the feature generation component 612 can generate features associated with the contextual circumstances in which the current system prompt was generated. These features can include, but are not limited to: … an identity of a skill component which generated the system prompt; an identity of a skill component (if any) that was used just prior to the current skill component in the current dialogue, and so on [and output of the programmatic skill is processed as input for the second model skill]. [0065] If an information retrieval paradigm is being used, the feature generation component 612 can produce similar features to those described above for the record entry in the data store 114 to which the current system prompt is being compared. More specifically, each record entry includes a previously-generated system prompt along with its contextual features... In addition, the feature generation component 612 can generate one or more features that describe the relationship of the current system prompt and the previously-generated system prompt associated with the record entry under consideration, such as an edit distance feature, etc… [0068] As noted above, in still another case, the forecaster component 614 can use a machine-learned generative model of any type to map the current system prompt and its contextual factors into the predicted response [and output of the programmatic skill is processed as input for the second model skill], without directly using a previously-encountered user response verbatim. For example, the forecaster component 614 can use a sequence-to-sequence model to generate a predicted response based on various items of input information, including, but not limited to: the current system prompt, contextual features, record entries in the data store 114, etc. Such a model can be implemented, for instance, by a Recurrent Neural Network (RNN) composed of LSTM units.)
Regarding claim 11, the rejection of claim 10 is incorporated and Agar in combination with Hady teaches the method of claim 10, wherein the intermediate output of the first model skill includes structured output that is processed by the programmatic skill. (in [0059] Each skill component itself can be implemented by any machine-learned model(s) and/or any rules-based engines, etc. In one case, a skill component can use a machine-learned sequence-to-sequence model to map the user's input expression to an output response. In another case, a skill component can respond to a user's input expression based on one or more pre-stored scripts. Each skill component and/or the dialogue manager component 606 also maintains information regarding the state of a dialogue in progress [wherein the intermediate output of the first model skill includes structured output that is processed by the programmatic skill], e.g., by identifying the questions that have already been asked, and the answers that have already been given, with respect to the task that the user is attempting to complete... [0063] The predictor component 610 includes a feature generation component 612 that generates a set of features, including, in part, features that describe the current system prompt. More specifically, the feature generation component 612 can convert the current system prompt into one or more feature vectors using any kind of encoder… As a further process, the feature generation component 612 can optionally use any machine-learned model (such as a neural network) to convert a one-hot or n-gram feature vector into a higher-level form. [0064] In addition, the feature generation component 612 can generate features associated with the contextual circumstances in which the current system prompt was generated. 
These features can include, but are not limited to: … an identity of a skill component which generated the system prompt; an identity of a skill component (if any) that was used just prior to the current skill component in the current dialogue [wherein the intermediate output of the first model skill includes structured output that is processed by the programmatic skill], and so on.)
Regarding claim 12, the rejection of claim 8 is incorporated and Agar in combination with Hady teaches the method of claim 8, wherein processing the part of the model output comprises displaying the part of the model output to a user of the computing device. (in [0001] A digital assistant refers to a virtual agent for answering a user's queries, typically via a multi-turn dialogue [wherein processing the part of the model output comprises displaying the part of the model output to a user of the computing device]. In a common case, the user begins by entering an initial command, such as, “Make a hotel reservation in New York.” The digital assistant then identifies the time, place, and other particulars of the reservation through a series of system prompts [wherein processing the part of the model output comprises displaying the part of the model output to a user of the computing device]. To perform this task, the digital assistant typically relies on one or more skills components. Each skill component is configured to handle a particular task, such as making a restaurant reservation, controlling a media system, retrieving news, etc. And in [0080] As another refinement, the prediction system 106 can adjust its operation based on setting signals 626 sent by the skill components (108, 110, . . . , 112). Each setting signal provided by a skill component notifies the prediction system 106 of the extent to which the skill component authorizes the use of predicted responses [wherein processing the part of the model output comprises displaying the part of the model output to a user of the computing device]. For example, a skill component may specify that the prediction system 106 is prohibited from generating a predicted response for any system prompt which derives from an answer given by that skill component. Or a skill component may specify how the prediction system 106 is to perform its operation, e.g., by specifying the data store(s), model(s), features, etc. 
used by the prediction system 106. The skill component may also specify an extent to which the prediction system 106 uses exploitation and exploration in generating its predicted responses [wherein processing the part of the model output comprises displaying the part of the model output to a user of the computing device].)
Regarding claim 13, the rejection of claim 8 is incorporated and Agar in combination with Hady teaches the method of claim 8, wherein processing the part of the model output comprises parsing, by an application of the computing device, the part of the model output to affect operation of the application. (in [0063] The predictor component 610 includes a feature generation component 612 that generates a set of features, including, in part, features that describe the current system prompt [wherein processing the part of the model output comprises parsing, by an application of the computing device, the part of the model output to affect operation of the application]. More specifically, the feature generation component 612 can convert the current system prompt into one or more feature vectors using any kind of encoder [wherein processing the part of the model output comprises parsing, by an application of the computing device, the part of the model output to affect operation of the application]. For example, the feature generation component 612 can convert each word of the system prompt into a one-hot feature vector (which includes a “1” entry in the dimension of the vector associated with the word, and a “0” entry in other dimensions). Or the feature generation component 612 can use an n-gram technique [wherein processing the part of the model output comprises parsing, by an application of the computing device, the part of the model output to affect operation of the application] to convert each word into a feature vector. For example, the feature generation component 612 can move a three-character window across a word, character by character. At each location, the feature generation component 612 can store a “1” entry in a dimension of the feature vector associated with the 3-character sequence demarcated by the window. 
For example, the feature generation component 612 can convert the word “hotel” into a vector having “1” entries in the appropriate dimensions for the sequences “# ho,” “hot,” “ote,” “tel,” and “el #,” where the “#” symbol refers to a dummy token marking the beginning or ending of a sequence. If a word contains two or more instances of the same three-character sequence, the feature generation component 612 can store a count of the number of instances in the appropriate dimension of the feature vector. As a further process, the feature generation component 612 can optionally use any machine-learned model (such as a neural network) to convert a one-hot or n-gram feature vector into a higher-level form…)
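The three-character-window featurization that Agar describes at [0063], including the boundary "#" tokens and the counting of repeated trigrams, can be sketched as follows. This is an illustrative sketch only; the function name and the use of a counting dictionary are assumptions, not details from the reference.

```python
# Hypothetical sketch of the character-trigram featurization described
# in the quoted passage: a three-character window moves across each word
# (padded with "#" boundary tokens), and the feature vector stores a
# count for each trigram observed.
from collections import Counter

def trigram_features(word):
    """Count the character trigrams of a word padded with '#' boundary tokens."""
    padded = "#" + word + "#"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))

# The passage's example: "hotel" yields the trigrams
# "#ho", "hot", "ote", "tel", "el#".
features = trigram_features("hotel")
```

Because a `Counter` stores occurrence counts, a word containing the same three-character sequence more than once records the number of instances in that dimension, matching the counting behavior the passage describes.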
Regarding independent claim 14, the limitations are similar to the claim 1 limitations and are rejected under the same rationale.
Regarding claims 15-20, the limitations are similar to those in claims 2-7 and are rejected under the same rationale.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Wu et al. (NPL: “PromptChainer: Chaining Large Language Model Prompts through Visual Programming”): teaches that chain-of-thought reasoning has been shown to dramatically improve the ability of LLMs to complete complex reasoning tasks, such as solving math problems that require multiple steps. Early works on chain-of-thought used fine-tuning or in-context learning to get LLMs to show their work for such problems. One of the most influential recent works in prompt engineering was the discovery that LLMs could be made to produce chains of thought simply by prepending “Let’s think step by step.” to the beginning of the LLM’s response.
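The “Let’s think step by step.” technique summarized above amounts to fixed prompt construction, which can be sketched as follows. This is an illustrative sketch only; `build_cot_prompt` and the Q/A completion format are hypothetical, not taken from the cited NPL.

```python
# Hypothetical sketch of the zero-shot chain-of-thought trigger: the
# fixed phrase is placed so that it begins the model's response in a
# completion-style prompt. No real LLM API is invoked here.
COT_TRIGGER = "Let's think step by step."

def build_cot_prompt(question):
    """Format a completion-style prompt whose answer opens with the trigger phrase."""
    return f"Q: {question}\nA: {COT_TRIGGER}"

prompt = build_cot_prompt("If I have 3 apples and buy 2 more, how many do I have?")
```

A completion model continuing this prompt then generates intermediate reasoning steps before its final answer, which is the behavior the examiner's summary attributes to the technique.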
Chen et al. (NPL: “Skill-it! a data-driven skills framework for understanding and training language models”): teaches whether the idea of skill orderings can help build a framework that relates data to LM training and behavior. This requires addressing two challenges revolving around the connection between skills and data. First, in order to show that there exist sets of skills that the LM learns most efficiently in some particular order, an operational definition of LM skill and skill ordering must be developed and validated on data. In initial experiments, the authors investigated whether semantic groupings of data, such as metadata attributes or embedding clusters, were sufficient to represent a skill and characterize how models learn. For instance, they partitioned the Alpaca dataset by instruction type (a technique used to capture dataset diversity) but found that sampling based on instruction types and random sampling resulted in similar model performance, suggesting that not just any existing notion of data groups can characterize skills.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OLUWATOSIN ALABI/Primary Examiner, Art Unit 2129