DETAILED ACTION
This communication is in response to the Arguments and Amendments filed on 01/12/2026.
Claim(s) 1-20 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 10/30/2025, 12/15/2025, and 01/06/2026 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements have been considered by the examiner.
Response to Arguments and Amendments
Amendments to the claims by the Applicant have been considered and addressed below.
With respect to the 35 USC § 101 and 103 rejections, the Applicant provides several arguments, to which the Examiner responds below.
35 USC § 101 rejection(s)
Arguments on pages 7-11 of the Remarks filed on 06/13/2025:
Examiner’s Response to Arguments:
The arguments have been considered but are not persuasive. The Examiner respectfully disagrees with the arguments reproduced below:
“However, claim 1 as currently amended recites features that are performed at the generative machine learning model in a manner that is distinct from processes performed in the human mind or on paper. The steps of computing the structured prompt and prompting the generative machine learning model with the structured prompt, as recited in claim 1 as currently amended, do not correspond to mental processes, since human writing of dialogue turns does not utilize a structured prompt that is input into a generative machine learning model.”,
“The current Office action analogizes the multimodal machine learning model to game rules. However, game rules do not satisfy either the plain meaning of the term "generative machine learning model" recited in claim 1, or the description of the generative machine learning model provided in Para. [0019] of the subject application.”,
“Instead, "at least in part by prompting a generative machine learning model with the structured prompt, generating a set of candidate interactions for the NPC that includes model output associated with the generative machine learning model" is a technical feature that relies on a structure distinct from the human mind.”, and
“The limitation "storing the node in a game agent data store for the game environment" of claim 1 as currently amended further clarifies this integration by explicitly reciting that the node that includes the natural language output is stored for later use (e.g., by the developer and/or a player). Thus, Applicant respectfully submits that claim 1 is directed to eligible subject matter at Step 2A Prong 2 of the Alice Mayo subject matter eligibility test.”
The Examiner notes that even if the argued claim limitations cannot practically be performed in the human mind, they can still be performed with pen and paper, as discussed in the previous Office Actions. Additionally, the Examiner notes that the language associated with the generative machine learning model, as drafted, is so broad that it would still read on a human using available tools, such as pen and paper, to follow a predefined and pre-known set of steps or rules, and the Examiner does not consider it sufficient to constitute a “technical feature that relies on a structure distinct from the human mind,” as argued. Also, the Examiner notes that the storing of the node in a game agent data store is not considered an additional element that integrates the judicial exception into a practical application (Step 2A, Prong Two).
Lastly, as noted in the previous Office Action mailed on 09/11/2025, the Examiner suggests adding more details/language associated with the cloud service and its interaction with some of the other claimed elements in order to overcome the rejection.
Please refer to the updated 35 U.S.C. § 101 rejection(s), below.
35 USC § 103 rejection(s)
Arguments on pages 12-14 of the Remarks filed on 01/12/2026.
Examiner’s Response to Arguments:
Applicant's arguments with respect to claim(s) 1, 8, and 14 under 35 U.S.C. § 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made:
in view of Kluwer et al. ("Talking NPCs in a virtual game world." Proceedings of the ACL 2010 System Demonstrations. 2010. https://aclanthology.org/P10-4007.pdf) and further in view of Chang et al. (WO 2021159779 A1) and Olabiyi et al. (US 20200098353 A1) for independent claims 1 and 14; and
in view of Kluwer et al. ("Talking NPCs in a virtual game world." Proceedings of the ACL 2010 System Demonstrations. 2010. https://aclanthology.org/P10-4007.pdf) and further in view of Chang et al. (WO 2021159779 A1) and Olabiyi et al. (US 20200098353 A1) and Dicken et al. (US 20220088474 A1) for independent claim 8.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. More specifically, the claims are directed to the abstract idea grouping of mental processes.
The independent claim(s) 1 and 14 recite(s):
computing a structured prompt that includes agent information for a non-player character (NPC) of a game environment;
at least in part by prompting a generative machine learning model with the structured prompt, generating a set of candidate interactions for the NPC that includes model output associated with the generative machine learning model, wherein a dialogue turn of the NPC is generated using the generative machine learning model;
presenting the set of candidate interactions for user selection;
receiving a selection of one or more candidate interactions; and
generating, for each selected candidate interaction, a node in a dialogue tree comprising natural language output for the selected candidate interaction representing the dialogue turn for the NPC; and
storing the node in a game agent data store for the game environment.
This reads on a human (e.g., by pen and paper):
Determining information including information from another human (e.g., questions or queries on a piece of paper or from another human);
Writing down the information and determining possible responses to the questions/queries associated with a predefined set of rules (i.e., model), wherein turns are determined based on the predefined set of rules (i.e., model);
Showing said written responses to another human for selection;
Receiving a selection from the other human;
Drawing a table or diagram associated with the selection from the other human representing the turn of the other human.
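For reference only, a minimal sketch of the recited workflow as it might run on a generic computing device is provided below. All names, values, and the stubbed model call are hypothetical and are not drawn from the specification or from the cited art; the sketch merely restates the claimed steps (structured prompt, candidate generation, selection, node creation, and storage) in code form.

from dataclasses import dataclass, field

@dataclass
class DialogueNode:
    npc_id: str
    text: str                                   # natural language output for the selected candidate
    children: list = field(default_factory=list)

def compute_structured_prompt(agent_info: dict) -> str:
    # Structured prompt that includes agent information for the NPC (hypothetical format).
    return ("NPC: " + agent_info["name"] + "\n"
            "Background: " + agent_info["background"] + "\n"
            "Task: propose candidate dialogue turns for this NPC.")

def generate_candidates(prompt: str) -> list:
    # Stand-in for prompting a generative machine learning model with the structured prompt;
    # a real system would call a model here and return its output.
    return ["Welcome in, traveler. What will it be?",
            "We are closed. Come back at sundown.",
            "Keep your voice down; the guard is listening."]

game_agent_data_store = {}                      # stand-in for the game agent data store

agent_info = {"name": "Bartender", "background": "Runs the tavern in the starting village."}
prompt = compute_structured_prompt(agent_info)
candidates = generate_candidates(prompt)        # set of candidate interactions (model output)
for i, text in enumerate(candidates):           # present the candidates for user selection
    print(i, text)
selected = [candidates[0]]                      # e.g., the developer selects one candidate
for text in selected:                           # one node per selected candidate interaction
    node = DialogueNode(npc_id="Bartender", text=text)
    game_agent_data_store.setdefault("Bartender", []).append(node)   # store the node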
The independent claim(s) 8 recite(s):
computing a first structured prompt that includes a first instance of agent information for a first non-player character (NPC) of a game environment;
generating, based on [[a]]the first instance of agent information, a first set of candidate interactions for the first NPC, wherein a dialogue turn of the first NPC is generated using a generative machine learning model at least in part by prompting the generative machine learning model with the first structured prompt;
presenting the first set of candidate interactions for user selection;
receiving a first selection of one or more of the first set of candidate interactions representing the dialogue turn for the first NPC;
generating, for each first selected candidate interaction, a node in a dialogue tree for the first NPC;
computing a second structured prompt that includes a second instance of agent information for a second NPC of the game environment;
generating, based on [[a]]the second instance of agent information, a second set of candidate interactions for the second NPC, wherein a dialogue turn of the second NPC is generated using the generative machine learning model;
presenting the second set of candidate interactions for user selection;
receiving a second selection of one or more of the second set of candidate interactions representing the dialogue turn for the second NPC;
generating, for each second selected candidate interaction, a node in the dialogue tree for the second NPC, wherein the generated node is a child node of a node for the first NPC; and
storing the dialogue tree in a game agent data store for the game environment.
This reads on a human (e.g., by pen and paper):
Determining information including information from another human (e.g., questions or queries on a piece of paper or from another human) and writing down possible responses to the questions/queries associated with a predefined set of rules (i.e., model), wherein turns are determined based on the predefined set of rules (i.e., model);
Writing down the information and determining possible responses and showing said written responses to another human for selection representing the turn of the other human;
Receiving a first selection from the other human;
Drawing a table or diagram associated with the selection from the other human;
Obtaining more information and writing down possible responses to the questions/queries associated with a predefined set of rules (i.e., model), wherein turns are determined based on the predefined set of rules (i.e., model);
Showing said written responses to another human for selection;
Receiving a second selection from the other human representing the turn of the other human;
Updating the drawing of a table or diagram associated with the second selection from the other human; and
Saving said drawing of table or diagram.
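Continuing the sketch above with the same hypothetical names, the additional steps recited in claim 8 for the second NPC (a second structured prompt, a second set of candidates, a node generated as a child node of a node for the first NPC, and storage of the dialogue tree) could be illustrated as follows. Again, this is illustrative only and is not drawn from the record; it reuses the definitions from the preceding sketch.

second_agent_info = {"name": "Guard", "background": "Watches the tavern door."}
second_prompt = compute_structured_prompt(second_agent_info)     # second structured prompt
second_candidates = generate_candidates(second_prompt)           # second set of candidate interactions
second_selected = [second_candidates[1]]                         # second selection
parent = game_agent_data_store["Bartender"][0]                   # a node for the first NPC
for text in second_selected:
    child = DialogueNode(npc_id="Guard", text=text)
    parent.children.append(child)                                # child node of a node for the first NPC
game_agent_data_store["dialogue_tree"] = parent                  # store the dialogue tree rooted at the first NPC's node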
This judicial exception is not integrated into a practical application because, for example, claim 1 recites “processor”, “memory”, “non-player character (NPC)”, “game environment”, and “game agent data store”. As an example, ¶ [0089] of the as-filed specification states, “… embodiments of the disclosure may be practiced within a general purpose computer or in any other circuits or systems.” Therefore, a general-purpose computer or computing device is described and is mainly used as a tool to apply the abstract idea. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to no more than the use of a generic computing device as a tool, as noted. The claim is not patent eligible.
With respect to claim 2, the claim(s) recite:
wherein:
the NPC is a first NPC;
the structured prompt is a first structured prompt;
the agent information is a first instance of agent information;
the set of candidate interactions is a first set of candidate interactions;
the generated nodes [[is]]are included in a set of generated nodes for the first NPC;
computing a second structured prompt including a second instance of agent information for a second NPC of the game environment;
at least in part by prompting the generative machine learning model with the second structured prompt, generating a second set of candidate interactions for the second NPC that includes model output associated with the generative machine learning model;
presenting the second set of candidate interactions for user selection;
receiving a second selection of one or more candidate interactions; and
generating, for each selected candidate interaction for the second NPC, a node in the dialogue tree comprising natural language output for the selected candidate interaction, wherein the generated node is associated with a node of the generated set of nodes for the first NPC; and
storing the node in the game agent data store.
This reads on a human:
Determining information including information from another human (e.g., questions or queries on a piece of paper or from another human) and writing down possible responses to the questions/queries associated with a predefined set of rules (i.e., model);
Showing said written responses to another human for selection;
Receiving a first selection from the other human;
Drawing a table or diagram associated with the selection from the other human;
Obtaining more information and writing down possible responses to the questions/queries associated with a predefined set of rules (i.e., model);
Showing said written responses to another human for selection;
Receiving a second selection from the other human;
Updating the drawing of a table or diagram associated with the second selection from the other human.
Additional limitations of “computer-controlled agent(s)” are present. The same analysis as applied to independent claims 1, 8, and 14, discussed above, applies.
With respect to claim 3, the claim(s) recite:
wherein the second instance of agent information includes an indication of a selected candidate interaction for the first NPC.
This reads on a human:
Wherein the second selection from the other human includes a selection from the responses shown for selection, above.
Additional limitations of “computer-controlled agent(s)” are present. The same analysis as applied to independent claims 1, 8, and 14, discussed above, applies.
With respect to claim 4, the claim(s) recite:
wherein the agent information comprises at least one of:
background information associated with the game environment;
historical information associated with the user; or
game environment state information for the game environment.
This reads on a human:
Wherein the obtained information comprises, for example, details of the background or historical information from the other human.
No additional limitations are present.
With respect to claim 5, the claim(s) recite:
wherein each generated node further comprises at least one of programmatic output of the model output or an associated emotion for the natural language output.
This reads on a human:
Wherein drawing the table or diagram associated with the selection from the other human is associated with the response according to a predetermined set of rules (i.e., model) or an emotion from the other human.
No additional limitations are present.
With respect to claim 6, the claim(s) recite:
wherein the set of candidate interactions includes a first candidate interaction having a first mood and a second candidate interaction having a second mood that is different from the first mood.
This reads on a human:
Wherein obtaining the information (e.g., questions or queries on a piece of paper or from another human) and writing down possible responses to the questions/queries associated with a predefined set of rules (i.e., model) consists of having responses associated with two different moods or emotions.
No additional limitations are present.
With respect to claim 7, the claim(s) recite:
wherein generating the set of candidate interactions comprises:
providing, to a machine learning service, an indication of the agent information; and
receiving, from the machine learning service, the model output.
This reads on a human:
Wherein obtaining the information (e.g., questions or queries on a piece of paper or from another human) and writing down possible responses to the questions/queries associated with a predefined set of rules (i.e., model) consists of:
selecting (using a predefined set of rules, i.e., model) information; and
receiving a response according to the predefined set of rules (i.e., model).
No additional limitations are present.
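For illustration of the provide/receive exchange recited in claims 7, 11, and 20, a minimal self-contained sketch follows. The service name, interface, and payload fields are hypothetical and are not drawn from the specification or the cited art; a real system would issue this request to a remote machine learning service rather than a local stub.

import json

def machine_learning_service(request_body: str) -> str:
    # Stand-in for a remote machine learning service; a real system would send a network
    # request and the service would run the generative model to produce the model output.
    agent_info = json.loads(request_body)["agent_info"]
    reply = "As the " + agent_info["name"] + ", I can offer you a drink or some gossip."
    return json.dumps({"model_output": [reply]})

request_body = json.dumps({"agent_info": {"name": "Bartender"}})   # indication of the agent information
response_body = machine_learning_service(request_body)             # provide the indication to the service
model_output = json.loads(response_body)["model_output"]           # receive the model output
print(model_output)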
With respect to claim 9, the claim(s) recite:
wherein the second instance of agent information is based on the first instance of agent information and includes an indication of a first selected candidate interaction for the first NPC.
This reads on a human:
Wherein the second selection from the other human includes a selection from the responses shown for selection, above.
Additional limitations of “computer-controlled agent(s)” are present. The same analysis as applied to independent claims 1, 8, and 14, discussed above, applies.
With respect to claim 10, the claim(s) recite:
wherein the agent information comprises at least one of:
background information associated with the game environment;
historical information associated with the user; or
game environment state information for the game environment.
This reads on a human:
Wherein the obtained information comprises, for example, details of the background or historical information from the other human.
No additional limitations are present.
With respect to claim 11, the claim(s) recite:
wherein generating the first set of candidate interactions comprises:
providing, to a machine learning service, an indication of the first instance of agent information; and
receiving, from the machine learning service, model output comprising the first set of candidate interactions.
This reads on a human:
Wherein obtaining the information (e.g., questions or queries on a piece of paper or from another human) and writing down possible responses to the questions/queries associated with a predefined set of rules (i.e., model) consists of:
selecting (using a predefined set of rules, i.e., model) information; and
receiving a response according to the predefined set of rules (i.e., model).
No additional limitations are present.
With respect to claim 12, the claim(s) recite:
wherein each generated node for the first NPC comprises one or more of:
natural language output of the model output;
programmatic output of the model output;
an emotion associated with the natural language output; or
an indication of an animation for the first computer-controlled agent associated with the natural language output.
This reads on a human:
Wherein drawing the table or diagram associated with the selection from the other human is associated with the response according to a predetermined set of rules (i.e., model) or an emotion from the other human.
Additional limitations of “computer-controlled agent(s)” are present. The same analysis as applied to independent claims 1, 8, and 14, discussed above, applies.
With respect to claim 13, the claim(s) recite:
wherein the first instance of agent information comprises at least one of:
background information associated with the game environment;
historical information associated with the user; or
game environment state information for the game environment.
This reads on a human:
Wherein the obtained information comprises, for example, details of the background or historical information from the other human.
No additional limitations are present.
With respect to claim 15, the claim(s) recite:
wherein presenting the set of candidate interactions comprises displaying natural language output of a generative machine learning model for each candidate interaction of the generated set of candidate interactions.
This reads on a human:
Writing down and showing possible responses to the questions/queries associated with a predefined set of rules (i.e., model).
No additional limitations are present.
With respect to claim 16, the claim(s) recite:
wherein obtaining the agent information comprises receiving user input that indicates a context for the NPC; and
computing the structured prompt based at least in part on the user input.
This reads on a human:
Wherein the information comprises context received from another human.
Additional limitations of “computer-controlled agent(s)” are present. The same analysis as applied to independent claims 1, 8, and 14, discussed above, applies.
With respect to claim 17, the claim(s) recite:
wherein generating the set of candidate interactions comprises altering the agent information for a specific mood, thereby obtaining a candidate interaction associated with the specific mood.
This reads on a human:
Wherein obtaining the information (e.g., questions or queries on a piece of paper or from another human) and writing down possible responses to the questions/queries associated with a predefined set of rules (i.e., model) consists of having responses associated with a mood, which may be used to update the information.
No additional limitations are present.
With respect to claim 18, the claim(s) recite:
wherein the set of candidate interactions includes a first candidate interaction having a first mood and a second candidate interaction having a second mood that is different from the first mood.
This reads on a human:
Wherein obtaining the information (e.g., questions or queries on a piece of paper or from another human) and writing down possible responses to the questions/queries associated with a predefined set of rules (i.e., model) consists of having responses associated with two different moods or emotions.
No additional limitations are present.
With respect to claim 19, the claim(s) recite:
wherein each generated node further comprises at least one of programmatic output of the model output or an associated emotion for the natural language output of the node.
This reads on a human:
Wherein drawing the table or diagram associated with the selection from the other human is associated with the response according to a predetermined set of rules (i.e., model) or an emotion from the other human.
No additional limitations are present.
With respect to claim 20, the claim(s) recite:
wherein generating the set of candidate interactions comprises:
providing, to a machine learning service, an indication of the agent information; and
receiving, from the machine learning service, the model output.
This reads on a human:
Wherein obtaining the information (e.g., questions or queries on a piece of paper or from another human) and writing down possible responses to the questions/queries associated with a predefined set of rules (i.e., model) consists of:
selecting (using a predefined set of rules, i.e., model) information; and
receiving a response according to the predefined set of rules (i.e., model).
No additional limitations are present.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4-5, 7, 14-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kluwer et al. ("Talking NPCs in a virtual game world." Proceedings of the ACL 2010 System Demonstrations. 2010. https://aclanthology.org/P10-4007.pdf) and further in view of Chang et al. (WO 2021159779 A1) and Olabiyi et al. (US 20200098353 A1).
As to independent claim 1, Kluwer et al. teaches:
1. A system (see ¶ 1 of 5. Conversational Agent: KomParse Dialog System: “The KomParse dialog system, the main functionality of the conversational agent, consists of the following three major components: input analyzer, dialog manager and output generator (fig.4) […]) comprising:
at least one processor (see ¶ 5 of 4. Overall System Architecture: “The integration architecture also has the advantage that the necessary services can be easily distributed in a networked multi-platform environment. The Twinity clients require a Microsoft Windows machine with a 3D graphics card supporting DirectX 9.0c or higher, 1 GB RAM and a CPU core per instance. The KomParse server requires roughly 1 GB RAM. The triple store is run as a separate server process and is accessed by an XML-RPC interface. Roughly 1.2 GB RAM are required for loading our current knowledge base.”); and
memory storing instructions (see ¶ 5 of 4. Overall System Architecture citation as in limitation, above. “1 GB RAM”) that, when executed by the at least one processor, cause the system to perform a set of operations (see ¶ 5 of 4. Overall System Architecture citation as in limitation, above and further Figure 4: “KomParse Dialog System”), the set of operations comprising:
computing a structured prompt that includes agent information for a non-player character (NPC) of a game environment (see ¶ 1 and 5, and of 5. Conversational Agent: KomParse Dialog System: “The KomParse dialog system, the main functionality of the conversational agent, consists of the following three major components: input analyzer, dialog manager and output generator (fig.4) […] As mentioned above, our dialog system has to deal with two different scenarios. While the focal point of the bartender agent lies in the question answering functionality, the furniture sales agent is driven by a complex dialog task model based on a dialog graph. Thus, the bartender agent relies mainly on question answering technology, in that it needs to understand questions and extract the right answer from our knowledge bases, whereas the sales agent has to accommodate various dialog situations with respect to a sales scenario.…
and further ¶ 1-2 of 5.2 Information Extraction: “…The input “Do you have any red couches?” for example needs to get processed by the system in such a way that the information regarding the sofa with red color is extracted. This is done by the system in a data-driven way. The input analysis first tries to find a demanded object in the input via asking the ontology: Every object which can be discussed in the scenario is encoded in the sales agents knowledge base. This can be seen as a Named Entity Recognition step. In case of success, the system tries to detect one of the possible relations of the object found in the input. This is achieved by querying the ontology about what kind of relations the identified object can satisfy. Possible relations are encoded in the class description of the given object. As a result the system can detect a relation “hasColour” for the found object “sofa” and the color value “red”. The found information gets inserted into the form which gets more and more similar or if possible equal to a search query via RDF.”);
generating, a set of candidate interactions for the NPC that includes model output (see ¶ 1, 5-7, and of 5. Conversational Agent: KomParse Dialog System: “The KomParse dialog system, the main functionality of the conversational agent, consists of the following three major components: input analyzer, dialog manager and output generator (fig.4) […] … the sales agent has to accommodate various dialog situations with respect to a sales scenario. It therefore has to understand the dialog acts intended by the user and trigger the corresponding reactions, such as presenting an object, memorizing user preferences, negotiating further sales goals, etc. […] … In these experiments, 18 users spent one hour each on furnishing a virtual living room in a Twinity apartment by talking to a human wizard controlling the virtual sales agent… [… ] The flow of the task-based conversation is controlled by a data-driven finite-state model, which is the backbone of the dialog manager. During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and [edges] at runtime. This strategy keeps the finite-state graph as small as possible. Discussed objects and their features are stored in a frame-based sub-component named ”form”. The form contains entries which correspond to ontological concepts in the furniture ontology. During conversation, these entries will be specified with the values of the properties of the discussed objects. This frame-based approach increases the flexibility of the dialog manager (McTear, 2002) and is particularly useful for a task-driven dialog system. As long as the negotiated object is not yet fully specified, the form represents the underspecified object description according to the ontology concept. Every time the user states a new preference or request, the form is enriched with additional features until the set of objects is small enough to be presented to the user for final selection. Thus the actual flow of dialog according to the task model does not have to be expressed by the graph but can be derived on demand from the knowledge and handled by the form which in turn activates the appropriate dialog subgraphs. This combination of graph-based dialog models and form-based task modelling effectively accounts for the interaction of sequential dialog strategies and the non-sequential nature of complex dialog goals.” and Table 1: Example Conversation from the Wizard-of-Oz Experiment: “USR.1: And do we have a little side table for the TV? | NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf. | NPC.2: Let me check if we have something like that.”);
presenting the set of candidate interactions for user selection (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above. More specifically: “NPC.1: I could offer you another small table or a sideboard.”);
receiving a selection of one or more candidate interactions (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above. More specifically: “USR.2: Then I’ll take a sideboard thats similar to my shelf.”); and
generating, for each selected candidate interaction, a node in a dialogue tree comprising natural language output for the selected candidate interaction representing the dialogue turn for the NPC. (see ¶ 1, 5-7, and of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above. More specifically: “¶6: …The final corpus consists of 18 dialogs containing 3,171 turns with 4,313 utterances and 23,015 alpha-numerical strings (words). The following example shows a typical part of such a conversation: (USR.1: And do we have a little side table for the TV? NPC.1: I could offer you another small table or a sideboard. USR.2: Then I’ll take a sideboard thats similar to my shelf. NPC.2: Let me check if we have something like that.) and ¶ 7: …During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and [edges] at runtime…”
Here, the Examiner notes that the multiple turns (i.e., 3,171) present in the corpus, as seen in the example portion of a conversation (i.e., including representation of the dialogue turns for the NPC) disclosed in Kluwer et al., read on the limitation of candidate interactions representing dialogue turns for the NPC.).
However, Kluwer et al. does not explicitly teach, but Chang et al. does teach:
at least in part by prompting a generative machine learning model with the structured prompt, generating, a set of candidate interactions for the NPC that includes model output associated with the generative machine learning model (see ¶ 2 of page 5: “Taking a turn-based role-playing game as an example, the technical solution provided in this application can be used to train the intelligent skill release of the game AI as an NPC role. In turn-based role-playing games, there are often multiple skills between game users and NPC characters in the game scene, and the skill release of each round is related to the state of each game character in the current round, so NPC characters fight The complexity and difficulty of strategy learning are relatively high, and the cost of learning a network model directly from scratch is very high. The embodiments of the present application are based on the idea of imitating learning, that is, learning the battle strategy by observing behaviors that imitate the skill release of real game users. The embodiment of the application introduces the idea of generating adversarial learning on the basis of imitation learning, so as to train the game AI as an NPC character by using the method of generating adversarial imitation learning. Figure 2 schematically shows a schematic diagram of the principle of generative adversarial imitation learning. As shown in Figure 2, it is assumed that a real game user generates a user game data set by running the game PCTCN2020127092-appb-000001.tif Obey a certain distribution, where user game data PCTCN2020127092-appb-000002.tif Represents the user game state of the real game user corresponding to the game behavior subject (such as the game character controlled by the game user) in the game scene, PCTCN2020127092-appb-000003.tif Represents the user game behavior of the real game user in the face of the corresponding user's game state. In the embodiment of the application, the behavior model Actor continuously interacts with the game scene to generate a model game data set {τ .sub.1 , τ .sub.2 ,...τ .sub.N } that mimics the game behavior of real game users, where τ = {s .sub.1 , a .sub.1 , s .sub.2 , a .sub.2 .Math.s .sub.T , a .sub.T }, s represents the model game state of the game behavior subject (such as an NPC character) corresponding to the behavior model in the game scene, a represents the model made by the behavior model facing the corresponding model game state Game behavior. Through the method of generating adversarial learning, the model game data set can gradually approach the probability distribution of the user game data set. Through continuous learning of the behavior model Actor, the probability distribution of the user game data can finally be learned, so that the skills output by the behavior model release and kill the target Game behaviors such as selection will be closer to the behavior of real game users, and game AI has a higher level of personification and intelligence, thereby improving the efficiency of human-computer interaction.”
and ¶ 8 of page 10: “In the model optimization method for behavior models provided by the embodiments of the present application, the probability distribution of user game samples is learned from the game data of real game users by generating adversarial imitation learning, and the behavior model can be guided to make behavior characteristics close to real game users or Game behavior strategies that meet the expectations of real game users. The training method based on generative adversarial imitation learning can not only reduce the consumption of computing resources in the model training process, but also improve the training efficiency of the model and obtain better training effects.”),
wherein a turn of the NPC is generated using the generative machine learning model (see ¶ 2 of page 5 and ¶ 8 of page 10 citations as in limitation above.)
Kluwer et al. and Chang et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. to incorporate the teachings of Chang et al. of at least in part by prompting a generative machine learning model with the structured prompt, generating a set of candidate interactions for the NPC that includes model output associated with the generative machine learning model, wherein a turn of the NPC is generated using the generative machine learning model, which provides the benefit of not only reducing the consumption of computing resources in the model training process, but also improving the training efficiency of the model and obtaining better training effects (¶ 8 of page 10 of Chang).
However, Kluwer et al. in combination with Chang et al. do not explicitly teach, but Olabiyi et al. does teach:
wherein a dialogue turn of the NPC is generated using the generative machine learning model (see ¶ [0017]: “Various embodiments may be generally directed to the use of an adversarial learning framework for persona-based dialogue modeling. In some embodiments, automated multi-turn dialogue response generation may be performed using a persona-based hierarchical recurrent encoder-decoder-based generative adversarial network (phredGAN). Such a phredGAN may feature a persona-based hierarchical recurrent encoder-decoder (PHRED) generator and a conditional discriminator.” and ¶ [0049]: “The use of a phredGAN such as phredGAN 500 for multi-turn response generation based on an adversarially trained dialogue model may address the problem of mode collapse while providing consistent personality traits. Use of the phredGAN may yield benefits in both supervised and unsupervised use cases. In a supervised use case, multi-modal attributes such as speaker name/identity and dialogue subtopic may be available along with dialogue utterances, and the dialogue model output response may be improved by conditioning the response generation on these attributes.…”
and further ¶ [0025-0027 and 0044-0045]: “[0025] FIG. 5 illustrates an example of a persona-based hierarchical recurrent encoder-decoder-based GAN (phredGAN) 500 that embodies such a framework. phredGAN 500 may feature an architecture that is generally representative of an hredGAN architecture modified to simultaneously capture utterance attributes such as speaker identity, dialogue topic, speaker sentiments, and so on. As shown in FIG. 5, phredGAN 500 features a persona-based hierarchical recurrent encoder-decoder (PHRED) generator 525 and a conditional discriminator 526, which respectively serve as generator 305 and discriminator 306 of FIG. 3. [0026] Multi-turn dialogue response generation in phredGAN 500 may be formulated in similar fashion to that in hredGAN 400, but taking speaker and/or utterance attributes into account. Namely, the dialogue history serving as basis for multi-turn dialogue response generation using phredGAN 500 may take the form X.sub.i=((X.sub.1, C.sub.1), (X.sub.2, C.sub.2), . . . (X.sub.i, C.sub.i)) where C.sub.i is additional input that represents the speaker and/or utterance attributes. C.sub.i can either be a sequence of tokens or single token such that C.sub.i.sup.j∈V.sub.c for vocabulary V.sub.c. At the ith turn, C.sub.i and C.sub.i+1 are the source/input attributes, such as speaker's identity, speaker's background, speaker's location, speaker's preference and so on, and target/output attributes, such as responder's identity, responder's background, responder's location, responder's preference and so on, to the generator, respectively. The embedding for attribute tokens is also learned similar to that of word tokens. [0027] In one example, assume there is dialogue data with conversations involving customers of different demographics such as age, location and so on, and service agents of different areas of expertise. When a model, such as hredGAN, is trained on data but does not use persona/attributes as in the example embodiments, the model may only be capable of generating responses by an average agent to an average customer. But with a phredGAN of the example embodiments, trained with persona/attribute information, the dialogue generating model can generate responses that are more appropriate for a specific user group. This inherently increases the response diversity since it is no longer an average response. Below illustrates an example dialogue with two different responses based on utilization of hredGAN vs. an exemplary phredGAN:… [0044] Both the PHRED generator 525 and the conditional discriminator 526 (with shared encoder) of phredGAN 500 may be trained using a training procedure characterized by training algorithm 800 of FIG. 8. Both in embodiments in which phredGAN 500 is a phredGAN.sub.a and in embodiments in which phredGAN 500 is a phredGAN.sub.d, λ.sub.G.sub.adv=1. In the phredGAN.sub.a case, λ.sub.G.sub.att=0, while in the phredGAN.sub.d case, λ.sub.G.sub.att=1. Since the encoder, word embedding and attribute embedding are shared, the system may be trained end-to-end with back-propagation. [0045] In a given embodiment in which PHRED generator 525 and the conditional discriminator 526 are trained using a training procedure characterized by training algorithm 800, each RNN unit of phredGAN 500 may be implemented as a 3-layer gate recurrent unit (GRU) cell with a hidden state size of 512. The encoder RNN (eRNN) may be bidirectional, while the context RNN (cRNN) may be unidirectional. 
A word vocabulary size V of 50,000 may be used, with a word embedding size of 512. An attribute embedding size of 512 may be used. The number of attributes V.sub.c may be dataset dependent. Only one attribute may be used per utterance so that there is no need to use attention to combine the attribute embeddings. The attention RNN (aRNN) outputs may be connected to the decoder RNN (dRNN) input using an additive attention mechanism.”);
Kluwer et al., Chang et al., and Olabiyi et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. in combination with Chang et al. to incorporate the teachings of Olabiyi et al. of model output associated with a generative machine learning model, wherein a dialogue turn of the NPC is generated using the generative machine learning model, which provides the benefit of generating responses that are more appropriate for a specific user group ([0027] of Olabiyi et al.).
Regarding claim 4, Kluwer et al. in combination with Chang and Olabiyi et al. teach all of the limitations as in claim 1, above.
Kluwer et al. further teaches:
4. The system of claim 1,
wherein the agent information comprises (see ¶ 1 and 5 of 5. Conversational Agent: KomParse Dialog System citation(s) as in claim 1, above.) at least one of:
background information associated with the game environment (see Figure 1-2 [examples of NPC (agent) selling sofa or bartending in a virtual environment] and further ¶ 2 of 6. Conclusion and Future Work: “KomParse is an ambitious and nevertheless pragmatic attempt to bring NLP into the world of virtual games. We develop a new strategy to integrate task models and domain ontologies into dialog models. This strategy is useful for task-driven NPCs such as furniture sellers. With the chatty bartender, a combination of task-specific dialog and domain-specific question answering enables a smart wide-domain off-task conversation. Since the online game employs bubble-chat as a mode of communication in addition to Voice-over-IP, we are able to test our dialog system in a real-time application without being hindered by imperfect speech recognition.”);
historical information associated with the user (see ¶ 7 of 5. Conversational Agent: KomParse Dialog System: “…Every time the user states a new preference or request, the form is enriched with additional features until the set of objects is small enough to be presented to the user for final selection…”); or
game environment state information for the game environment (see Figure 1 [example of NPC (agent) selling sofa in a virtual environment], ¶ 2 of 6. Conclusion and Future Work citation as in limitation above and ¶ 1-2 of 5.2 Information Extraction: “Both scenarios make use of state-of-the-art information extraction approaches to extract the important pieces from the user input. While the bartender depends on relation extraction to detect the fact or relation questioned by the user (Xu et al., 2007), the sales agent uses information extraction methods to recognize user wishes and demands. As a result, the questioned fact or the demanded object feature equals the ontology structure containing the knowledge needed to handle the user input. The input “Do you have any red couches?” for example needs to get processed by the system in such a way that the information regarding the sofa with red color is extracted. This is done by the system in a data-driven way. The input analysis first tries to find a demanded object in the input via asking the ontology: Every object which can be discussed in the scenario is encoded in the sales agents knowledge base. This can be seen as a Named Entity Recognition step. In case of success, the system tries to detect one of the possible relations of the object found in the input. This is achieved by querying the ontology about what kind of relations the identified object can satisfy. Possible relations are encoded in the class description of the given object. As a result the system can detect a relation “hasColour” for the found object “sofa” and the color value “red”. The found information gets inserted into the form which gets more and more similar or if possible equal to a search query via RDF.”).
Regarding claim 5, Kluwer et al. in combination with Chang and Olabiyi et al. teach all of the limitations as in claim 1, above.
Kluwer et al. further teaches:
5. The system of claim 1,
wherein each generated node further comprises at least one of programmatic output of the model output or an associated emotion for the natural language output (Figure 1 [example of NPC (agent) selling sofa in a virtual environment], Table 1: Example Conversation from the Wizard-of-Oz Experiment: “USR.1: And do we have a little side table for the TV?, NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf…”and ¶ 7 of 5. Conversational Agent: KomParse Dialog System: “…During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and egdes at runtime. This strategy keeps the finite-state graph as small as possible. Discussed objects and their features are stored in a frame-based sub-component named ”form”…”).
Regarding claim 7, Kluwer et al. in combination with Chang and Olabiyi et al. teaches all of the limitations as in claim 1, above.
Kluwer et al. further teaches:
7. The system of claim 1,
wherein generating the set of candidate interactions (see ¶ 1, 5-7, and of 5. Conversational Agent: KomParse Dialog System citations as in claim 1, above and Table 1: Example Conversation from the Wizard-of-Oz Experiment: “USR.1: And do we have a little side table for the TV? | NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf. | NPC.2: Let me check if we have something like that.”) comprises:
Olabiyi et al. further teaches:
providing, to a machine learning service, an indication of the agent information (see ¶ [0017, 0049] citations as in claim 1, above. More specifically: ¶ [0049]: “The use of a phredGAN such as phredGAN 500 for multi-turn response generation based on an adversarially trained dialogue model may address the problem of mode collapse while providing consistent personality traits. Use of the phredGAN may yield benefits in both supervised and unsupervised use cases. In a supervised use case, multi-modal attributes such as speaker name/identity and dialogue subtopic may be available along with dialogue utterances, and the dialogue model output response may be improved by conditioning the response generation on these attributes.…”); and
receiving, from the machine learning service, the model output (see ¶ [0017 and 0049] citations as in limitation above and further ¶ [0025-0027 and 0044-0045]: “[0025] FIG. 5 illustrates an example of a persona-based hierarchical recurrent encoder-decoder-based GAN (phredGAN) 500 that embodies such a framework. phredGAN 500 may feature an architecture that is generally representative of an hredGAN architecture modified to simultaneously capture utterance attributes such as speaker identity, dialogue topic, speaker sentiments, and so on. As shown in FIG. 5, phredGAN 500 features a persona-based hierarchical recurrent encoder-decoder (PHRED) generator 525 and a conditional discriminator 526, which respectively serve as generator 305 and discriminator 306 of FIG. 3. [0026] Multi-turn dialogue response generation in phredGAN 500 may be formulated in similar fashion to that in hredGAN 400, but taking speaker and/or utterance attributes into account. Namely, the dialogue history serving as basis for multi-turn dialogue response generation using phredGAN 500 may take the form X.sub.i=((X.sub.1, C.sub.1), (X.sub.2, C.sub.2), . . . (X.sub.i, C.sub.i)) where C.sub.i is additional input that represents the speaker and/or utterance attributes. C.sub.i can either be a sequence of tokens or single token such that C.sub.i.sup.j∈V.sub.c for vocabulary V.sub.c. At the ith turn, C.sub.i and C.sub.i+1 are the source/input attributes, such as speaker's identity, speaker's background, speaker's location, speaker's preference and so on, and target/output attributes, such as responder's identity, responder's background, responder's location, responder's preference and so on, to the generator, respectively. The embedding for attribute tokens is also learned similar to that of word tokens. [0027] In one example, assume there is dialogue data with conversations involving customers of different demographics such as age, location and so on, and service agents of different areas of expertise. When a model, such as hredGAN, is trained on data but does not use persona/attributes as in the example embodiments, the model may only be capable of generating responses by an average agent to an average customer. But with a phredGAN of the example embodiments, trained with persona/attribute information, the dialogue generating model can generate responses that are more appropriate for a specific user group. This inherently increases the response diversity since it is no longer an average response. Below illustrates an example dialogue with two different responses based on utilization of hredGAN vs. an exemplary phredGAN:… [0044] Both the PHRED generator 525 and the conditional discriminator 526 (with shared encoder) of phredGAN 500 may be trained using a training procedure characterized by training algorithm 800 of FIG. 8. Both in embodiments in which phredGAN 500 is a phredGAN.sub.a and in embodiments in which phredGAN 500 is a phredGAN.sub.d, λ.sub.G.sub.adv=1. In the phredGAN.sub.a case, λ.sub.G.sub.att=0, while in the phredGAN.sub.d case, λ.sub.G.sub.att=1. Since the encoder, word embedding and attribute embedding are shared, the system may be trained end-to-end with back-propagation. [0045] In a given embodiment in which PHRED generator 525 and the conditional discriminator 526 are trained using a training procedure characterized by training algorithm 800, each RNN unit of phredGAN 500 may be implemented as a 3-layer gate recurrent unit (GRU) cell with a hidden state size of 512. 
The encoder RNN (eRNN) may be bidirectional, while the context RNN (cRNN) may be unidirectional. A word vocabulary size V of 50,000 may be used, with a word embedding size of 512. An attribute embedding size of 512 may be used. The number of attributes V.sub.c may be dataset dependent. Only one attribute may be used per utterance so that there is no need to use attention to combine the attribute embeddings. The attention RNN (aRNN) outputs may be connected to the decoder RNN (dRNN) input using an additive attention mechanism.”);
Kluwer et al. and Olabiyi et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. to incorporate the teachings of Olabiyi et al. of providing, to a machine learning service, an indication of the agent information; and receiving, from the machine learning service, the model output which provides the benefit of generating responses that are more appropriate for a specific user group ([0027] of Olabiyi et al.).
As to independent claim 14, Kluwer et al. teaches:
14. A method (see ¶ 1 of 5. Conversational Agent: KomParse Dialog System: “The KomParse dialog system, the main functionality of the conversational agent, consists of the following three major components: input analyzer, dialog manager and output generator (fig.4) […]), comprising:
[the limitations as in claim 1, above, taught by Kluwer et al. in combination with Chang et al. and Olabiyi et al.]
Regarding claim 15, Kluwer et al. in combination with Chang and Olabiyi et al. teach all of the limitations as in claim 14, above.
Kluwer et al. further teaches:
15. The method of claim 14, wherein presenting the set of candidate interactions comprises displaying natural language output of candidate interactions (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in claim 1, above. More specifically: “NPC.1: I could offer you another small table or a sideboard.”).
Olabiyi et al. further teaches:
a generative machine learning model (see ¶ [0017, 0025-0027, 0044-0045, and 0049] citations as in claims 1, 7-8, 11, and 14 above.);
Kluwer et al. and Olabiyi et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. to incorporate the teachings of Olabiyi et al. of a generative machine learning model, which provides the benefit of generating responses that are more appropriate for a specific user group ([0027] of Olabiyi et al.).
Regarding claim 16, Kluwer et al. in combination with Chang and Olabiyi et al. teach all of the limitations as in claim 14, above.
Kluwer et al. further teaches:
16. The method of claim 14, further comprising:
receiving user input that indicates a context for the NPC (see ¶ 1 and 5, and of 5. Conversational Agent: KomParse Dialog System: “The KomParse dialog system, the main functionality of the conversational agent, consists of the following three major components: input analyzer, dialog manager and output generator (fig.4) […] As mentioned above, our dialog system has to deal with two different scenarios. While the focal point of the bartender agent lies in the question answering functionality, the furniture sales agent is driven by a complex dialog task model based on a dialog graph. Thus, the bartender agent relies mainly on question answering technology, in that it needs to understand questions and extract the right answer from our knowledge bases, whereas the sales agent has to accommodate various dialog situations with respect to a sales scenario.…” and Table 1: Example Conversation from the Wizard-of-Oz Experiment: “USR.1: And do we have a little side table for the TV? | NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf. | NPC.2: Let me check if we have something like that.”); and
computing the structured prompt based at least in part on the user input (Table 1: Example Conversation from the Wizard-of-Oz Experiment: “USR.1: And do we have a little side table for the TV? | NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf. | NPC.2: Let me check if we have something like that.”).
Regarding claim 19, Kluwer et al. in combination with Chang and Olabiyi et al. teach all of the limitations as in claim 14, above.
Kluwer et al. further teaches:
19. The method of claim 14, wherein each generated node further comprises at least one of programmatic output of the model output or an associated emotion for the natural language output of the node (Figure 1 [example of NPC (agent) selling sofa in a virtual environment], Table 1: Example Conversation from the Wizard-of-Oz Experiment: “USR.1: And do we have a little side table for the TV?, NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf…”and ¶ 7 of 5. Conversational Agent: KomParse Dialog System: “…During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and egdes at runtime. This strategy keeps the finite-state graph as small as possible. Discussed objects and their features are stored in a frame-based sub-component named ”form”…”).
Regarding claim 20, Kluwer et al. in combination with Chang and Olabiyi et al. teach all of the limitations as in claim 14, above.
Kluwer et al. further teaches:
20. The method of claim 14,
wherein generating the set of candidate interactions (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System citations as in claim 14, above, and Table 1: Example Conversation from the Wizard-of-Oz Experiment: “USR.1: And do we have a little side table for the TV? | NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf. | NPC.2: Let me check if we have something like that.”) comprises:
Olabiyi et al. further teaches:
providing, to a machine learning service, an indication of the agent information (see ¶ [0017, 0049] citations as in claim 1 and 11, above. More specifically: ¶ [0049]: “The use of a phredGAN such as phredGAN 500 for multi-turn response generation based on an adversarially trained dialogue model may address the problem of mode collapse while providing consistent personality traits. Use of the phredGAN may yield benefits in both supervised and unsupervised use cases. In a supervised use case, multi-modal attributes such as speaker name/identity and dialogue subtopic may be available along with dialogue utterances, and the dialogue model output response may be improved by conditioning the response generation on these attributes.…”); and
receiving, from the machine learning service, the model output (see ¶ [0017 and 0049] citations as in limitation above and further ¶ [0025-0027 and 0044-0045] also as in claims 1 and 11 above.).
Kluwer et al. and Olabiyi et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. to incorporate the teachings of Olabiyi et al. of providing, to a machine learning service, an indication of the agent information; and receiving, from the machine learning service, the model output which provides the benefit of generating responses that are more appropriate for a specific user group ([0027] of Olabiyi et al.).
Claims 2-3 are rejected under 35 U.S.C. 103 as being unpatentable over Kluwer et al. ("Talking NPCs in a virtual game world." Proceedings of the ACL 2010 System Demonstrations. 2010. https://aclanthology.org/P10-4007.pdf) and further in view of Chang et al. (WO 2021159779 A1) and Olabiyi et al. (US 20200098353 A1) as in claim 1, above, and further in view of Dicken et al. (US 20220088474 A1).
Regarding claim 2, Kluwer et al. in combination with Chang and Olabiyi et al. teach all of the limitations as in claim 1, above.
Kluwer et al. further teaches:
2. The system of claim 1, wherein:
the NPC is a first NPC (see ¶ 1 and 5 of 5. Conversational Agent: KomParse Dialog System: “The KomParse dialog system, the main functionality of the conversational agent, consists of the following three major components: input analyzer, dialog manager and output generator (fig.4) […] As mentioned above, our dialog system has to deal with two different scenarios. While the focal point of the bartender agent lies in the question answering functionality, the furniture sales agent is driven by a complex dialog task model based on a dialog graph. Thus, the bartender agent relies mainly on question answering technology, in that it needs to understand questions and extract the right answer from our knowledge bases, whereas the sales agent has to accommodate various dialog situations with respect to a sales scenario.…);
the structured prompt is a first structured prompt (see ¶ 1 and 5 of 5. Conversational Agent: KomParse Dialog System citations as in limitation above and further Table 1: “NPC.1: I could offer you another small table or a sideboard.”);
the agent information is a first instance of agent information (see ¶ 1 and 5 of 5. Conversational Agent: KomParse Dialog System citations as in limitation above and further ¶ 6 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above, further: “In these experiments, 18 users spent one hour each on furnishing a virtual living room in a Twinity apartment by talking to a human wizard controlling the virtual sales agent. The final corpus consists of 18 dialogs containing 3,171 turns with 4,313 utterances and 23,015 alpha-numerical strings (words).”);
the set of candidate interactions is a first set of candidate interactions (see ¶ 1 and 5 of 5. Conversational Agent: KomParse Dialog System citations as in limitation above.);
the generated nodes are included in a set of generated nodes for the first NPC (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above. More specifically: “…During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and [edges] at runtime…”);
and the set of operations (see ¶ 1 and 5 of 5. Conversational Agent: KomParse Dialog System citations as in limitation above.) further comprises:
computing a second structured prompt including a second instance of agent information for a second NPC of the game environment (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above, further: “In these experiments, 18 users spent one hour each on furnishing a virtual living room in a Twinity apartment by talking to a human wizard controlling the virtual sales agent. The final corpus consists of 18 dialogs containing 3,171 turns with 4,313 utterances and 23,015 alpha-numerical strings (words).” Table 1: “NPC.2: Let me check if we have something like that.”
Here, there are implicitly second, third, or further user-sales agent interactions allowing the user to select/define user preferences regarding furniture.);
generating, a second set of candidate interactions for the second NPC that includes model output (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above, further: “In these experiments, 18 users spent one hour each on furnishing a virtual living room in a Twinity apartment by talking to a human wizard controlling the virtual sales agent. The final corpus consists of 18 dialogs containing 3,171 turns with 4,313 utterances and 23,015 alpha-numerical strings (words).” Table 1: “NPC.1: I could offer you another small table or a sideboard.” … “NPC.2: Let me check if we have something like that.”
Here, there are implicitly second, third, or further user-sales agent interactions allowing the user to select/define user preferences regarding furniture.);
presenting the second set of candidate interactions for user selection (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations and notes as in limitation(s) above.);
receiving a second selection of one or more candidate interactions (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations and notes as in limitation(s) above.); and
generating, for each selected candidate interaction for the second NPC, a node in the dialogue tree comprising natural language output for the selected candidate interaction (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above. More specifically: “…During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and [edges] at runtime…”),
Chang et al. further teaches:
at least in part by prompting the generative machine learning model with the second structured prompt, generating, a second set of candidate interactions for the second NPC (see ¶ 2 of page 5 and ¶ 8 of page 10 citations as in claim 1, above.)
Kluwer et al. and Chang et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. to incorporate the teachings of Chang et al. of, at least in part by prompting the generative machine learning model with the second structured prompt, generating a second set of candidate interactions for the second NPC, which provides the benefit of reducing the consumption of computing resources in the model training process while also improving the training efficiency of the model and obtaining better training effects (¶ 8 of page 10 of Chang).
Olabiyi et al. further teaches:
model output associated with the generative machine learning model (see ¶ [0017, 0025-0027, 0044-0045, and 0049] citations as in claims 1 and 7 above.);
Kluwer et al. and Olabiyi et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. to incorporate the teachings of Olabiyi et al. of including model output associated with the generative machine learning model, which provides the benefit of generating responses that are more appropriate for a specific user group ([0027] of Olabiyi et al.).
However, Kluwer et al. in combination with Chang and Olabiyi et al. do not explicitly teach, but Dicken et al. does teach:
wherein the generated node is associated with a node of the generated set of nodes for the first NPC (see Fig. 21-22 and ¶ [0202]: “Turning now to FIG. 21, therein is shown a flow diagram 2100 that schematically illustrates an example method for modeling traversal behavior and generating custom content based on analysis of the modeled behavior. First, the way a user moves around the app and the order that they visit different aspects (i.e., their game traversal behavior) is modeled as a directed graph (i.e., a data structure comprising nodes connected by directional edges), thus building a graph database 2128 (FIG. 21). In this example, each session is modeled by building, at operation 2102, a respective traversal graph. In FIG. 22, example traversal graph 2200 schematically illustrates the graph structure employed in this example for representing a single respective game interaction session, or journey. Note that the graph structure in this example for the session traversal graph 2200 includes a player node connected by a respective edge to a session node, which is in turn connected by respective edges to a successive series of step nodes. Each step node in turn connects to a single respective action node 2202 selected from a predefined set of traversal or journey actions that are selected for tracking and/or inferring one or more psychological features or states of users.”); and
storing the node in the game agent data store.
Kluwer et al., Chang et al., Olabiyi et al., and Dicken et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in machine-user interaction associated with ontologies/graph databases. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. to incorporate the teachings of Dicken et al. in combination with Chang and Olabiyi et al. of wherein the generated node is associated with a node of the generated set of nodes for the first NPC, which provides the benefit of enabling the player modeling engine 120 to traverse the player's journeys or any sub-journeys more efficiently than in a traditional relational database ([0203] of Dicken et al.).
Regarding claim 3, Kluwer et al. in combination with Chang and Olabiyi et al. and Dicken et al. teach all of the limitations as in claim 2, above.
Kluwer et al. further teaches:
3. The system of claim 2,
wherein the second instance of agent information includes an indication of a selected candidate interaction for the first NPC (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System citation(s) as in claims 1-2, further: “In these experiments, 18 users spent one hour each on furnishing a virtual living room in a Twinity apartment by talking to a human wizard controlling the virtual sales agent. The final corpus consists of 18 dialogs containing 3,171 turns with 4,313 utterances and 23,015 alpha-numerical strings (words).”
Here, there are implicitly second, third, or further user-sales agent interactions allowing the user to select/define user preferences regarding furniture
and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in claims 1-2 above, further: “NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf…”.).
Claims 6 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kluwer et al. ("Talking NPCs in a virtual game world." Proceedings of the ACL 2010 System Demonstrations. 2010. https://aclanthology.org/P10-4007.pdf) and further in view of Chang et al. (WO 2021159779 A1) and Olabiyi et al. (US 20200098353 A1) as in claims 1 and 14, above, and further in view of Zarlengo et al. (US 20210374863 A1).
Regarding claim 6, Kluwer et al. in combination with Chang and Olabiyi et al. teach all of the limitations as in claim 1, above.
However, Kluwer et al. in combination with Chang and Olabiyi et al. do not explicitly teach, but Zarlengo et al. does teach:
6. The system of claim 1,
wherein the set of candidate interactions includes a first candidate interaction having a first mood and a second candidate interaction having a second mood that is different from the first mood (see ¶ [0133]: “In some implementations and referring also to FIG. 8, prompting 204 the user, via the interactive virtual assistant, with one or more options may include prompting 214 the user to rate one or more financial transactions. For example, one or more transactions available in a user's transaction history may be provided by interactive virtual assistant process 10, via the interactive virtual assistant (e.g., interactive virtual assistant 800), for rating with an emotional spend annotation. In some implementations, a user may be prompted 214 by the interactive virtual assistant to reflect upon one or more recent transactions or purchases (e.g., prompt 802) and if the purchase was deemed satisfying (i.e., does the transaction make the user more happy), select a first emotional annotation (e.g., selection of a green smiling face button 804). If a given purchase leads to feelings of remorse or regret (i.e., does the transaction make the user more sad), a selection of a second emotional annotation may be received (e.g., selection of a red sad face button 806).”).
Kluwer et al., Chang et al., Olabiyi et al., and Zarlengo et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. in combination with Chang and Olabiyi et al. to incorporate the teachings of Zarlengo et al. of wherein the set of candidate interactions includes a first candidate interaction having a first mood and a second candidate interaction having a second mood that is different from the first mood, which provides the benefit of improving financial and structured media product technology processes ([0042] of Zarlengo et al.).
Regarding claim 17, Kluwer et al. in combination with Chang and Olabiyi et al. teach all of the limitations as in claim 14, above.
However, Kluwer et al. in combination with Chang and Olabiyi et al. do not explicitly teach, but Zarlengo et al. does teach:
17. The method of claim 14, wherein generating the set of candidate interactions comprises altering the agent information for a specific mood, thereby obtaining a candidate interaction associated with the specific mood (see ¶ [0133]: “In some implementations and referring also to FIG. 8, prompting 204 the user, via the interactive virtual assistant, with one or more options may include prompting 214 the user to rate one or more financial transactions. For example, one or more transactions available in a user's transaction history may be provided by interactive virtual assistant process 10, via the interactive virtual assistant (e.g., interactive virtual assistant 800), for rating with an emotional spend annotation. In some implementations, a user may be prompted 214 by the interactive virtual assistant to reflect upon one or more recent transactions or purchases (e.g., prompt 802) and if the purchase was deemed satisfying (i.e., does the transaction make the user more happy), select a first emotional annotation (e.g., selection of a green smiling face button 804). If a given purchase leads to feelings of remorse or regret (i.e., does the transaction make the user more sad), a selection of a second emotional annotation may be received (e.g., selection of a red sad face button 806).”).
Kluwer et al., Chang et al., Olabiyi et al., and Zarlengo et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. in combination with Chang and Olabiyi et al. to incorporate the teachings of Zarlengo et al. of wherein generating the set of candidate interactions comprises altering the agent information for a specific mood, thereby obtaining a candidate interaction associated with the specific mood, which provides the benefit of improving financial and structured media product technology processes ([0042] of Zarlengo et al.).
Regarding claim 18, Kluwer et al. in combination with Chang and Olabiyi et al. teach all of the limitations as in claim 17, above.
However, Kluwer et al. in combination with Chang and Olabiyi et al. do not explicitly teach, but Zarlengo et al. does teach:
18. The method of claim 17, wherein the set of candidate interactions includes a first candidate interaction having a first mood and a second candidate interaction having a second mood that is different from the first mood (see ¶ [0133]: “In some implementations and referring also to FIG. 8, prompting 204 the user, via the interactive virtual assistant, with one or more options may include prompting 214 the user to rate one or more financial transactions. For example, one or more transactions available in a user's transaction history may be provided by interactive virtual assistant process 10, via the interactive virtual assistant (e.g., interactive virtual assistant 800), for rating with an emotional spend annotation. In some implementations, a user may be prompted 214 by the interactive virtual assistant to reflect upon one or more recent transactions or purchases (e.g., prompt 802) and if the purchase was deemed satisfying (i.e., does the transaction make the user more happy), select a first emotional annotation (e.g., selection of a green smiling face button 804). If a given purchase leads to feelings of remorse or regret (i.e., does the transaction make the user more sad), a selection of a second emotional annotation may be received (e.g., selection of a red sad face button 806).”).
Kluwer et al., Chang et al., Olabiyi et al., and Zarlengo et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. in combination with Chang and Olabiyi et al. to incorporate the teachings of Zarlengo et al. of wherein the set of candidate interactions includes a first candidate interaction having a first mood and a second candidate interaction having a second mood that is different from the first mood, which provides the benefit of improving financial and structured media product technology processes ([0042] of Zarlengo et al.).
Claims 8-13 are rejected under 35 U.S.C. 103 as being unpatentable over Kluwer et al. ("Talking NPCs in a virtual game world." Proceedings of the ACL 2010 System Demonstrations. 2010. https://aclanthology.org/P10-4007.pdf) and further in view of Chang et al. (WO 2021159779 A1) and Olabiyi et al. (US 20200098353 A1) and Dicken et al. (US 20220088474 A1).
As to independent claim 8, Kluwer et al. teaches:
8. A method (see ¶ 1 of 5. Conversational Agent: KomParse Dialog System: “The KomParse dialog system, the main functionality of the conversational agent, consists of the following three major components: input analyzer, dialog manager and output generator (fig.4) […]”), comprising:
computing a first structured prompt that includes a first instance of agent information for a first non-player character (NPC) of a game environment (see ¶ 1 and 5 of 5. Conversational Agent: KomParse Dialog System: “The KomParse dialog system, the main functionality of the conversational agent, consists of the following three major components: input analyzer, dialog manager and output generator (fig.4) […] As mentioned above, our dialog system has to deal with two different scenarios. While the focal point of the bartender agent lies in the question answering functionality, the furniture sales agent is driven by a complex dialog task model based on a dialog graph. Thus, the bartender agent relies mainly on question answering technology, in that it needs to understand questions and extract the right answer from our knowledge bases, whereas the sales agent has to accommodate various dialog situations with respect to a sales scenario.…”
and further ¶ 1-2 of 5.2 Information Extraction: “…The input “Do you have any red couches?” for example needs to get processed by the system in such a way that the information regarding the sofa with red color is extracted. This is done by the system in a data-driven way. The input analysis first tries to find a demanded object in the input via asking the ontology: Every object which can be discussed in the scenario is encoded in the sales agents knowledge base. This can be seen as a Named Entity Recognition step. In case of success, the system tries to detect one of the possible relations of the object found in the input. This is achieved by querying the ontology about what kind of relations the identified object can satisfy. Possible relations are encoded in the class description of the given object. As a result the system can detect a relation “hasColour” for the found object “sofa” and the color value “red”. The found information gets inserted into the form which gets more and more similar or if possible equal to a search query via RDF.”);
generating, based on a first instance of agent information, a first set of candidate interactions for the first NPC (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System: “The KomParse dialog system, the main functionality of the conversational agent, consists of the following three major components: input analyzer, dialog manager and output generator (fig.4) […] … the sales agent has to accommodate various dialog situations with respect to a sales scenario. It therefore has to understand the dialog acts intended by the user and trigger the corresponding reactions, such as presenting an object, memorizing user preferences, negotiating further sales goals, etc. […] … In these experiments, 18 users spent one hour each on furnishing a virtual living room in a Twinity apartment by talking to a human wizard controlling the virtual sales agent… […] The flow of the task-based conversation is controlled by a data-driven finite-state model, which is the backbone of the dialog manager. During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and [edges] at runtime. This strategy keeps the finite-state graph as small as possible. Discussed objects and their features are stored in a frame-based sub-component named ”form”. The form contains entries which correspond to ontological concepts in the furniture ontology. During conversation, these entries will be specified with the values of the properties of the discussed objects. This frame-based approach increases the flexibility of the dialog manager (McTear, 2002) and is particularly useful for a task-driven dialog system. As long as the negotiated object is not yet fully specified, the form represents the underspecified object description according to the ontology concept. Every time the user states a new preference or request, the form is enriched with additional features until the set of objects is small enough to be presented to the user for final selection. Thus the actual flow of dialog according to the task model does not have to be expressed by the graph but can be derived on demand from the knowledge and handled by the form which in turn activates the appropriate dialog subgraphs. This combination of graph-based dialog models and form-based task modelling effectively accounts for the interaction of sequential dialog strategies and the non-sequential nature of complex dialog goals.” and Table 1: Example Conversation from the Wizard-of-Oz Experiment: “USR.1: And do we have a little side table for the TV? | NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf. | NPC.2: Let me check if we have something like that.”);
presenting the first set of candidate interactions for user selection (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above. More specifically: “NPC.1: I could offer you another small table or a sideboard.”);
receiving a first selection of one or more of the first set of candidate interactions representing the dialogue turn for the first NPC (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above. More specifically: “¶6: …The final corpus consists of 18 dialogs containing 3,171 turns with 4,313 utterances and 23,015 alpha-numerical strings (words). The following example shows a typical part of such a conversation: (USR.1: And do we have a little side table for the TV? NPC.1: I could offer you another small table or a sideboard. USR.2: Then I’ll take a sideboard thats similar to my shelf. NPC.2: Let me check if we have something like that.) and ¶ 7: …During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and [edges] at runtime…”
Here, the Examiner notes that the multiple turns (i.e., 3,171) present in the corpus, and as seen in the example part of a conversation (i.e., including representation of the dialogue turns for the NPC) disclosed in Kluwer et al. read on the limitation of candidate interactions representing dialogue turns for the computer-controlled agent.).
generating, for each first selected candidate interaction, a node in a dialogue tree for the first NPC (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above. More specifically: “…During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and [edges] at runtime…”);
computing a second structured prompt that includes a second instance of agent information for a second NPC of the game environment (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above, further: “In these experiments, 18 users spent one hour each on furnishing a virtual living room in a Twinity apartment by talking to a human wizard controlling the virtual sales agent. The final corpus consists of 18 dialogs containing 3,171 turns with 4,313 utterances and 23,015 alpha-numerical strings (words).” Table 1: “NPC.2: Let me check if we have something like that.”
Here, there are implicitly second, third, or further user-sales agent interactions allowing the user to select/define user preferences regarding furniture.);
generating, based on the second instance of agent information, a second set of candidate interactions for the second NPC (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above, further: “In these experiments, 18 users spent one hour each on furnishing a virtual living room in a Twinity apartment by talking to a human wizard controlling the virtual sales agent. The final corpus consists of 18 dialogs containing 3,171 turns with 4,313 utterances and 23,015 alpha-numerical strings (words).”
Here, there are implicitly second, third, or further user-sales agent interactions allowing the user to select/define user preferences regarding furniture.);
presenting the second set of candidate interactions for user selection (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations and notes as in limitation(s) above.);
receiving a second selection of one or more of the second set of candidate interactions representing the dialogue turn for the second NPC (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations and notes as in limitation(s) above.);
generating, for each second selected candidate interaction, a node in the dialogue tree for the second NPC (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in limitation above. More specifically: “…During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and [edges] at runtime…”),
However, Kluwer et al. does not explicitly teach, but Chang does teach:
wherein a dialogue turn of the first/second NPC is generated using a generative machine learning model at least in part by prompting the generative machine learning model with the first structured prompt (see ¶ 2 of page 5 and ¶ 8 of page 10 citations as in claim 1, above.);
Kluwer et al. and Chang et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. to incorporate the teachings of Chang et al. of wherein a dialogue turn of the first/second NPC is generated using a generative machine learning model at least in part by prompting the generative machine learning model with the first structured prompt, which provides the benefit of reducing the consumption of computing resources in the model training process while also improving the training efficiency of the model and obtaining better training effects (¶ 8 of page 10 of Chang).
However, Kluwer et al. in combination with Chang do not explicitly teach, but Olabiyi et al. does teach:
wherein a dialogue turn of the first/second NPC is generated using a generative machine learning model (see ¶ [0017]: “Various embodiments may be generally directed to the use of an adversarial learning framework for persona-based dialogue modeling. In some embodiments, automated multi-turn dialogue response generation may be performed using a persona-based hierarchical recurrent encoder-decoder-based generative adversarial network (phredGAN). Such a phredGAN may feature a persona-based hierarchical recurrent encoder-decoder (PHRED) generator and a conditional discriminator.” and ¶ [0049]: “The use of a phredGAN such as phredGAN 500 for multi-turn response generation based on an adversarially trained dialogue model may address the problem of mode collapse while providing consistent personality traits. Use of the phredGAN may yield benefits in both supervised and unsupervised use cases. In a supervised use case, multi-modal attributes such as speaker name/identity and dialogue subtopic may be available along with dialogue utterances, and the dialogue model output response may be improved by conditioning the response generation on these attributes.…” and ¶ [0025-0027 and 0044-0045]: “[0025] FIG. 5 illustrates an example of a persona-based hierarchical recurrent encoder-decoder-based GAN (phredGAN) 500 that embodies such a framework. phredGAN 500 may feature an architecture that is generally representative of an hredGAN architecture modified to simultaneously capture utterance attributes such as speaker identity, dialogue topic, speaker sentiments, and so on. As shown in FIG. 5, phredGAN 500 features a persona-based hierarchical recurrent encoder-decoder (PHRED) generator 525 and a conditional discriminator 526, which respectively serve as generator 305 and discriminator 306 of FIG. 3. [0026] Multi-turn dialogue response generation in phredGAN 500 may be formulated in similar fashion to that in hredGAN 400, but taking speaker and/or utterance attributes into account. Namely, the dialogue history serving as basis for multi-turn dialogue response generation using phredGAN 500 may take the form X.sub.i=((X.sub.1, C.sub.1), (X.sub.2, C.sub.2), . . . (X.sub.i, C.sub.i)) where C.sub.i is additional input that represents the speaker and/or utterance attributes. C.sub.i can either be a sequence of tokens or single token such that C.sub.i.sup.j∈V.sub.c for vocabulary V.sub.c. At the ith turn, C.sub.i and C.sub.i+1 are the source/input attributes, such as speaker's identity, speaker's background, speaker's location, speaker's preference and so on, and target/output attributes, such as responder's identity, responder's background, responder's location, responder's preference and so on, to the generator, respectively. The embedding for attribute tokens is also learned similar to that of word tokens. [0027] In one example, assume there is dialogue data with conversations involving customers of different demographics such as age, location and so on, and service agents of different areas of expertise. When a model, such as hredGAN, is trained on data but does not use persona/attributes as in the example embodiments, the model may only be capable of generating responses by an average agent to an average customer. But with a phredGAN of the example embodiments, trained with persona/attribute information, the dialogue generating model can generate responses that are more appropriate for a specific user group. This inherently increases the response diversity since it is no longer an average response. 
Below illustrates an example dialogue with two different responses based on utilization of hredGAN vs. an exemplary phredGAN:… [0044] Both the PHRED generator 525 and the conditional discriminator 526 (with shared encoder) of phredGAN 500 may be trained using a training procedure characterized by training algorithm 800 of FIG. 8. Both in embodiments in which phredGAN 500 is a phredGAN.sub.a and in embodiments in which phredGAN 500 is a phredGAN.sub.d, λ.sub.G.sub.adv=1. In the phredGAN.sub.a case, λ.sub.G.sub.att=0, while in the phredGAN.sub.d case, λ.sub.G.sub.att=1. Since the encoder, word embedding and attribute embedding are shared, the system may be trained end-to-end with back-propagation. [0045] In a given embodiment in which PHRED generator 525 and the conditional discriminator 526 are trained using a training procedure characterized by training algorithm 800, each RNN unit of phredGAN 500 may be implemented as a 3-layer gate recurrent unit (GRU) cell with a hidden state size of 512. The encoder RNN (eRNN) may be bidirectional, while the context RNN (cRNN) may be unidirectional. A word vocabulary size V of 50,000 may be used, with a word embedding size of 512. An attribute embedding size of 512 may be used. The number of attributes V.sub.c may be dataset dependent. Only one attribute may be used per utterance so that there is no need to use attention to combine the attribute embeddings. The attention RNN (aRNN) outputs may be connected to the decoder RNN (dRNN) input using an additive attention mechanism.”);
Kluwer et al., Chang, and Olabiyi et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. in combination with Chang to incorporate the teachings of Olabiyi et al. of model output associated with a generative machine learning model, wherein a dialogue turn of the first/second NPC is generated using the generative machine learning model, which provides the benefit of generating responses that are more appropriate for a specific user group ([0027] of Olabiyi et al.).
However, Kluwer et al. in combination with Chang and Olabiyi et al. do not explicitly teach, but Dicken et al. does teach:
wherein the generated node is a child node of a node for the first NPC (see Fig. 21-22 and ¶ [0202]: “Turning now to FIG. 21, therein is shown a flow diagram 2100 that schematically illustrates an example method for modeling traversal behavior and generating custom content based on analysis of the modeled behavior. First, the way a user moves around the app and the order that they visit different aspects (i.e., their game traversal behavior) is modeled as a directed graph (i.e., a data structure comprising nodes connected by directional edges), thus building a graph database 2128 (FIG. 21). In this example, each session is modeled by building, at operation 2102, a respective traversal graph. In FIG. 22, example traversal graph 2200 schematically illustrates the graph structure employed in this example for representing a single respective game interaction session, or journey. Note that the graph structure in this example for the session traversal graph 2200 includes a player node connected by a respective edge to a session node, which is in turn connected by respective edges to a successive series of step nodes. Each step node in turn connects to a single respective action node 2202 selected from a predefined set of traversal or journey actions that are selected for tracking and/or inferring one or more psychological features or states of users.”); and
storing the dialogue tree in a game agent data store for the game environment (see Fig. 21-22 and ¶ [0202] citations as in limitation above.).
Kluwer et al. and Dicken et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in machine-user interaction associated with ontologies/graph databases. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. to incorporate the teachings of Dicken et al. of wherein the generated node is a child node of a node for the first NPC, and storing the dialogue tree in a game agent data store for the game environment, which provides the benefit of enabling the player modeling engine 120 to traverse the player's journeys or any sub-journeys more efficiently than in a traditional relational database ([0203] of Dicken et al.).
Regarding claim 9, Kluwer et al. in combination with Chang and Olabiyi et al. and Dicken et al. teach all of the limitations as in claim 8, above.
Kluwer et al. further teaches:
9. The method of claim 8, wherein the second instance of agent information is based on the first instance of agent information and includes an indication of a first selected candidate interaction for the first NPC (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System citation(s) as in claims 1-2, further: “In these experiments, 18 users spent one hour each on furnishing a virtual living room in a Twinity apartment by talking to a human wizard controlling the virtual sales agent. The final corpus consists of 18 dialogs containing 3,171 turns with 4,313 utterances and 23,015 alpha-numerical strings (words).”
Here, there are implicitly second, third, or further user-sales agent interactions allowing the user to select/define user preferences regarding furniture
and Table 1: Example Conversation from the Wizard-of-Oz Experiment citations as in claims 1-2 above, further: “USR.1: And do we have a little side table for the TV?, NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf…”.).
Regarding claim 10, Kluwer et al. in combination with Chang and Olabiyi et al. and Dicken et al. teach all of the limitations as in claim 8, above.
Kluwer et al. further teaches:
10. The method of claim 8, wherein the agent information comprises at least one of:
background information associated with the game environment (see Figures 1-2 [examples of NPC (agent) selling sofa or bartending in a virtual environment]);
historical information associated with the user (see ¶ 7 of 5. Conversational Agent: KomParse Dialog System: “…Every time the user states a new preference or request, the form is enriched with additional features until the set of objects is small enough to be presented to the user for final selection…”); or
game environment state information for the game environment (see Figure 1 [example of NPC (agent) selling sofa in a virtual environment] and ¶ 1-2 of 5.2 Information Extraction: “Both scenarios make use of state-of-the-art information extraction approaches to extract the important pieces from the user input. While the bartender depends on relation extraction to detect the fact or relation questioned by the user (Xu et al., 2007), the sales agent uses information extraction methods to recognize user wishes and demands. As a result, the questioned fact or the demanded object feature equals the ontology structure containing the knowledge needed to handle the user input. The input “Do you have any red couches?” for example needs to get processed by the system in such a way that the information regarding the sofa with red color is extracted. This is done by the system in a data-driven way. The input analysis first tries to find a demanded object in the input via asking the ontology: Every object which can be discussed in the scenario is encoded in the sales agents knowledge base. This can be seen as a Named Entity Recognition step. In case of success, the system tries to detect one of the possible relations of the object found in the input. This is achieved by querying the ontology about what kind of relations the identified object can satisfy. Possible relations are encoded in the class description of the given object. As a result the system can detect a relation “hasColour” for the found object “sofa” and the color value “red”. The found information gets inserted into the form which gets more and more similar or if possible equal to a search query via RDF.”).
Regarding claim 11, Kluwer et al. in combination with Chang and Olabiyi et al. and Dicken et al. teach all of the limitations as in claim 8, above.
Kluwer et al. further teaches:
11. The method of claim 8, wherein generating the first set of candidate interactions (see ¶ 1 and 5-7 of 5. Conversational Agent: KomParse Dialog System citations as in claim 14, above, and Table 1: Example Conversation from the Wizard-of-Oz Experiment: “USR.1: And do we have a little side table for the TV? | NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf. | NPC.2: Let me check if we have something like that.”) comprises:
Olabiyi et al. further teaches:
providing, to a machine learning service, an indication of the agent information (see ¶ [0017, 0049] citations as in claim 1, above. More specifically: ¶ [0049]: “The use of a phredGAN such as phredGAN 500 for multi-turn response generation based on an adversarially trained dialogue model may address the problem of mode collapse while providing consistent personality traits. Use of the phredGAN may yield benefits in both supervised and unsupervised use cases. In a supervised use case, multi-modal attributes such as speaker name/identity and dialogue subtopic may be available along with dialogue utterances, and the dialogue model output response may be improved by conditioning the response generation on these attributes.…”); and
receiving, from the machine learning service, the model output comprising the first set of candidate interactions (see ¶ [0017 and 0049] citations as in limitation above and further ¶ [0025-0027 and 0044-0045]: “[0025] FIG. 5 illustrates an example of a persona-based hierarchical recurrent encoder-decoder-based GAN (phredGAN) 500 that embodies such a framework. phredGAN 500 may feature an architecture that is generally representative of an hredGAN architecture modified to simultaneously capture utterance attributes such as speaker identity, dialogue topic, speaker sentiments, and so on. As shown in FIG. 5, phredGAN 500 features a persona-based hierarchical recurrent encoder-decoder (PHRED) generator 525 and a conditional discriminator 526, which respectively serve as generator 305 and discriminator 306 of FIG. 3. [0026] Multi-turn dialogue response generation in phredGAN 500 may be formulated in similar fashion to that in hredGAN 400, but taking speaker and/or utterance attributes into account. Namely, the dialogue history serving as basis for multi-turn dialogue response generation using phredGAN 500 may take the form X.sub.i=((X.sub.1, C.sub.1), (X.sub.2, C.sub.2), . . . (X.sub.i, C.sub.i)) … [0027] … But with a phredGAN of the example embodiments, trained with persona/attribute information, the dialogue generating model can generate responses that are more appropriate for a specific user group. This inherently increases the response diversity since it is no longer an average response. Below illustrates an example dialogue with two different responses based on utilization of hredGAN vs. an exemplary phredGAN:… [0044] Both the PHRED generator 525 and the conditional discriminator 526 (with shared encoder) of phredGAN 500 may be trained using a training procedure characterized by training algorithm 800 of FIG. 8. Both in embodiments in which phredGAN 500 is a phredGAN.sub.a and in embodiments in which phredGAN 500 is a phredGAN.sub.d, λ.sub.G.sub.adv=1. In the phredGAN.sub.a case, λ.sub.G.sub.att=0, while in the phredGAN.sub.d case, λ.sub.G.sub.att=1. Since the encoder, word embedding and attribute embedding are shared, the system may be trained end-to-end with back-propagation. [0045] In a given embodiment in which PHRED generator 525 and the conditional discriminator 526 are trained using a training procedure characterized by training algorithm 800, each RNN unit of phredGAN 500 may be implemented as a 3-layer gate recurrent unit (GRU) cell with a hidden state size of 512. The encoder RNN (eRNN) may be bidirectional, while the context RNN (cRNN) may be unidirectional. A word vocabulary size V of 50,000 may be used, with a word embedding size of 512...”);
Kluwer et al., Chang, and Olabiyi et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in user/agent communications. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Kluwer et al. in combination with Chang to incorporate the teachings of Olabiyi et al. of providing, to a machine learning service, an indication of the agent information; and receiving, from the machine learning service, the model output, which provides the benefit of generating responses that are more appropriate for a specific user group ([0027] of Olabiyi et al.).
Regarding claim 12, Kluwer et al. in combination with Chang and Olabiyi et al. and Dicken et al. teach all of the limitations as in claim 11, above.
Kluwer et al. further teaches:
12. The method of claim 11, wherein each generated node for the first NPC (see ¶ 7 of 5. Conversational Agent: KomParse Dialog System: “…During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and [edges] at runtime. This strategy keeps the finite-state graph as small as possible. Discussed objects and their features are stored in a frame-based sub-component named ”form”…”) comprises one or more of:
natural language output of the model output;
programmatic output of the model output; or
an associated emotion for the natural language output of the node (Figure 1 [example of NPC (agent) selling sofa in a virtual environment], Table 1: Example Conversation from the Wizard-of-Oz Experiment: “USR.1: And do we have a little side table for the TV?, NPC.1: I could offer you another small table or a sideboard. | USR.2: Then I’ll take a sideboard thats similar to my shelf…” and ¶ 7 of 5. Conversational Agent: KomParse Dialog System: “…During a sales conversation, objects and features of objects mentioned by the NPC and the user are extracted from the knowledge bases and added into the underspecified graph nodes and [edges] at runtime. This strategy keeps the finite-state graph as small as possible. Discussed objects and their features are stored in a frame-based sub-component named ”form”…”).
Regarding claim 13, Kluwer et al. in combination with Chang and Olabiyi et al. and Dicken et al. teach all of the limitations as in claim 8, above.
Kluwer et al. further teaches:
13. The method of claim 8, wherein the first instance of agent information comprises at least one of:
background information associated with the game environment (see Figures 1-2 [examples of NPC (agent) selling sofa or bartending in a virtual environment]);
historical information associated with the user (see ¶ 7 of 5. Conversational Agent: KomParse Dialog System: “…Every time the user states a new preference or request, the form is enriched with additional features until the set of objects is small enough to be presented to the user for final selection…”); or
game environment state information for the game environment (see Figure 1 [example of NPC (agent) selling sofa in a virtual environment] and ¶ 1-2 of 5.2 Information Extraction: “Both scenarios make use of state-of-the-art information extraction approaches to extract the important pieces from the user input. While the bartender depends on relation extraction to detect the fact or relation questioned by the user (Xu et al., 2007), the sales agent uses information extraction methods to recognize user wishes and demands. As a result, the questioned fact or the demanded object feature equals the ontology structure containing the knowledge needed to handle the user input. The input “Do you have any red couches?” for example needs to get processed by the system in such a way that the information regarding the sofa with red color is extracted. This is done by the system in a data-driven way. The input analysis first tries to find a demanded object in the input via asking the ontology: Every object which can be discussed in the scenario is encoded in the sales agents knowledge base. This can be seen as a Named Entity Recognition step. In case of success, the system tries to detect one of the possible relations of the object found in the input. This is achieved by querying the ontology about what kind of relations the identified object can satisfy. Possible relations are encoded in the class description of the given object. As a result the system can detect a relation “hasColour” for the found object “sofa” and the color value “red”. The found information gets inserted into the form which gets more and more similar or if possible equal to a search query via RDF.”).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Keisha Y Castillo-Torres whose telephone number is (571)272-3975. The examiner can normally be reached Monday - Friday, 9:00 am - 4:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached on (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
Keisha Y. Castillo-Torres
Examiner
Art Unit 2659
/Keisha Y. Castillo-Torres/Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659