DETAILED ACTION
This action is responsive to the amended claims filed on 11/19/2025. Claims 1, 3-12, and 14-20 are pending for examination.
Response to Amendments/Remarks
Applicant’s arguments with respect to 35 U.S.C. 103 rejection of the claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
In response to the applicant’s argument that amended claim 7 requires that “the one or more rules further include a rule that sets a noisy pseudo-label responsive to a determination that the intervention action is unsuccessful” (Remarks, pages 11-13), the examiner respectfully disagrees. Xie expressly teaches noise lexemes, i.e., learned/propagated mappings where a word/phrase is “propagated to wrong semantic types,” which Xie links to parsing failure modes (“testing examples with … error parse” were attributable to “noise lexemes,” including words/phrases “propagated to wrong semantic types”; Xie, page 181, col. 1, paragraph 3). Xie further teaches that such noise lexemes yield parses that are not correct (they “contributed to the proportion of sentences with parse but did not to the proportion of sentences with correct parse”; Xie, page 181, col. 1, paragraph 2). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention, when implementing rule-based pseudo-label generation, to add a rule within the rule set that assigns a “noisy” pseudo-label for unsuccessful interventions, consistent with Xie’s treatment of wrong semantic-type mappings as noise correlated with error/incorrect parses.
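For illustration of this reading only, the following is a minimal sketch of a rule set extended in this manner (hypothetical names and data structures assumed for illustration; this is not Applicant’s claimed method or Xie’s implementation):

```python
# Illustrative sketch only (hypothetical rule set; not Applicant's method
# or Xie's implementation): rule-based pseudo-labeling where an
# unsuccessful intervention yields a "noisy" pseudo-label.

def pseudo_label(proposition, action, success):
    """Return a truth-valued pseudo-label, or "noisy" if the action failed."""
    if not success:
        # Rule at issue in claim 7: unsuccessful intervention -> noisy label,
        # analogous to Xie's "noise lexemes" (wrong semantic-type mappings).
        return "noisy"
    if proposition in action["effects"] and proposition not in action["preconditions"]:
        return False  # proposition was false in the initial state
    if proposition in action["preconditions"]:
        return True   # precondition held before the successful action
    return None       # no rule fires; leave the proposition unlabeled

action = {"preconditions": {"door_closed"}, "effects": {"door_open"}}
print(pseudo_label("door_open", action, success=False))  # -> "noisy"
print(pseudo_label("door_open", action, success=True))   # -> False
```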
Applicant’s arguments with respect to the 35 U.S.C. 101 rejections have been fully considered but are not persuasive. As amended, the claims still recite a judicial exception under Step 2A, Prong One because they require performing actions and inverse actions in a text environment and evaluating that information by applying logical rules to determine truth-valued pseudo-labels for propositions (i.e., a rules-based classification/inference that can be practically performed in the human mind and therefore falls within the “mental processes” grouping), and then using the resulting information to train a model, which amounts to applying the abstract idea on a generic computer. See MPEP §2106 for the Step 2A/Step 2B framework. Applicant’s reliance on Example 39 is unpersuasive here because Example 39’s claim was found eligible specifically where it did not recite a judicial exception (e.g., it did not recite mathematical relationships and was not practically performed in the human mind), whereas the present claims expressly recite determining pseudo-labels for logical propositions based on rules. Accordingly, the rejection is maintained because the additional elements (text-based environment execution, recording, and model training) are recited at a high level of generality and do not add significantly more than the judicial exception, i.e., they do not amount to an inventive concept or a specific asserted improvement in computer functionality, but instead use generic computing to implement the abstract rules-based labeling/training concept.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 3-12, and 14-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. In determining whether the claims are subject matter eligible, the examiner applies the guidance under MPEP 2106.
Step 1, Statutory Category?
Claims 1 and 3-10 are directed to a method.
Claim 11 is directed to a computer-readable medium.
Claims 12 and 14-20 are directed to a system.
Claim 1
Step 2A Prong 1: The claim recites the following limitations:
performing an… intervention action in a text-based environment (performing actions in a text-based environment, as stated in paragraph 59 of the specification, is considered a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper);
performing an inverse action in the text-based environment to reverse the intervention action (performing inverse actions in a text-based environment, as stated in paragraphs 46-48 of the specification, is considered a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper);
evaluating the recorded states to generate training data (a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper);
including a determination of pseudo-labels for logical propositions that encode an outcome of one or more rules including a rule where, if a proposition is listed as an effect of the intervention action but is not listed as a precondition of the intervention action, then the pseudo-label indicates that the proposition was false in an initial state before the intervention action (a mental process of determination that can reasonably be performed in the human mind or with the aid of pen and paper);
Step 2A Prong 2: The judicial exceptions are not integrated into practical application. The claim recites the additional limitation:
performing an automated intervention action (performing an action automatically amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f));
recording states of the text-based environment before and after the intervention action and the inverse action (this limitation merely constitutes data gathering, which is considered insignificant extra-solution activity under MPEP 2106.05(g));
and training a semantic parser neural network model using the training data (training a neural network with training data is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f)).
The claim as a whole, looking at the additional elements individually and in combination, does not integrate the judicial exception into a practical application. Therefore, the claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exceptions. The additional elements of:
performing an automated intervention action (performing an action automatically amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f));
recording states of the text-based environment before and after the intervention action and the inverse action (this limitation merely constitutes data gathering, which is considered insignificant extra-solution activity under MPEP 2106.05(g); furthermore, the courts have recognized storing and retrieving information in memory as a well-understood, routine, and conventional activity, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93);
and training a semantic parser neural network model using the training data (training a neural network with training data is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f))
do not amount to significantly more than the abstract idea. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 3 incorporates the rejection of claim 1.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into practical application. The claim recites additional elements:
wherein training the semantic parser neural network model includes supervised learning using the pseudo-labels (training a neural network using supervised learning and pseudo-labels is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f)).
Accordingly, the additional elements do not integrate the abstract ideas into a practical application because they do not impose meaningful limits on practicing the abstract ideas. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exceptions. The additional element of: wherein training the semantic parser neural network model includes supervised learning using the pseudo-labels (training a neural network using supervised learning and pseudo-labels is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f)) does not amount to significantly more than the abstract idea. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 14 recites similar limitations and as such a similar rejection applies.
Claim 4 incorporates the rejection of claim 1.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitation:
wherein the one or more rules derive one or more pseudo-labels from an action template, associated with the intervention action, that includes precondition propositions and effect propositions for the intervention action (a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper).
Step 2A Prong 2: The judicial exceptions are not integrated into practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exceptions.
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 15 recites similar limitations and as such a similar rejection applies.
Claim 5 incorporates the rejection of claim 4.
Step 2A Prong 1: The judicial exceptions of claim 4 are incorporated. The claim recites the following limitation:
wherein the action template includes parameters that an action accepts, preconditions for success of the action, and effects that occur upon success of the action (this is merely additional information for the previously mentioned abstract idea of deriving pseudo-labels from rules, and the limitation as a whole is still considered a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper).
Step 2A Prong 2: The judicial exceptions are not integrated into practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exceptions.
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 16 recites similar limitations and as such a similar rejection applies.
Claim 6 incorporates the rejection of claim 1.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitation:
wherein the one or more rules include a rule selected from the group consisting of a first rule relating to preconditions of an action template for a successful action, a second rule relating to effects of the action template for the successful action, and a third rule relating to preconditions of the action template for the successful action that are not canceled in the effects of the action template (this is merely additional information for the previously mentioned abstract idea of deriving pseudo-labels from rules, and the limitation as a whole is still considered a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper).
Step 2A Prong 2: The judicial exceptions are not integrated into practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exceptions.
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 17 recites similar limitations and as such a similar rejection applies.
Claim 7 incorporates the rejection of claim 1.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitation:
wherein the one or more rules further include a rule that sets a noisy pseudo-label responsive to a determination that the intervention action is unsuccessful (a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper).
Step 2A Prong 2: The judicial exceptions are not integrated into practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exceptions.
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 8 incorporates the rejection of claim 1.
Step 2A Prong 1: The judicial exceptions of claim 1 are incorporated. The claim recites the following limitation:
wherein evaluating the recorded states includes determining a pseudo-reward for the intervention action, based on the recorded states and a goal state (a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper).
Step 2A Prong 2: The judicial exceptions are not integrated into practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exceptions.
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 18 recites similar limitations and as such a similar rejection applies.
Claim 9 incorporates the rejection of claim 8.
Step 2A Prong 1: The judicial exceptions of claim 8 are incorporated. The claim recites an abstract idea.
Step 2A Prong 2: The judicial exceptions are not integrated into practical application. The claim recites additional elements:
wherein training the semantic parser neural network model includes reinforcement learning using the pseudo-reward (training a neural network with training data is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f)).
Accordingly, the additional elements do not integrate the abstract ideas into a practical application because they do not impose meaningful limits on practicing the abstract ideas. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exceptions. The additional element of: wherein training the semantic parser neural network model includes reinforcement learning using the pseudo-reward (training a neural network with training data is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f)) does not amount to significantly more than the abstract idea. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 19 recites similar limitations and as such a similar rejection applies.
Claim 10 incorporates the rejection of claim 8.
Step 2A Prong 1: The judicial exceptions of claim 8 are incorporated. The claim recites the following limitation:
wherein the pseudo-reward for the intervention action is determined based on a goal within the environment (a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper).
Step 2A Prong 2: The judicial exceptions are not integrated into practical application. The claim recites no additional elements. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exceptions.
Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 20 recites similar limitations and as such a similar rejection applies.
Claim 11
Step 2A Prong 1: The claim recites the following limitations:
perform an… intervention action in a text-based environment (performing actions in a text-based environment, as stated in paragraph 59 of the specification, is considered a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper);
perform an inverse action in the text-based environment to reverse the intervention action (performing inverse actions in a text-based environment, as stated in paragraphs 46-48 of the specification, is considered a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper);
evaluate the recorded states to generate training data (a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper);
including a determination of pseudo-labels for logical propositions based on one or more rules that include a rule where, if a proposition is in effects of the intervention action but was not in preconditions of the intervention action, then the proposition was false in an initial state before the intervention action (a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper);
Step 2A Prong 2: The judicial exceptions are not integrated into practical application. The claim recites additional elements:
A non-transitory computer readable storage medium comprising a computer readable program for training a semantic parser, wherein the computer readable program when executed on a computer causes the computer to… (computer components recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f)).
perform an automated intervention action (performing an action automatically amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f));
record states of the text-based environment before and after the intervention action and the inverse action (this limitation merely constitutes data gathering, which is considered insignificant extra-solution activity under MPEP 2106.05(g));
and train a semantic parser neural network model using the training data (training a neural network with training data is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f)).
Accordingly, the additional elements do not integrate the abstract ideas into a practical application because they do not impose meaningful limits on practicing the abstract ideas. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exceptions. The additional elements of:
A non-transitory computer readable storage medium comprising a computer readable program for training a semantic parser, wherein the computer readable program when executed on a computer causes the computer to… (computer components recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f))
perform an automated intervention action (performing an action automatically amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f));
record states of the text-based environment before and after the intervention action and the inverse action (this limitation merely constitutes data gathering, which is considered insignificant extra-solution activity under MPEP 2106.05(g); furthermore, the courts have recognized storing and retrieving information in memory as a well-understood, routine, and conventional activity, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93);
and train a semantic parser neural network model using the training data (training a neural network with training data is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f))
do not amount to significantly more than the abstract idea. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim 12
Step 2A Prong 1: The claim recites the following limitations:
performs an… intervention action in a text-based environment, that performs an inverse action in the text-based environment to reverse the intervention action (performing actions in a text-based environment, as stated in paragraph 59 of the specification, and an inverse action, as stated in paragraphs 46-48, is considered a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper);
evaluates the recorded states to generate training data (a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper);
including a determination of pseudo-labels for logical propositions that encode an outcome of one or more rules including a rule where, if a proposition is listed as an effect of the intervention action but is not listed as a precondition of the intervention action, then the pseudo-label indicates that the proposition was false in an initial state before the intervention action (a mental process of determination that can reasonably be performed in the human mind or with the aid of pen and paper);
Step 2A Prong 2: The judicial exceptions are not integrated into practical application. The claim recites additional elements:
A system for training a semantic parser, comprising: a hardware processor; and a memory that stores computer program code which, when executed by the hardware processor, implements… (computer components recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f)).
an exploration agent that (agents recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f))
a state evaluator that (a component recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f))
performing an automated intervention action (performing an action automatically amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f));
and that records states of the text-based environment before and after the intervention action and the inverse action (this limitation merely constitutes data gathering which is considered insignificant extra-solution activity under MPEP 2106.05(g));
and a model trainer that trains a semantic parser neural network model using the training data (training a neural network with training data is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f)).
Accordingly, the additional elements do not integrate the abstract ideas into a practical application because they do not impose meaningful limits on practicing the abstract ideas. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exceptions. The additional elements of:
A system for training a semantic parser, comprising: a hardware processor; and a memory that stores computer program code which, when executed by the hardware processor, implements… (computer components recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f)).
an exploration agent that (agents recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f))
a state evaluator that (a component recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f))
performing an automated intervention action (performing an action automatically amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f));
and that records states of the text-based environment before and after the intervention action and the inverse action (this limitation merely constitutes data gathering, which is considered insignificant extra-solution activity under MPEP 2106.05(g); furthermore, the courts have recognized storing and retrieving information in memory as a well-understood, routine, and conventional activity, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs., 788 F.3d at 1363, 115 USPQ2d at 1092-93);
and a model trainer that trains a semantic parser neural network model using the training data (training a neural network with training data is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components, see MPEP 2106.05(f))
do not amount to significantly more than the abstract idea. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1 and 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Yuan et al. (Yuan, X., Côté, M. A., Sordoni, A., Laroche, R., Combes, R. T. D., Hausknecht, M., & Trischler, A. (2018). Counting to explore and generalize in text-based games. arXiv preprint arXiv:1806.11525.), hereafter referred to as Yuan, in view of Tien et al. (Ammanabrolu, P., Tien, E., Hausknecht, M., & Riedl, M. O. (2020). How to avoid being eaten by a grue: Structured exploration strategies for textual worlds. arXiv preprint arXiv:2006.07409.), hereafter referred to as Tien, in further view of Lang et al. (US20200410399, “Method and system for determining policies, rules, and agent characteristics, for automating agents, and protection”), hereafter referred to as Lang, and in further view of Helmert (Helmert, M. (2009). Concise finite-domain representations for PDDL planning tasks. Artificial Intelligence, 173(5-6), 503-535.), hereafter referred to as Helmert.
Regarding claim 1, Yuan teaches the following limitations:
performing an automated intervention action in a text-based environment (Yuan, page 1, col. 2, last paragraph, “Actions (A): At each turn t, the agent issues a text command ct. The interpreter can accept any sequence of characters but will only recognize a tiny subset thereof. Furthermore, only a fraction of recognized commands will actually change the state of the world.”, this shows the agent performing automated actions (commands) in a text-based game.);
performing an inverse action in the text-based environment to reverse the intervention action (Yuan, page 3, col. 2, paragraph 1, “Conversely, to solve medium and hard games, the agent must reverse its previous action when it enters distractor rooms to return to the chain, and also recall farther into the past to track which exits it has already passed through.”, Yuan explicitly states the agent must “reverse its previous action” to undo the effect of entering a distractor, analogous to the idea of performing an inverse action to revert the environment to a prior state.);
Tien, in the same field of semantic parsing and reinforcement learning of an environment, teaches the following limitations which the above prior art fails to teach:
[Image: Algorithm 1 of Tien (media_image1.png)]
recording states of the text-based environment before and after the intervention action and the inverse action (Tien, page 6, section 4.2, paragraph 2, “Specifically, Algorithm 1 optimizes the policy π as usual, but also keeps track of a buffer S of the distinct states and knowledge graphs that led up to each state (we use state st to colloquially refer to the combination of an observation ot and knowledge graph KGt)… The agent then backtracks by searching backwards through the state sequence Sb, restarting from each of the previous states—and training for N steps in search of a more optimal policy to overcome the bottleneck”, Tien’s intervention action corresponds to the normal forward environment interaction step (ENV.STEP(st, π)), which transitions the environment from pre-state st to post-state st+1. Tien records this transition by appending the resulting state st+1 into the state buffer S, thereby retaining states before/after the forward action along the trajectory.
Tien’s inverse action corresponds to its backtracking routine (BACKTRACK(πb, Sb)), which returns the agent/environment to previously visited states by searching backward through the stored state sequence Sb and restarting from earlier states (e.g., assigning s0 ← b for prior b ∈ Sb). Because Tien both (i) stores the visited state sequence in buffers (S, Sb) and (ii) performs the backtrack return to earlier stored states, Tien teaches recording states before and after the forward traversal (intervention) and the backtrack return (inverse).);
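As a minimal sketch of the mapping above (a generic environment interface with assumed step/restore hooks; this paraphrases the examiner’s reading of Algorithm 1 and is not Tien’s code):

```python
# Sketch of forward stepping with state recording and backtracking, as read
# onto Tien's Algorithm 1 (env.step / env.restore are assumed hooks).

def explore_and_record(env, policy, max_steps):
    state = env.reset()
    buffer = [state]                   # buffer S of visited states
    for _ in range(max_steps):
        action = policy(state)         # forward "intervention" step
        state = env.step(action)
        buffer.append(state)           # record the post-action state
    return buffer

def backtrack(env, buffer):
    # Search backwards through the stored sequence, restarting from each
    # earlier state (the "inverse"/return behavior mapped above).
    for prior in reversed(buffer[:-1]):
        env.restore(prior)
        yield prior
```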
It would have been obvious to a person of ordinary skill in the art (POSITA) before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Yuan with the teachings disclosed by Tien (i.e., recording action states and generating training data). A motivation for the combination is to produce a knowledge graph that accurately captures the semantic understanding of the environment (Tien, page 5, last paragraph, “To alleviate these issues, we define an intrinsic motivation for the agent that leverages the knowledge graph being built during exploration. The motivation is for the agent to learn more information regarding the world and expand the size of its knowledge graph. This provides us with a better indication of whether an agent is stuck or not—a stuck agent does not visit any new states, learns no new information about the world, and therefore does not expand its knowledge graph—leading to more effective bottleneck detection overall.”).
Lang, in the same field of semantic parsing, teaches the following limitation which Yuan and Tien fail to teach:
and training a semantic parser neural network model using the training data (Lang, paragraph 95, “The present invention comprises an action (e.g. attack) execution and sequencing entity that can parse an attack model (e.g. by a processor, incl. loading from memory or storage) and executes (e.g. by a processor) each step, and then select a sub-branch (i.e. child node) based on numerous factors…”, the models trained in Lang are used for semantic parsing, specifically for parsing actions in an attack model.
Paragraph 161, “In step 120, the policy determining entity correlates, by a processor, the imported information with the enforced policy and/or rules/configurations. The goal is to produce labeled training data (a term well-understood by those skilled in the art of machine learning) e.g. used for training a neural net. For example, if a particular network traffic packet triggers an access rule, a record is created that includes at least a pair consisting of the traffic packet (the source for input features into machine learning) and the rule that triggered (the label that should be predicted).”, imported information (semantic action taken in an environment) is compared with a set of enforced rules in order to determine a label used for generating training data.
Paragraph 185, “As shown in FIG. 2, a policy determining entity (200) may for example use the generated records as labeled training data 205 for machine learning 210, a technique known to those skilled in the art. Machine learning often involves training a neural net (e.g. a deep neural net) 212 with parts of the labeled data 205, and then testing the network using the remaining parts of the labeled data 210, and iterating/improving (e.g. using random forest techniques) until the neural net's predictions are sufficiently accurate”, the newly acquired labeled data is used to train the neural network as shown in FIG. 2, which depicts the training/testing process for a neural network model using the labeled/unlabeled data.)
evaluating the recorded states to generate training data including a determination of pseudo-labels for logical propositions that encode an outcome of one or more rules including a rule … wherein the pseudo-label indicates… (Lang, paragraph 161, “In step 120, the policy determining entity correlates, by a processor, the imported information with the enforced policy and/or rules/configurations. The goal is to produce labeled training data (a term well-understood by those skilled in the art of machine learning) e.g. used for training a neural net. For example, if a particular network traffic packet triggers an access rule, a record is created that includes at least a pair consisting of the traffic packet (the source for input features into machine learning) and the rule that triggered (the label that should be predicted).”,
Lang, paragraph 185 “As shown in FIG. 2, a policy determining entity (200) may for example use the generated records as labeled training data 205 for machine learning”, Lang’s “rule that triggered (the label that should be predicted)” is a rule-outcome label that can be represented as a logical proposition about the applied rule(s). Thus, the label encodes the outcome of one or more rules including a rule (triggered vs. not triggered, or which enforced rule triggered).
Further, the label is properly characterized as a pseudo-label because it is generated automatically by correlating imported/observed information against enforced rules (i.e., rule application produces the label), rather than being manually annotated. Lang then uses those automatically generated rule-outcome labels as training data for training a neural network model, matching the claimed generation of training data including pseudo-labels encoding rule outcomes.)
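As a minimal sketch of this rule-outcome labeling (hypothetical rule predicates and record format assumed for illustration; not Lang’s disclosed code):

```python
# Sketch of Lang-style label generation: correlate observed records with
# enforced rules; the rule that triggers becomes the (pseudo-)label.

RULES = {
    "allow_http": lambda pkt: pkt.get("port") == 80,
    "block_telnet": lambda pkt: pkt.get("port") == 23,
}

def label_records(observations):
    records = []
    for obs in observations:
        for name, rule in RULES.items():
            if rule(obs):                      # rule application produces
                records.append((obs, name))    # the label automatically
    return records

training_data = label_records([{"port": 80}, {"port": 23}])
# -> [({'port': 80}, 'allow_http'), ({'port': 23}, 'block_telnet')]
```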
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Yuan/Tien with the teachings disclosed by Lang (i.e., training a neural network with training data generated from semantic actions). A motivation for the combination is that model-driven approaches allow defining information precisely and in a semantically rich manner (Lang, paragraph 371, “Model-driven approaches allow defining information precisely and semantically rich.”).
Helmert, in the same field of logical proposition reasoning, teaches the following limitation which the above prior art fails to teach:
where, if a proposition is listed as an effect of the intervention action but is not listed as a precondition of the intervention action, then… the proposition was false in an initial state before the intervention action; (Helmert, page 512, section 4.3, paragraph 5, “In the following, we will refer to the individual simple effects of an operator in a normalized PDDL task as being arranged in an effect list. For the simple effect e occurring within the universal conditional effect ∀v1 ... vk : ϕ > e, we will refer to {v1,..., vk} as the set of bound variables of e and to ϕ as the condition of e. If e is a positive literal, we will call it an add effect, otherwise a delete effect.”;
Helmert, page 515, paragraph 6, “For the imbalance test, an add effect leads to an imbalance by default. However, it can be balanced if whenever the operator is actually applied in a state s (which requires that o’.precond is true in state s) and the add effect triggers (e.cond is true in s) and actually adds something (e.atom is false in s), then something is deleted at the same time, which means that the delete effect triggers (e’.cond is true in s) and deletes something that was previously true (e’.atom is true in s).”, Helmert describes standard PDDL/STRIPS-style operator semantics distinguishing preconditions from effects (add/delete). Helmert states that when an operator is applied (i.e., its precondition holds), an add effect “actually adds something” only when the affected atom is false in the pre-action state s. Thus, for a proposition that is (i) listed as an add effect and (ii) not required by the operator’s preconditions, Helmert’s semantics supports the rule-outcome that the proposition was false before the action in the cases where it is being added (i.e., made newly true).)
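As a minimal sketch of the claimed rule as read onto these operator semantics (a STRIPS-style operator representation is assumed for illustration; this is not Helmert’s formalization):

```python
# Sketch: infer pre-state truth values from add-effect/precondition lists
# (STRIPS-style semantics as characterized above; hypothetical format).

def prestate_pseudo_labels(operator):
    """Add effect that is not a precondition -> false in the initial state."""
    labels = {prop: True for prop in operator["preconditions"]}
    for prop in operator["add_effects"]:
        if prop not in operator["preconditions"]:
            labels[prop] = False   # newly added, so false before the action
    return labels

op = {"preconditions": {"at(door)", "door_closed"}, "add_effects": {"door_open"}}
print(prestate_pseudo_labels(op))
# -> {'at(door)': True, 'door_closed': True, 'door_open': False}
```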
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Helmert’s formal operator semantics into Yuan and Tien’s text-environment interaction and state-recording framework, and further into Lang’s rule-driven labeled-data generation pipeline, because Yuan/Tien provide executed actions and recorded state transitions in a text environment, while Helmert provides an established semantics for inferring truth values across those state transitions, and Lang explicitly teaches generating labeled training data by applying rules and recording the rule outcome as the label used for training a neural network. For example, Helmert explains that when an operator is applied in a state s (precondition true in s), “add effects actually add propositions that were not true previously (… are false in s)” (Helmert, page 515, paragraph 5), enabling rule-based inference of pre-state truth values from action semantics. Lang teaches that “the goal is to produce labeled training data… used for training a neural net” (Lang, paragraph 164), that when an access rule triggers, a record is created including “the rule that triggered (the label that should be predicted)” (Lang, paragraph 164), and that the generated records are used as labeled training data for machine learning. A motivation for the combination is to improve the quality and scalability of automatically generated supervision by applying well-understood planning/operator semantics (Helmert) to recorded action/state traces (Yuan/Tien) and encoding the resulting rule outcomes as labels for neural training (Lang).
Regarding independent claim 12, Yuan teaches a system for training a semantic parser, comprising:
an exploration agent that performs an automated intervention action in a text-based environment (Yuan, page 1, col. 2, last paragraph, “Actions (A): At each turn t, the agent issues a text command ct. The interpreter can accept any sequence of characters but will only recognize a tiny subset thereof. Furthermore, only a fraction of recognized commands will actually change the state of the world.”, this shows the agent performing automated actions (commands) in a text-based game.),
that performs an inverse action in the text-based environment to reverse the intervention action (Yuan, page 3, col. 2, paragraph 1, “Conversely, to solve medium and hard games, the agent must reverse its previous action when it enters distractor rooms to return to the chain, and also recall farther into the past to track which exits it has already passed through.”, Yuan explicitly states the agent must “reverse its previous action” to undo the effect of entering a distractor, analogous to the idea of performing an inverse action to revert the environment to a prior state.),
Tien, in the same field of semantic parsing and reinforcement learning of an environment, teaches the following limitations which the above prior art fails to teach:
[Image: Algorithm 1 of Tien (media_image1.png)]
and that records states of the text-based environment before and after the intervention action and the inverse action (Tien, page 6, section 4.2, paragraph 2, “Specifically, Algorithm 1 optimizes the policy π as usual, but also keeps track of a buffer S of the distinct states and knowledge graphs that led up to each state (we use state st to colloquially refer to the combination of an observation ot and knowledge graph KGt)… The agent then backtracks by searching backwards through the state sequence Sb, restarting from each of the previous states—and training for N steps in search of a more optimal policy to overcome the bottleneck”, Tien’s intervention action corresponds to the normal forward environment interaction step (ENV.STEP(st, π)), which transitions the environment from pre-state st to post-state st+1. Tien records this transition by appending the resulting state st+1 into the state buffer S, thereby retaining states before/after the forward action along the trajectory.
Tien’s inverse action corresponds to its backtracking routine (BACKTRACK(πb, Sb)), which returns the agent/environment to previously visited states by searching backward through the stored state sequence Sb and restarting from earlier states (e.g., assigning s0 ← b for prior b ∈ Sb). Because Tien both (i) stores the visited state sequence in buffers (S, Sb) and (ii) performs the backtrack return to earlier stored states, Tien teaches recording states before and after the forward traversal (intervention) and the backtrack return (inverse).);
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Yuan with the teachings disclosed by Tien. A motivation for the combination is to create a more semantically rich knowledge graph, as previously discussed for claim 1.
Lang teaches the following limitation which Yuan and Tien fail to teach:
a hardware processor; and a memory that stores computer program code which, when executed by the hardware processor, implements (Lang, paragraph 57, “The agent entity can therefore for example execute (e.g. on a processor) any combination of automated steps (e.g. based on particular inputs or already-discovered information, e.g. read by a processor from a memory or storage)”):
and a model trainer that trains a semantic parser neural network model using the training data (Lang, paragraph 164, “In step 120, the policy determining entity correlates, by a processor, the imported information with the enforced policy and/or rules/configurations. The goal is to produce labeled training data (a term well-understood by those skilled in the art of machine learning) e.g. used for training a neural net. For example, if a particular network traffic packet triggers an access rule, a record is created that includes at least a pair consisting of the traffic packet (the source for input features into machine learning) and the rule that triggered (the label that should be predicted).”, imported information (semantic action taken in an environment) is compared with a set of enforced rules in order to determine a label used for generating training data.)
evaluates the recorded states to generate training data including a determination of pseudo-labels for logical propositions that encode an outcome of one or more rules including a rule … wherein the pseudo-label indicates… (Lang, paragraph 161, “In step 120, the policy determining entity correlates, by a processor, the imported information with the enforced policy and/or rules/configurations. The goal is to produce labeled training data (a term well-understood by those skilled in the art of machine learning) e.g. used for training a neural net. For example, if a particular network traffic packet triggers an access rule, a record is created that includes at least a pair consisting of the traffic packet (the source for input features into machine learning) and the rule that triggered (the label that should be predicted).”,
Lang, paragraph 185 “As shown in FIG. 2, a policy determining entity (200) may for example use the generated records as labeled training data 205 for machine learning”, Lang’s “rule that triggered (the label that should be predicted)” is a rule-outcome label that can be represented as a logical proposition about the applied rule(s). Thus, the label encodes the outcome of one or more rules including a rule (triggered vs. not triggered, or which enforced rule triggered).
Further, the label is properly characterized as a pseudo-label because it is generated automatically by correlating imported/observed information against enforced rules (i.e., rule application produces the label), rather than being manually annotated. Lang then uses those automatically generated rule-outcome labels as training data for training a neural network model, matching the claimed generation of training data including pseudo-labels encoding rule outcomes.)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Yuan and Tien with the teachings disclosed by Lang. A motivation for the combination is that model-driven approaches allow defining information precisely and in a semantically rich manner, as previously discussed for claim 1.
Helmert, in the same field of logical proposition reasoning, teaches the following limitation which the above prior art fails to teach:
where, if a proposition is listed as an effect of the intervention action but is not listed as a precondition of the intervention action, then… the proposition was false in an initial state before the intervention action; (Helmert, page 512, section 4.3, paragraph 5, “In the following, we will refer to the individual simple effects of an operator in a normalized PDDL task as being arranged in an effect list. For the simple effect e occurring within the universal conditional effect ∀v1 ... vk : ϕ > e, we will refer to {v1,..., vk} as the set of bound variables of e and to ϕ as the condition of e. If e is a positive literal, we will call it an add effect, otherwise a delete effect.”;
Helmert, page 515, paragraph 6, “For the imbalance test, an add effect leads to an imbalance by default. However, it can be balanced if whenever the operator is actually applied in a state s (which requires that o’.precond is true in state s) and the add effect triggers (e.cond is true in s) and actually adds something (e.atom is false in s), then something is deleted at the same time, which means that the delete effect triggers (e’.cond is true in s) and deletes something that was previously true (e’.atom is true in s).”, Helmert describes standard PDDL/STRIPS-style operator semantics distinguishing preconditions from effects (add/delete). Helmert states that when an operator is applied (i.e., its precondition holds), an add effect “actually adds something” only when the affected atom is false in the pre-action state s. Thus, for a proposition that is (i) listed as an add effect and (ii) not required by the operator’s preconditions, Helmert’s semantics supports the rule-outcome that the proposition was false before the action in the cases where it is being added (i.e., made newly true).)
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Yuan, Tien, and Lang with the teachings disclosed by Helmert. The motivation for the combination is the same as previously set forth for claim 1.
Regarding claim 11, claim 11 is directed to a non-transitory computer readable storage medium comprising a computer readable program for training a semantic parser, wherein the computer readable program when executed on a computer causes the computer to perform the method recited in claim 1. Therefore, the rejection made to claim 1 is applied to claim 11. In addition, Lang, paragraph 57, “The agent entity can therefore for example execute (e.g. on a processor) any combination of automated steps (e.g. based on particular inputs or already-discovered information, e.g. read by a processor from a memory or storage)”, shows that Lang discloses the hardware limitation of claim 11.
Claims 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Yuan, Tien, Lang, and Helmert as applied to claims 1 and 12, and further in view of Yao et al. (Yao, Y., Xu, K., Murasaki, K. et al. Pseudo-labelling-aided semantic segmentation on sparsely annotated 3D point clouds. IPSJ T Comput Vis Appl 12, 2 (2020). doi.org/10.1186/s41074-020-00064-w), hereinafter referred to as Yao.
Regarding claim 3, Yuan, Tien, Lang, and Helmert teach the limitations of claim 1. Yao, in the same field of semantic parsing, teaches the following limitations which the above prior art fails to teach:
wherein training the semantic parser neural network model includes supervised learning using the pseudo-labels (Yao, page 3, col. 1, section 4, paragraph 1, “Our method is based on pseudo-labelling, a semi-supervised learning technique described by Lee [19]. It operates by alternating between training of a classification network and label propagation (Fig. 1). The classification network is used to predict point labels based on its local neighbourhood of points. It is trained in a supervised fashion using originally labelled and pseudo-labelled points.”, the neural network training in Yao comprises the supervised learning of a pseudo-labeled data set.
Page 2, col. 1, paragraph 1, “We integrate pseudo-labelling with a state of the art architecture for deep learning on point clouds to semantically label a sparsely annotated scene.”, The trained network is used to semantically label data making it a semantic parser neural network.).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Yuan, Tien, Lang, and Helmert with the teachings disclosed by Yao (i.e., using pseudo-labeling to train a semantic parser). A motivation for the combination is to produce additional high-quality training data than what normal methods produce, (Yao, section 5.5.2, paragraph 1, “Our method labels the scene gradually by accepting confident predictions every iteration. In this section, we discuss the intermediate stages of the process for the case when kdist is hard. Figure 3 visualizes label assignments at three points in the process alongside error cases for each. Intermediate F-scores shown below the images are calculated by evaluating on the accepted pseudo-labels at each stage. Figure 4 plots intermediate F-scores against the percentage of points labelled. From these figures, we make two important observations. First, pseudo-labelled points selected early on are highly accurate. Thus, they provide the model with additional high quality training data. This is why our method was able to achieve improvements over the supervised baseline.”).
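As a minimal sketch of the pseudo-labelling loop characterized above (a generic scikit-learn-style classifier interface is assumed for illustration; this is not Yao’s point-cloud implementation):

```python
# Sketch of pseudo-labelling-style semi-supervised training: alternate
# supervised training with propagation of confident predictions
# (scikit-learn-style classifier assumed; not Yao's point-cloud network).

def pseudo_label_training(model, labeled, unlabeled, rounds=5, threshold=0.9):
    for _ in range(rounds):
        xs, ys = zip(*labeled)
        model.fit(list(xs), list(ys))              # supervised step
        remaining = []
        for x in unlabeled:
            probs = model.predict_proba([x])[0]
            if probs.max() >= threshold:           # accept confident prediction
                labeled.append((x, model.classes_[probs.argmax()]))
            else:
                remaining.append(x)
        unlabeled = remaining                      # iterate label propagation
    return model
```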
Regarding claim 14, claim 14 is directed to a system for training a semantic parser, comprising: a hardware processor; and a memory that stores computer program code which, when executed by the hardware processor, implements the method of claim 3. Therefore, the rejection made to claim 3 is applied to claim 14.
Claims 4, 6, 15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Yuan, Tien, Lang, and Helmert as applied to claims 1 and 12, and further in view of Schwendimann et al. (US8276192, “System and method for security planning with hard security constraints”), hereinafter referred to as Schwendimann.
Regarding claim 4, Yuan, Tien, Lang, and Helmert teach the limitations of claim 1. Schwendimann, in the same field of generating semantic training data for a neural network, further teaches the following limitations which Yuan, Tien, Lang, and Helmert fail to teach:
wherein the one or more rules derive one or more pseudo-labels from an action template, associated with the intervention action, that includes precondition propositions and effect propositions for the intervention action (Schwendimann, col. 11, lines 35-42, “For each action corresponding to a component, labels corresponding to input ports {Cj:1≦j≦J} and labels corresponding to output ports {Lk,Uk:1≦k≦K} are reflected in an action description as follows: a) For every input port j, 1≦j≦J include the following predicates in the corresponding precondition of the action: b) For every output port k, 1≦k≦K include the following predicates in the corresponding effect of the action”, labels corresponding to preconditions and effects of an action are based on rules (1 <= j <= J for preconditions or 1<=k<=K for effects)).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Yuan, Tien, Lang, and Helmert with the teachings disclosed by Schwendimann (i.e., deriving pseudo-labels from a set of rules). A motivation for the combination is to take advantage of component reuse (Schwendimann, paragraph 7, “The main advantage of a component environment is component reuse. For example, different networks of components can be composed as needed to generate a final product according to current business requirements. The same component can participate in multiple networks.”).
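By way of a non-limiting illustration of deriving pseudo-labels from an action template with precondition and effect propositions, consider the following Python sketch; the ActionTemplate record and the rule encoding are hypothetical assumptions for illustration and are not drawn from Schwendimann.

from dataclasses import dataclass

@dataclass
class ActionTemplate:
    name: str
    preconditions: list  # precondition propositions
    effects: list        # effect propositions

def derive_pseudo_labels(template: ActionTemplate) -> dict:
    # Rule: each precondition proposition is labelled True in the
    # state before the action; each effect proposition is labelled
    # True in the state after the action.
    labels = {}
    for p in template.preconditions:
        labels[("before", p)] = True
    for e in template.effects:
        labels[("after", e)] = True
    return labels

# Example: derive_pseudo_labels(ActionTemplate("open", ["door closed"], ["door open"]))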
Regarding claim 15, claim 15 is directed to a system for training a semantic parser, comprising: a hardware processor; and a memory that stores computer program code which, when executed by the hardware processor, implements the method of claim 4. Therefore, the rejection made to claim 4 is applied to claim 15. In addition, Lang discloses a hardware processor and memory that stores computer program code in paragraph 57, as previously disclosed for claim 11.
Regarding claim 6, Yuan, Tien, Lang, and Helmert teach the limitations of claim 1. Schwendimann further teaches the following limitations, which Yuan, Tien, Lang, and Helmert fail to teach:
wherein the one or more rules further include a rule selected from the group consisting of a first rule relating to preconditions of an action template for a successful action, a second rule relating to effects of the action template for the successful action, and a third rule relating to preconditions of the action template for the successful action that are not canceled in the effects of the action template (Schwendimann, col. 11, lines 35-42, “For each action corresponding to a component, labels corresponding to input ports {Cj:1≦j≦J} and labels corresponding to output ports {Lk,Uk:1≦k≦K} are reflected in an action description as follows: a) For every input port j, 1≦j≦J include the following predicates in the corresponding precondition of the action: b) For every output port k, 1≦k≦K include the following predicates in the corresponding effect of the action”, labels corresponding to preconditions and effects of an action are assigned according to rules (1 ≤ j ≤ J for preconditions; 1 ≤ k ≤ K for effects of an action). These two rules correspond to the first two rules stated in the Markush grouping for claim 6; teaching any one alternative of the Markush grouping meets the claimed limitation.).
The rationale to combine Yuan, Tien, Lang, and Helmert with Schwendimann is the same as set forth above in claim 4.
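A hedged illustration of the three claimed rules follows, reusing the hypothetical ActionTemplate record from the sketch above; encoding a canceled proposition by a "not " prefix on an effect string is an assumption for illustration only.

def pseudo_labels_for_successful_action(template):
    labels = {}
    # First rule: the template's preconditions held before the successful action.
    for p in template.preconditions:
        labels[("before", p)] = True
    # Second rule: the template's effects hold after the successful action.
    for e in template.effects:
        labels[("after", e)] = True
    # Third rule: preconditions whose negation does not appear in the
    # effects (i.e., not canceled) still hold after the action.
    canceled = {e.removeprefix("not ") for e in template.effects
                if e.startswith("not ")}
    for p in template.preconditions:
        if p not in canceled:
            labels[("after", p)] = True
    return labels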
Regarding claim 17, claim 17 is directed to a system for training a semantic parser, comprising: a hardware processor; and a memory that stores computer program code which, when executed by the hardware processor, implements the method of claim 6. Therefore, the rejection made to claim 6 is applied to claim 17. In addition, Lang discloses a hardware processor and memory that stores computer program code in paragraph 57, as previously disclosed for claim 11.
Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Yuan, Tien, Lang, Helmert, and Schwendimann as applied to claims 4 and 15, and further in view of Fischbach et al. (M. Fischbach, D. Wiebusch and M. E. Latoschik, "Semantic Entity-Component State Management Techniques to Enhance Software Quality for Multimodal VR-Systems," in IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 4, pp. 1342-1351, April 2017, doi: 10.1109/TVCG.2017.2657098), hereinafter referred to as Fischbach.
Regarding claim 5, Yuan, Tien, Lang, Helmert, and Schwendimann teach the limitations of claim 4. Fischbach, in the same field of describing semantic actions, teaches the following limitations, which Yuan, Tien, Lang, Helmert, and Schwendimann do not teach:
wherein the action template includes parameters that an action accepts, preconditions for success of the action, and effects that occur upon success of the action (Fischbach, page 1410, col. 2, figure 5, the action description in figure 5 teaches the action template as disclosed in the claim, comprising parameters, preconditions, and effects of the action.).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Yuan, Tien, Lang, Helmert, and Schwendimann with the teachings disclosed by Fischbach (i.e., configuring an action template with parameters, preconditions, and effects of an action). A motivation for the combination is the ease of transforming action descriptions into definition languages such as PDDL (Fischbach, page 1411, section 4.3, paragraph 3, “Since action descriptions can easily be transformed into fragments of common definition languages, like PDDL [26], the utilization of planning software that is compatible with such languages is facilitated”).
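As a hedged illustration of the transformation Fischbach notes, the following Python sketch renders a hypothetical action template (parameters, preconditions, effects) as a PDDL-style fragment; the Action record and the rendering convention are assumptions for illustration and are not Fischbach's implementation.

from dataclasses import dataclass

@dataclass
class Action:
    name: str
    parameters: list
    preconditions: list
    effects: list

def to_pddl(action: Action) -> str:
    # Render the template as a PDDL-style :action fragment.
    params = " ".join("?" + v for v in action.parameters)
    pre = " ".join("(" + p + ")" for p in action.preconditions)
    eff = " ".join("(" + e + ")" for e in action.effects)
    return ("(:action " + action.name + "\n"
            "  :parameters (" + params + ")\n"
            "  :precondition (and " + pre + ")\n"
            "  :effect (and " + eff + "))")

# Example:
print(to_pddl(Action("open", ["d"], ["closed ?d"], ["not (closed ?d)", "open ?d"])))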
Regarding claim 16, claim 16 is directed to a system for training a semantic parser, comprising: a hardware processor; and a memory that stores computer program code which, when executed by the hardware processor, implements the method of claim 5. Therefore, the rejection made to claim 5 is applied to claim 16. In addition, Lang discloses a hardware processor and memory that stores computer program code in paragraph 57, as previously disclosed for claim 11.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Yuan, Tien, Lang, and Helmert as applied to claim 1, and further in view of Xie et al. (J. Xie and X. Chen, "Understanding Instructions on Large Scale for Human-Robot Interaction," 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Warsaw, Poland, 2014, pp. 175-182, doi: 10.1109/WI-IAT.2014.165), hereinafter referred to as Xie.
Regarding claim 7, Yuan, Tien, Lang, and Helmert teach the limitations of claim 1. Xie, in the same field of semantic parsing, teaches the following limitations, which Yuan, Tien, Lang, and Helmert do not teach:
wherein the one or more rules further include a rule that sets a noisy pseudo-label responsive to a determination that the intervention action is unsuccessful (Xie, page 177, col. 2, part B, paragraph 5, “Besides the correctly-propagated lexemes as shown as above, there were some noise lexemes in which a word/phrase was mapped to a wrong semantic type with positive weight or mapped to a correct semantic type with negative weight.”, noise lexemes (noisy pseudo-labels) are words/phrases that were mapped to a wrong semantic type (i.e., the intervention action is unsuccessful)).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Yuan, Tien, Lang, and Helmert with Xie by augmenting the set of rules used to generate training labels (Lang) to further include a rule that sets a noisy pseudo-label when an attempted intervention action is determined to be unsuccessful. Xie teaches that semantic parsing systems encounter “noise lexemes” in which words/phrases are “propagated to wrong semantic types,” and that such noise is associated with “error parse” outcomes (e.g., “testing examples with error parse… were on… noise lexemes”). Xie further explains that noise lexemes “contributed to the proportion of sentences with parse but did not to the proportion of sentences with correct parse” and that “noise lexemes… improve the coverage… but compromise the precision.” This provides a clear motivation for a person of ordinary skill in the art to implement an explicit additional rule within the rule set (Lang) that labels unsuccessful outcomes as noisy rather than clean, thereby maintaining or improving parser coverage while accounting for the known noise behavior described by Xie.
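A minimal illustration of such an additional rule follows, under the assumption that each pseudo-label carries a quality flag; the function and flag names are hypothetical and are not drawn from Xie or Lang.

def label_with_noise_flag(template, intervention_succeeded):
    # Additional rule: pseudo-labels produced by an unsuccessful
    # intervention are marked noisy rather than clean, analogous to
    # Xie's treatment of noise lexemes.
    quality = "clean" if intervention_succeeded else "noisy"
    return {("after", e): {"label": True, "quality": quality}
            for e in template.effects}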
Claims 8-10 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yuan, Tien, Lang, and Helmert as applied to claims 1 and 12, and further in view of Van Seijen et al. (US10977551, “Hybrid reward architecture for reinforcement learning”), hereinafter referred to as Van Seijen.
Regarding claim 8, Yuan, Tien, Lang, and Helmert teach the limitations of claim 1. Van Seijen, in the same field of reinforcement learning, teaches the limitations which Yuan, Tien, Lang, and Helmert do not teach:
wherein evaluating the recorded states includes determining a pseudo-reward for the intervention action, based on the recorded states and a goal state (Van Seijen, claim 1, “A method comprising: obtaining a reward function associated with an environment, wherein the environment describes multiple state variables; splitting the reward function into a plurality of sub-reward functions according to a number of state variables that affect each sub-reward function; training a plurality of reinforcement-learning agents using the plurality of sub-reward functions; and applying an aggregator function that receives action-values from each of the plurality of reinforcement-learning agents and combines the received action-values into a set of action-values”, reinforcement-learning agents are trained by evaluating actions in an environment with multiple state variables. A pseudo-reward function is determined as described in col. 36, lines 1-15, “In some approaches, HRA builds on the Horde architecture. The Horde architecture includes a large number of “demons” that learn in parallel via off-policy learning. Each demon trains a separate general value function (GVF) based on its own policy and pseudo-reward function. A pseudo-reward can be any feature-based signal that encodes useful information. The Horde architecture can focus on building general knowledge about a world encoded via a large number of GVFs. In some examples, HRA focuses on training separate components of the environment-reward function to achieve a smoother value function to efficiently learn a control policy. In some examples, HRA can apply multi-objective learning to smooth a value function of a single reward function”. A goal state is used to determine the reward function as shown in col. 5, lines 28-31, “For fully competitive tasks, which are typically a two-agent case, the agents have opposing goals (e.g., the reward function of one agent is the negative of the reward function of the other).”, which implies that the reward function is based on a goal).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Yuan, Tien, Lang, and Helmert with the teachings disclosed by Van Seijen (i.e., evaluating states from an intervention action to determine pseudo-rewards). A motivation for the combination is to allow specialized agents for different parts of a task (Van Seijen, col. 1, lines 58-61, “This approach has at least the following advantages: 1) it allows for specialized agents for different parts of the task, and 2) it provides a new way to transfer knowledge, by transferring trained agents.”).
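For illustration, a minimal Python sketch of determining a pseudo-reward from recorded states and a goal state follows; per Van Seijen, a pseudo-reward can be any feature-based signal that encodes useful information, and the particular goal-satisfaction signal below is an assumption for illustration only.

def pseudo_reward(recorded_states, goal_state):
    # recorded_states: list of sets of propositions observed over time;
    # goal_state: set of goal propositions. The pseudo-reward is the
    # fraction of goal propositions satisfied in the final recorded state.
    final_state = recorded_states[-1]
    return sum(1 for g in goal_state if g in final_state) / len(goal_state)

# Example:
# pseudo_reward([{"door open"}, {"door open", "light on"}],
#               {"door open", "light on"})  # -> 1.0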
Regarding claim 9, Yuan, Tien, Lang, Helmert, and Van Seijen teach the limitations of claim 8. Van Seijen further teaches the limitation:
wherein training the semantic parser neural network model includes reinforcement learning using the pseudo-reward (Van Seijen, claim 1, “training a plurality of reinforcement-learning agents using the plurality of sub-reward functions”, the sub-reward functions used to train the reinforcement-learning model are pseudo-reward functions, as shown in claim 7 of Van Seijen.).
The rationale to combine Yuan, Tien, Lang, and Helmert with Van Seijen is the same as set forth above in claim 8.
Regarding claim 10, Yuan, Tien, Lang, Helmert, and Van Seijen teach the limitations of claim 8. Van Seijen further teaches the limitation:
wherein the pseudo-reward for the intervention action is determined based on a goal within the environment (Van Seijen, col. 5, lines 28-31, “For fully competitive tasks, which are typically a two-agent case, the agents have opposing goals (e.g., the reward function of one agent is the negative of the reward function of the other).” The reward functions of agents with opposing goals are the negatives of each other, implying that the reward function is based on a goal.).
The rationale to combine Yuan, Tien, Lang, and Helmert with Van Seijen is the same as set forth above in claim 8.
Regarding claim 18, Yuan, Tien, Lang, and Helmert teach the limitations of claim 12. Van Seijen, in the same field of reinforcement learning, teaches the limitations which Yuan, Tien, Lang, and Helmert do not teach:
wherein the state evaluator determines a pseudo-reward for the intervention action, based on the recorded states and a goal state (Van Seijen, claim 1, “A method comprising: obtaining a reward function associated with an environment, wherein the environment describes multiple state variables; splitting the reward function into a plurality of sub-reward functions according to a number of state variables that affect each sub-reward function; training a plurality of reinforcement-learning agents using the plurality of sub-reward functions; and applying an aggregator function that receives action-values from each of the plurality of reinforcement-learning agents and combines the received action-values into a set of action-values”, reinforcement-learning agents are trained by evaluating actions in an environment with multiple state variables. A pseudo-reward function is determined as described in col. 36, lines 1-15, “In some approaches, HRA builds on the Horde architecture. The Horde architecture includes a large number of “demons” that learn in parallel via off-policy learning. Each demon trains a separate general value function (GVF) based on its own policy and pseudo-reward function. A pseudo-reward can be any feature-based signal that encodes useful information. The Horde architecture can focus on building general knowledge about a world encoded via a large number of GVFs. In some examples, HRA focuses on training separate components of the environment-reward function to achieve a smoother value function to efficiently learn a control policy. In some examples, HRA can apply multi-objective learning to smooth a value function of a single reward function”. A goal state is used to determine the reward function as shown in col. 5, lines 28-31, “For fully competitive tasks, which are typically a two-agent case, the agents have opposing goals (e.g., the reward function of one agent is the negative of the reward function of the other).”, which implies that the reward function is based on a goal. Head component 2426 is an example ‘state evaluator’ that determines a pseudo-reward for an agent interacting with an environment, col. 37, lines 22-26, “The HRA neural network 2420 includes an input layer 2422, one or more hidden layers 2424, and a plurality of heads 2426, each with their own reward function (as illustrated R1, R2, and R3). The heads 2426 inform the output 2428 (e.g., using a linear combination).” and col. 36, lines 25-28, “The heads of HRA can represent values, trained with components of the environment reward. Even after training, these values can stay relevant because the aggregator uses the values of all heads to select its action.”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Yuan, Tien, Lang, and Helmert with the teachings disclosed by Van Seijen (i.e., evaluating states from an intervention action to determine pseudo-rewards). A motivation for the combination is to allow specialized agents for different parts of a task, as previously set forth for claim 8.
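A hedged sketch of the cited head/aggregator structure follows: per-head action-values are combined into a single set of action-values via a linear combination; the unweighted sum and the list representation below are assumptions for illustration, not Van Seijen's implementation.

def aggregate_action_values(head_values):
    # head_values: one list of action-values per head. The aggregator
    # combines them into a single set of action-values (here, a sum,
    # i.e., a linear combination with unit weights).
    n_actions = len(head_values[0])
    return [sum(head[a] for head in head_values) for a in range(n_actions)]

# Example with three heads and two actions:
# aggregate_action_values([[1.0, 0.2], [0.1, 0.9], [0.3, 0.3]])  # -> [1.4, 1.4]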
Regarding claim 19, Yuan, Tien, Lang, Helmert, and Van Seijen teach the limitations of claim 18. Van Seijen further teaches the limitations:
wherein the model trainer performs reinforcement learning using the pseudo-reward (Van Seijen, claim 1, “training a plurality of reinforcement-learning agents using the plurality of sub-reward functions”, the sub-reward functions used to train the reinforcement-learning model are pseudo-reward functions, as shown in claim 7 of Van Seijen. The trainer agents described in paragraph 8 are interpreted as the model trainer that performs reinforcement learning.).
The rationale to combine Yuan, Tien, Lang, and Helmert with Van Seijen is the same as set forth above in claim 8.
Regarding claim 20, Yuan, Tien, Lang, Helmert, and Van Seijen teach the limitations of claim 18. Van Seijen further teaches the limitations:
wherein the state evaluator determines the pseudo-reward for the intervention action based on a goal within the environment (Van Seijen, col. 5, lines 28-31, “For fully competitive tasks, which are typically a two-agent case, the agents have opposing goals (e.g., the reward function of one agent is the negative of the reward function of the other).” The reward functions of agents with opposing goals are the negatives of each other, implying that the reward function is based on a goal.).
The rationale to combine Yuan, Tien, Lang, and Helmert with Van Seijen is the same as set forth above in claim 8.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Edwards, A., Sahni, H., Liu, R., Hung, J., Jain, A., Wang, R., ... & Yosinski, J. (2020, November). Estimating Q(s, s′) with deep deterministic dynamics gradients. In International Conference on Machine Learning (pp. 2825-2835). PMLR.
Gui, T., Liu, P., Zhang, Q., Zhu, L., Peng, M., Zhou, Y., & Huang, X. (2019, July). Mention recommendation in twitter with cooperative multi-agent reinforcement learning. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 535-544).
Cannon, J. (2011). Robot motion planning using real-time heuristic search. University of New Hampshire.
Kim, Y. C., & Yoon, W. C. (2010, February). Handling Manually Programmed Task Procedures in Human–Service Robot Interactions. In Human-Robot Interaction. IntechOpen.
Ryan, C. (1997). Evaluating the effectiveness of electronic special interest groups (Doctoral dissertation, Dublin City University).
Thrun, S. B. (1992). Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, School of Computer Science, Carnegie Mellon University.
Ye, D., Zhang, M., & Sutanto, D. (2012). Self-organization in an agent network: A mechanism and a potential application. Decision Support Systems, 53(3), 406-417.
Hausknecht, M., Ammanabrolu, P., Côté, M. A., & Yuan, X. (2020, April). Interactive fiction games: A colossal adventure. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 05, pp. 7903-7910).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HYUNGJUN B YI whose telephone number is (703)756-4799. The examiner can normally be reached M-F 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/H.B.Y./Examiner, Art Unit 2124
/USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146