Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-23 are presented for examination.
Information Disclosure Statement
The information disclosure statement (IDS) was submitted on March 29, 2023. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Specification
The disclosure is objected to because it contains an embedded hyperlink and/or other form of browser-executable code. In the instant case specification, paragraphs [0030] and [0068] contain hyperlinks that are not permitted. Applicant is required to delete the embedded hyperlink and/or other form of browser-executable code; references to websites should be limited to the top-level domain name without any prefix such as http:// or other browser-executable code. See MPEP § 608.01.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-9 and 19-23 are rejected under 35 U.S.C. 101 because the claimed invention is directed to software per se and therefore does not recite patent-eligible subject matter.
Claims 1, 6, and 19 and their dependent claims are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because the claimed device does not include any structural recitations. Although claims 1 and 6 are nominally directed to a “system”, the claims recite only software components and functional descriptions, such as a reinforcement learning algorithm, computing environment, and logical neural network, without any physical or tangible structural elements. MPEP 2106.03 states: Non-limiting examples of claims that are not directed to any of the statutory categories include: Products that do not have a physical or tangible form, such as information (often referred to as "data per se") or a computer program per se (often referred to as "software per se") when claimed as a product without any structural recitations.
Claim 19 is directed to a “computer program product” comprising program instructions stored on computer-readable storage media, reciting software per se. The recitation of a computer-readable storage medium is insufficient because the claim is directed to the instructions stored thereon, rather than to the storage medium itself.
Applicant may overcome this rejection by amending the claims to recite a statutory category, such as a machine or manufacture, including specific physical or structural components configured to perform the claimed functions, rather than software instructions or abstract computing environments.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4, 10-13, and 19-22 are rejected under 35 U.S.C. 103 as being unpatentable over Kimura (“Neuro-Symbolic Reinforcement Learning with First-Order Logic”, 2021) in view of Ristoski (“US 20210303762 A1”).
Regarding claim 1,
Kimura teaches a reinforcement learning (RL) computing environment comprising an RL algorithm acting as a reward-based RL agent programmed to make decisions in furtherance of a goal… (Pg. 2 Paragraph 1 of Section 3.1, “As text-based games are sequential decision-making problems, they can naturally be applied to RL. These games are partially observable… where the observation text does not include the entire information of the environment… The objective for the agent is to maximize the expected discounted reward E”);
and a logical optimal action (LOA) computing framework (neuro-symbolic framework) integrated with the RL computing environment, wherein the LOA computing framework comprises (Page 1 Under the Figure Caption of the Introduction, “In order to train logical rules, a recent neuro-symbolic framework called the Logical Neural Network (LNN) (Riegel et al., 2020) has been proposed to simultaneously provide key properties of both the neural network (learning) and the symbolic logic (reasoning). The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss.”, Page 2 Paragraph 2 of Introduction, “In this paper, we propose an action knowledge acquisition method featuring a neuro-symbolic LNN framework for the RL algorithm.”
The specification of the instant case states that the logical optimal action refers to a neuro-symbolic LNN that is capable of optimizing the LNN. In this case, the reference teaches a neuro-symbolic framework that uses a Logical Neural Network integrated with the reinforcement learning environment.)
(ii) constraining rules that limit the scope of the information in the at least one dataset (Page 5 Paragraph 1 Under Discussion about ethics, “The data set used in our experiment does not contain any sensitive information.”, Pg. 1 Under Figure 1 Caption In Introduction Section, “…the Logical Neural Network (LNN)… has been proposed to simultaneously provide key properties of both the neural network (learning) and the symbolic logic (reasoning). The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss. Every neuron in the LNN has a component for a formula of weighted real-valued logics from a unique logical conjunction, disjunction, or negation nodes… the trained LNN can extract obtained logical rules by selecting high weighted connections that represent the important rules for an action policy”, Page 3 of Section 3.2.1, “The FOL converter converts a given natural observation text ot and observation history (ot−1, ot−2, ...) into first-order logic facts…"
The "constraining rules that limit the scope of the information in the at least one dataset" correspond to the symbolic logical rules used in the Logical Neural network (LNN). The symbolic logical rules are the parsed observation from the FOL converter that is used in the LNN. The rules are constrained to logical rules only, thus limiting the scope of the information in the dataset based on the effect of the FOL converter.),
and (2) a logical neural network (LNN) that is trained on the at least one dataset and the constraining rules and establishes LNN policy rules based on the training (Page 1 of Introduction Under Figure 1 Caption, “The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss… At the same time, the trained LNN can extract obtained logical rules by selecting high weighted connections that represent the important rules for an action policy.”, See Figure 1,
[Image: Figure 1 of Kimura (media_image1.png)]
Here, the dataset corresponds to the observation in the Figure, the constraining rules correspond to the symbolic logical rules generated from the FOL converter, and the LNN policy rules correspond to the output of the logical neural network, which provides the rules the agent uses to make decisions. The observation and the rules are fed into the LNN to generate the policy rules for the agent to perform the actions.),
wherein the at least one dataset and the LNN policy rules are input into the RL computing environment where the RL agent makes decisions on the information in the at least one dataset that comply with the LNN policy rules and the RL agent is rewarded for decisions that advance the goal (Page 5 Paragraph 1 Under Discussion about ethics, “The data set used in our experiment does not contain any sensitive information.”, Page 3 of Section 3.2.1, “The FOL converter converts a given natural observation text ot and observation history (ot−1, ot−2, ...) into first-order logic facts… the agent understands an opened direction from the current room. The agent then retrieves the class type c of the word meaning in propositional logic l_i,t by using ConceptNet… or the network of another word’s definition.”, Page 3 Paragraph 1 of Section 3.2.2, “The LNN training component is for obtaining an action policy from the given FOL logics.”, See Figure 1,
[Image: Figure 1 of Kimura (media_image1.png)]
The reference teaches that observation data from the environment, in the form of natural language text, constitutes a dataset that is processed and converted into first-order logical facts and input into a Logical Neural Network (LNN), as shown in Figure 1 and described in Section 3.2. The LNN learns logical policy rules that are input into the reinforcement learning computing environment and directly govern the RL agent’s action selection. The RL agent makes decisions that comply with the LNN policy rules and is rewarded based on the outcome of those decisions.).
Kimura does not teach a system, an interface for entry of (i) at least one dataset comprising information relating to existing polymers, and a dataset relating to the discovery of new polymer materials.
Ristoski, in the same field of endeavor, teaches [a] system comprising (Paragraph 0005, “A computer-implemented method and system to accelerate new polymer can be provided. The method and system can accelerate new polymer discovery.”),
an interface for entry of (i) at least one dataset comprising information relating to existing polymers (Paragraph 0091, “For instance, at least one hardware processor 1102 may receive training data set which can include candidate material for polymerization, identify one or more desired features in the candidate material, and train a machine learning model to generate a new material having one or more of the desired features.”, Paragraph 0088, “In an embodiment, e.g., optionally, the identified desired features can be presented or caused to be presented on a user interface, e.g., for user interaction or view. In an embodiment, a user (e.g., an SME) may select from the desired features, a desired feature to include in the new material being generated.”
The reference teaches an interface or GUI that allows for the user to select desired features, which is considered an entry due to the fact that the user is entering constraints into the model.)
relating to the discovery of new polymer materials (Paragraph 0080, “FIG. 8 is a diagram illustrating a method of polymer discovery in one embodiment… At 802, the method can include generating a set of material candidates expected to yield materials with target properties. Examples of material candidates include, but not limited to, molecules, monomers, and/or polymer repeat units.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kimura’s teaching of integrating a neuro-symbolic Logical Neural Network (LNN) framework with a reinforcement learning environment for decision making with Ristoski’s teaching of providing a user interface for entering polymer related constraints in order to apply the reinforcement learning and LNN to the discovery of new polymer materials, thereby improving the efficiency and guidance of polymer discovery (Paragraph 0002 of Ristoski).
Regarding claim 2,
Kimura does not teach the constraining rules limit the scope of the information in the at least one dataset to available materials and laboratory equipment.
Ristoski, in the same field of endeavor, teaches the constraining rules limit the scope of the information in the at least one dataset to available materials and laboratory equipment (Paragraph 0042, “The general model tries to learn general rules for synthesizing new materials that are shared across all labs and SMEs, e.g., each monomer suitable for polymerization must contains a polymerizable group and should not contain pendant groups that will be active under polymerization conditions.”, Paragraph 0050, “By way of example only, a training data set to train or build the classification model 106 can include a random sample from a combinatorial library of polymerizable components suitable for the preparation of polyimides, including dianhydrides, dicarboxylic acids, and diamines.”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Kimura’s teaching of integrating a neuro-symbolic Logical Neural Network (LNN) framework with a reinforcement learning environment based on game rules by incorporating Ristoski’s teaching of constraining rules derived from polymer materials and laboratory equipment in order to improve practical applicability and reliability of the learned policies (Paragraph 0002 of Ristoski).
Regarding claim 3,
Kimura teaches the LOA (neuro-symbolic framework) further comprises (3) an internal regressor (FOL converter) that parses the at least one dataset into experiments and outcomes (Page 1 Under Figure 1 Caption, “In order to train logical rules, a recent neuro-symbolic framework called the Logical Neural Network (LNN)… has been proposed to simultaneously provide key properties of both the neural network (learning) and the symbolic logic (reasoning).”, Page 1 Figure 1 Caption, “…and the first-order logical facts are extracted from an FOL converter that uses a semantic parser, ConceptNet, and history.” Page 3 Section 3.2.1, “The FOL converter converts a given natural observation text ot and observation history (ot−1, ot−2, ...) into first-order logic facts. The method first converts text into propositional logics li,t by a semantic parser from ot , such as, the agent understands an opened direction from the current room.”
The experiments correspond to the RL agent’s action executions in the environment based on the observation, and the outcomes correspond to the resulting state information and rewards from the logical facts generated by the FOL converter and LNN.),
wherein the LNN converts the experiments and outcomes into symbolic language understood by the RL agent (Figure 1 Caption, “The agent takes a text observation from the environment, and the first-order logical facts are extracted from an FOL converter that uses a semantic parser, ConceptNet, and history… when the agent finds a direction x and the direction x has not been visited, the agent takes a “Go x” action. Dashed lines show the initial connections before training.”, See Figure 1,
[Image: Figure 1 of Kimura (media_image2.png)]
).
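For illustration only, the rule quoted above from the Figure 1 caption of Kimura (when the agent finds a direction x that has not been visited, it takes a "Go x" action) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not code from Kimura or from the instant application: the names to_facts, choose_action, and DIRECTIONS are hypothetical, simple keyword matching stands in for the semantic parser and ConceptNet lookup, and the rule is hard-coded rather than learned by an LNN.

```python
import re
from typing import Set, Tuple

# A fact is a (predicate, argument) pair, e.g. ("found", "east").
Fact = Tuple[str, str]
DIRECTIONS = ("north", "south", "east", "west")  # hypothetical vocabulary


def to_facts(observation: str, visited: Set[str]) -> Set[Fact]:
    """Toy stand-in for an FOL converter: turn text plus history into facts."""
    words = set(re.findall(r"[a-z]+", observation.lower()))
    facts: Set[Fact] = set()
    for direction in DIRECTIONS:
        if direction in words:
            facts.add(("found", direction))
            if direction in visited:
                facts.add(("visited", direction))
    return facts


def choose_action(facts: Set[Fact]) -> str:
    """Apply the rule: found(x) AND NOT visited(x) -> "go x"."""
    for direction in DIRECTIONS:
        if ("found", direction) in facts and ("visited", direction) not in facts:
            return f"go {direction}"
    return "look"  # hypothetical fallback when no rule fires


facts = to_facts("You see a door to the east and a door to the west.", {"east"})
action = choose_action(facts)
```

In this sketch the observation text plays the role of the dataset, the extracted facts play the role of the symbolic input, and the returned string is the action selected under the rule.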
Regarding claim 4,
Kimura teaches the LNN policy rules are updated to reflect decisions made by the RL agent that successfully advance the goal and the updated LNN policy rules direct future decisions by the RL agent to achieve the goal (Page 2 Section 3.1, “The objective for the agent is to maximize the expected discounted reward E…”, Page 3 Under Section 3.2.2, “The LNN trains by this fact inputs and reward; that means it forwards from input facts through LNN, calculates a loss values from the reward value, and optimizes weights in LNN.”, See Algorithm 1 RL by FOL-LNN,
[Image: Algorithm 1 of Kimura, "RL by FOL-LNN" (media_image3.png)]
The logical facts extracted by the FOL converter are used to train the Logical Neural Network, whose output the agent uses to take an action and obtain a reward. The reward for the action that the agent takes is used to update the rules of the LNN in order to improve and optimize the agent’s future moves.).
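For illustration only, the reward-driven update described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the training procedure of Kimura: the names update_rule_weights and extract_rules, the learning rate, and the threshold are hypothetical, and a plain squared-error gradient step on per-rule weights is substituted for the LNN loss. It shows only the general idea that rewards adjust weights and that highly weighted connections are read out as rules.

```python
from typing import Dict, List


def update_rule_weights(
    weights: Dict[str, float], fired: List[str], reward: float, lr: float = 0.5
) -> Dict[str, float]:
    """Move each fired rule's weight toward the reward.

    This is one gradient step on 0.5 * (reward - w) ** 2 for each fired rule.
    """
    updated = dict(weights)
    for rule in fired:
        updated[rule] = weights[rule] + lr * (reward - weights[rule])
    return updated


def extract_rules(weights: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep only rules whose weight reaches the (hypothetical) threshold."""
    return sorted(rule for rule, w in weights.items() if w >= threshold)


# Two candidate rules start with equal, low weights.
weights = {
    "found(x) & ~visited(x) -> go(x)": 0.2,
    "found(x) & visited(x) -> go(x)": 0.2,
}
# The first rule fired and the resulting action earned a reward of 1.0.
weights = update_rule_weights(weights, ["found(x) & ~visited(x) -> go(x)"], 1.0)
policy_rules = extract_rules(weights)
```

After the update, only the rewarded rule crosses the threshold, so it alone is extracted as a policy rule for future decisions.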
Regarding claim 10,
Kimura teaches [a] computer-implemented method comprising (Page 2 Paragraph 2 in Introduction, “In this paper, we propose an action knowledge acquisition method featuring a neuro-symbolic LNN framework for the RL algorithm.”),
programming a reinforcement learning (RL) algorithm to act as a reward-based RL agent within an RL computing environment (See Algorithm 1 RL by FOL-LNN,
[Image: Algorithm 1 of Kimura, "RL by FOL-LNN" (media_image4.png)]
),
wherein the RL agent makes decisions in furtherance of a goal… (Pg. 2 Paragraph 1 of Section 3.1, “As text-based games are sequential decision-making problems, they can naturally be applied to RL. These games are partially observable… where the observation text does not include the entire information of the environment… The objective for the agent is to maximize the expected discounted reward E”);
and integrating a logical optimal action (LOA) computing framework with the RL computing environment, wherein the LOA computing framework comprises (Page 1 Under the Figure Caption of the Introduction, “In order to train logical rules, a recent neuro-symbolic framework called the Logical Neural Network (LNN) (Riegel et al., 2020) has been proposed to simultaneously provide key properties of both the neural network (learning) and the symbolic logic (reasoning). The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss.”, Page 2 Paragraph 2 of Introduction, “In this paper, we propose an action knowledge acquisition method featuring a neuro-symbolic LNN framework for the RL algorithm.”
The specification of the instant case states that the logical optimal action refers to a neuro-symbolic LNN that is capable of optimizing the LNN. In this case, the reference teaches a neuro-symbolic framework that uses a Logical Neural Network integrated with the reinforcement learning environment.)
(ii) constraining rules that limit the scope of the information in the at least one dataset (Page 5 Paragraph 1 Under Discussion about ethics, “The data set used in our experiment does not contain any sensitive information.”, Pg. 1 Under Figure 1 Caption In Introduction Section, “…the Logical Neural Network (LNN)… has been proposed to simultaneously provide key properties of both the neural network (learning) and the symbolic logic (reasoning). The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss. Every neuron in the LNN has a component for a formula of weighted real-valued logics from a unique logical conjunction, disjunction, or negation nodes… the trained LNN can extract obtained logical rules by selecting high weighted connections that represent the important rules for an action policy”, Page 3 of Section 3.2.1, “The FOL converter converts a given natural observation text ot and observation history (ot−1, ot−2, ...) into first-order logic facts…"
The "constraining rules that limit the scope of the information in the at least one dataset" correspond to the symbolic logical rules used in the Logical Neural network (LNN). The symbolic logical rules are the parsed observation from the FOL converter that is used in the LNN. The rules are constrained to logical rules only, thus limiting the scope of the information in the dataset based on the effect of the FOL converter.),
and (2) a logical neural network (LNN) that is trained on the at least one dataset and the constraining rules and establishes LNN policy rules based on the training (Page 1 of Introduction Under Figure 1 Caption, “The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss… At the same time, the trained LNN can extract obtained logical rules by selecting high weighted connections that represent the important rules for an action policy.”),
wherein the at least one dataset and the LNN policy rules are input into the RL computing environment where the RL agent makes decisions on the information in the at least one dataset that comply with the LNN policy rules and the RL agent is rewarded for decisions that advance the goal (Page 5 Paragraph 1 Under Discussion about ethics, “The data set used in our experiment does not contain any sensitive information.”, Page 3 of Section 3.2.1, “The FOL converter converts a given natural observation text ot and observation history (ot−1, ot−2, ...) into first-order logic facts… the agent understands an opened direction from the current room. The agent then retrieves the class type c of the word meaning in propositional logic l_i,t by using ConceptNet… or the network of another word’s definition.”, Page 3 Paragraph 1 of Section 3.2.2, “The LNN training component is for obtaining an action policy from the given FOL logics.”, See Figure 1,
[Image: Figure 1 of Kimura (media_image1.png)]
The reference teaches that observation data from the environment, in the form of natural language text, constitutes a dataset that is processed and converted into first-order logical facts and input into a Logical Neural Network (LNN), as shown in Figure 1 and described in Section 3.2. The LNN learns logical policy rules that are input into the reinforcement learning computing environment and directly govern the RL agent’s action selection. The RL agent makes decisions that comply with the LNN policy rules and is rewarded based on the outcome of those decisions.).
Kimura does not teach a system, an interface for entry of (i) at least one dataset comprising information relating to existing polymers, and a dataset relating to the discovery of new polymer materials.
Ristoski, in the same field of endeavor, teaches relating to the discovery of new polymer materials (Paragraph 0080, “FIG. 8 is a diagram illustrating a method of polymer discovery in one embodiment… At 802, the method can include generating a set of material candidates expected to yield materials with target properties. Examples of material candidates include, but not limited to, molecules, monomers, and/or polymer repeat units.”)
(1) an interface for entry of (i) at least one dataset comprising information relating to existing polymers (Paragraph 0091, “For instance, at least one hardware processor 1102 may receive training data set which can include candidate material for polymerization, identify one or more desired features in the candidate material, and train a machine learning model to generate a new material having one or more of the desired features.”, Paragraph 0088, “In an embodiment, e.g., optionally, the identified desired features can be presented or caused to be presented on a user interface, e.g., for user interaction or view. In an embodiment, a user (e.g., an SME) may select from the desired features, a desired feature to include in the new material being generated.”
The reference teaches an interface or GUI that allows for the user to select desired features, which is considered an entry due to the fact that the user is entering constraints into the model.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kimura’s teaching of integrating a neuro-symbolic Logical Neural Network (LNN) framework with a reinforcement learning environment for decision making with Ristoski’s teaching of providing a user interface for entering polymer related constraints in order to apply the reinforcement learning and LNN to the discovery of new polymer materials, thereby improving the efficiency and guidance of polymer discovery (Paragraph 0002 of Ristoski).
Claim 11 is a method claim that recites limitations identical to those of system claim 2. Therefore, claim 11 is rejected using the same rationale as claim 2.
Claim 12 is a method claim that recites limitations identical to those of system claim 3. Therefore, claim 12 is rejected using the same rationale as claim 3.
Claim 13 is a method claim that recites limitations identical to those of system claim 4. Therefore, claim 13 is rejected using the same rationale as claim 4.
Regarding claim 19,
Kimura teaches for establishing a reinforcement learning (RL) computing environment comprising an RL algorithm acting as a reward-based RL agent programmed to make decisions in furtherance of a goal… (Pg. 2 Paragraph 1 of Section 3.1, “As text-based games are sequential decision-making problems, they can naturally be applied to RL. These games are partially observable… where the observation text does not include the entire information of the environment… The objective for the agent is to maximize the expected discounted reward E”);
establishing a logical optimal action (LOA) computing framework integrated with the RL computing environment, wherein the LOA computing framework comprises (Page 1 Under the Figure Caption of the Introduction, “In order to train logical rules, a recent neuro-symbolic framework called the Logical Neural Network (LNN) (Riegel et al., 2020) has been proposed to simultaneously provide key properties of both the neural network (learning) and the symbolic logic (reasoning). The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss.”, Page 2 Paragraph 2 of Introduction, “In this paper, we propose an action knowledge acquisition method featuring a neuro-symbolic LNN framework for the RL algorithm.”
The specification of the instant case states that the logical optimal action refers to a neuro-symbolic LNN that is capable of optimizing the LNN. In this case, the reference teaches a neuro-symbolic framework that uses a Logical Neural Network integrated with the reinforcement learning environment.)
(ii) constraining rules that limit the scope of the information in the at least one dataset (Page 5 Paragraph 1 Under Discussion about ethics, “The data set used in our experiment does not contain any sensitive information.”, Pg. 1 Under Figure 1 Caption In Introduction Section, “…the Logical Neural Network (LNN)… has been proposed to simultaneously provide key properties of both the neural network (learning) and the symbolic logic (reasoning). The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss. Every neuron in the LNN has a component for a formula of weighted real-valued logics from a unique logical conjunction, disjunction, or negation nodes… the trained LNN can extract obtained logical rules by selecting high weighted connections that represent the important rules for an action policy”, Page 3 of Section 3.2.1, “The FOL converter converts a given natural observation text ot and observation history (ot−1, ot−2, ...) into first-order logic facts…"
The "constraining rules that limit the scope of the information in the at least one dataset" correspond to the symbolic logical rules used in the Logical Neural network (LNN). The symbolic logical rules are the parsed observation from the FOL converter that is used in the LNN. The rules are constrained to logical rules only, thus limiting the scope of the information in the dataset based on the effect of the FOL converter.),
and (2) a logical neural network (LNN) that is trained on the at least one dataset and the constraining rules and establishes LNN policy rules based on the training (Page 1 of Introduction Under Figure 1 Caption, “The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss… At the same time, the trained LNN can extract obtained logical rules by selecting high weighted connections that represent the important rules for an action policy.”, See Figure 1,
[Image: Figure 1 of Kimura (media_image1.png)]
Here, the dataset corresponds to the observation in the Figure, the constraining rules correspond to the symbolic logical rules generated from the FOL converter, and the LNN policy rules correspond to the output of the logical neural network, which provides the rules the agent uses to make decisions. The observation and the rules are fed into the LNN to generate the policy rules for the agent to perform the actions.),
wherein the at least one dataset and the LNN policy rules are input into the RL computing environment where the RL agent makes decisions on the information in the at least one dataset that comply with the LNN policy rules and the RL agent is rewarded for decisions that advance the goal (Page 5 Paragraph 1 Under Discussion about ethics, “The data set used in our experiment does not contain any sensitive information.”, Page 3 of Section 3.2.1, “The FOL converter converts a given natural observation text ot and observation history (ot−1, ot−2, ...) into first-order logic facts… the agent understands an opened direction from the current room. The agent then retrieves the class type c of the word meaning in propositional logic l_i,t by using ConceptNet… or the network of another word’s definition.”, Page 3 Paragraph 1 of Section 3.2.2, “The LNN training component is for obtaining an action policy from the given FOL logics.”, See Figure 1,
The reference teaches that observation data from the environment, in the form of natural language text, constitutes a dataset that is processed and converted into first-order logical facts and input into a Logical Neural Network (LNN), as shown in Figure 1 and described in Section 3.2. The LNN learns logical policy rules that are input into the reinforcement learning computing environment and directly govern the RL agent’s action selection. The RL agent makes decisions that comply with the LNN policy rules and is rewarded based on the outcome of those decisions.).
Kimura does not teach a computer program product for discovery of polymers comprising: program instructions on or more computer readable storage media…, relating to the discovery of new polymer materials…, and program instructions on or more computer readable storage media for…, and (1) an interface for entry of (i) at least one dataset comprising information relating to existing polymers.
Ristoski, in the same field of endeavor, teaches [a] computer program product for discovery of polymers comprising: program instructions on one or more computer readable storage media… (Paragraph 0012, “A computer readable storage medium storing a program of instructions executable by a machine to perform one or more methods described herein also may be provided.”, Paragraph 0021, “FIG. 8 is a diagram illustrating a method of polymer discovery in an embodiment.”)
relating to the discovery of new polymer materials (Paragraph 0080, “FIG. 8 is a diagram illustrating a method of polymer discovery in one embodiment… At 802, the method can include generating a set of material candidates expected to yield materials with target properties. Examples of material candidates include, but not limited to, molecules, monomers, and/or polymer repeat units.”)
and program instructions on one or more computer readable storage media for… (Paragraph 0012, “A computer readable storage medium storing a program of instructions executable by a machine to perform one or more methods described herein also may be provided.”)
(1) an interface for entry of (i) at least one dataset comprising information relating to existing polymers (Paragraph 0091, “For instance, at least one hardware processor 1102 may receive training data set which can include candidate material for polymerization, identify one or more desired features in the candidate material, and train a machine learning model to generate a new material having one or more of the desired features.”, Paragraph 0088, “In an embodiment, e.g., optionally, the identified desired features can be presented or caused to be presented on a user interface, e.g., for user interaction or view. In an embodiment, a user (e.g., an SME) may select from the desired features, a desired feature to include in the new material being generated.”
The reference teaches an interface or GUI that allows the user to select desired features, which is considered an entry because the user is entering constraints into the model.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kimura’s teaching of integrating a neuro-symbolic Logical Neural Network (LNN) framework with a reinforcement learning environment for decision making with Ristoski’s teaching of providing a user interface for entering polymer-related constraints used in a computer program in order to apply the reinforcement learning and LNN to the discovery of new polymer materials in a computer program, thereby improving the efficiency and guidance of polymer discovery (Paragraph 0002 of Ristoski).
Claim 20 is a manufacture claim that recites identical limitations to method claim 2. Therefore, claim 20 is rejected using the same rationale as claim 2.
Claim 21 is a manufacture claim that recites identical limitations to method claim 3. Therefore, claim 21 is rejected using the same rationale as claim 3.
Claim 22 is a manufacture claim that recites identical limitations to method claim 4. Therefore, claim 22 is rejected using the same rationale as claim 4.
Claims 5-9 and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over Kimura (“Neuro-Symbolic Reinforcement Learning with First-Order Logic”, 2021) in view of Ristoski (“US 20210303762 A1”) and Triplet (“US 20240007504 A1”).
Regarding claim 5,
Kimura does not teach a subject matter expert (SME) establishes the constraining rules and reviews the decisions by the RL agent to eliminate decisions that do not advance the goal.
Ristoski, in the same field of endeavor, teaches a subject matter expert (SME)… reviews the decisions by the RL agent to eliminate decisions that do not advance the goal (Paragraph 0002 of Ristoski, “Approaches have emerged, such as computational screening, inverse design, generative modeling, reinforcement learning as ways to accelerate the design of polymer materials. The drawback of these approaches is that they generate a large number of candidates for new molecules, which then need to be manually reviewed by subject matter experts who select only a dozen for further investigation.”, Paragraph 0063, “SeqGAN is a Sequence Generative Adversarial Network using reinforcement learning. The system includes generator and discriminator, where the generator is treated as reinforcement learning agent… the generated tokens represent the state and the action is the next token to be generated.”, Paragraph 0004, “Subject matter experts (SMEs) such as polymer chemists and synthetic organic chemists review many pages of candidates and select the ones that appear viable for further testing. With the very large number of candidates, and the fact that only a few will be selected for experiment manually”
The reinforcement learning agent generates material candidates, and an SME reviews the agent’s output, i.e., its decisions, and selects certain candidates for further testing).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kimura’s reinforcement learning system employing neuro-symbolic policy reasoning with Ristoski’s teaching of subject matter expert review of generated material candidates from the reinforcement learning agent in order to apply Ristoski’s SME to evaluate and eliminate agent decisions that do not advance the goal, thus improving the quality and relevance of selected outcomes (Paragraph 0032 of Ristoski).
Kimura and Ristoski do not teach a subject matter expert (SME) establishes the constraining rules.
Triplet, in the same field of endeavor, teaches a subject matter expert (SME) establishes the constraining rules (Paragraph 0097 of Triplet, “The methods may detect compliance violations in real-time and predict remediation actions to non-compliant events, such as by using a) rules manually defined by SME to map each policy to a remediation template, b) collaborative filtering algorithms to predict which template can best remediate a given violation based on historical data, c) supervised ML to predict which template can best remediate a given violation, d) reinforcement learning algorithms for more complex scenarios, to predict a sequence of remediations actions on one or several devices, among others.”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kimura and Ristoski’s teaching with Triplet’s teaching of SME-defined constraining rules in order to allow an SME to establish policy constraints that guardrail machine learning decisions toward goal-advancing outcomes (Paragraph 0002 of Triplet).
Regarding claim 6,
Kimura teaches a reinforcement learning (RL) computing environment comprising an RL algorithm acting as a reward-based RL agent programmed to make decisions in furtherance of a goal… (Pg. 2 Paragraph 1 of Section 3.1, “As text-based games are sequential decision-making problems, they can naturally be applied to RL. These games are partially observable… where the observation text does not include the entire information of the environment… The objective for the agent is to maximize the expected discounted reward E”);
and a logical optimal action (LOA) computing framework (neuro-symbolic framework) integrated with the RL computing environment, wherein the LOA computing framework comprises (Page 1 Under the Figure Caption of the Introduction, “In order to train logical rules, a recent neuro-symbolic framework called the Logical Neural Network (LNN) (Riegel et al., 2020) has been proposed to simultaneously provide key properties of both the neural network (learning) and the symbolic logic (reasoning). The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss.”, Page 2 Paragraph 2 of Introduction, “In this paper, we propose an action knowledge acquisition method featuring a neuro-symbolic LNN framework for the RL algorithm.”
The specification of the instant case states that the logical optimal action refers to a neuro-symbolic LNN that is capable of optimizing the LNN. Here, the reference teaches a neuro-symbolic framework that uses a Logical Neural Network and is integrated with the reinforcement learning environment.)
and (2) a logical neural network (LNN) that is trained on the at least one dataset… and establishes LNN policy rules based on the training (Page 1 of Introduction Under Figure 1 Caption, “The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss… At the same time, the trained LNN can extract obtained logical rules by selecting high weighted connections that represent the important rules for an action policy.”, See Figure 1,
[Image: Figure 1 of Kimura]
Here, the dataset corresponds to the observation in the Figure, the constraining rules correspond to the symbolic logical rules generated from the FOL converter, and the LNN policy rules correspond to the output of the logical neural network, which provides the rules used for the agent to make decisions. The observation and the rules are fed into the LNN to generate the policy rules for the agent to perform the actions.),
wherein the at least one dataset and the policy rules are input into the RL computing environment where the RL agent makes decisions on the information in the at least one dataset that comply with the LNN policy rules and the RL agent is rewarded for decisions that advance the goal (Page 5 Paragraph 1 Under Discussion about ethics, “The data set used in our experiment does not contain any sensitive information.”, Page 3 of Section 3.2.1, “The FOL converter converts a given natural observation text ot and observation history (ot−1, ot−2, ...) into first-order logic facts… the agent understands an opened direction from the current room. The agent then retrieves the class type c of the word meaning in propositional logic l_i,t by using ConceptNet… or the network of another word’s definition.”, Page 3 Paragraph 1 of Section 3.2.2, “The LNN training component is for obtaining an action policy from the given FOL logics.”, See Figure 1,
[Image: Figure 1 of Kimura]
The reference teaches that observation data from the environment, in the form of natural language text, constitutes a dataset that is processed and converted into first-order logical facts and input into a Logical Neural Network (LNN), as shown in Figure 1 and described in Section 3.2. The LNN learns logical policy rules that are input into the reinforcement learning computing environment and directly govern the RL agent’s action selection. The RL agent makes decisions that comply with the LNN policy rules and is rewarded based on the outcome of those decisions.).
Kimura does not teach a system, an interface for entry of (i) at least one dataset comprising information relating to existing polymers, (ii) rules… that constrain the scope of the at least one dataset to available materials and laboratory equipment, a reward-based RL agent programmed to make decisions in furtherance of a goal relating to the discovery of new polymer materials, rules defined by a subject matter expert (SME), and train[ing] on… the SME-defined constraining rules.
Ristoski, in the same field of endeavor, teaches [a] system comprising (Paragraph 0005, “A computer-implemented method and system to accelerate new polymer can be provided. The method and system can accelerate new polymer discovery.”)
(1) an interface for entry of (i) at least one dataset comprising information relating to existing polymers (Paragraph 0091, “For instance, at least one hardware processor 1102 may receive training data set which can include candidate material for polymerization, identify one or more desired features in the candidate material, and train a machine learning model to generate a new material having one or more of the desired features.”, Paragraph 0088, “In an embodiment, e.g., optionally, the identified desired features can be presented or caused to be presented on a user interface, e.g., for user interaction or view. In an embodiment, a user (e.g., an SME) may select from the desired features, a desired feature to include in the new material being generated.”
The reference teaches an interface or GUI that allows the user to select desired features, which is considered an entry because the user is entering constraints into the model.)
(ii) rules… that constrain the scope of the at least one dataset to available materials and laboratory equipment (Paragraph 0042, “The general model tries to learn general rules for synthesizing new materials that are shared across all labs and SMEs, e.g., each monomer suitable for polymerization must contains a polymerizable group and should not contain pendant groups that will be active under polymerization conditions.”, Paragraph 0050, “By way of example only, a training data set to train or build the classification model 106 can include a random sample from a combinatorial library of polymerizable components suitable for the preparation of polyimides, including dianhydrides, dicarboxylic acids, and diamines.”),
a reward-based RL agent programmed to make decisions in furtherance of a goal relating to the discovery of new polymer materials (Paragraph 0063 of Ristoski, “SeqGAN is a Sequence Generative Adversarial Network using reinforcement learning. The system includes generator and discriminator, where the generator is treated as reinforcement learning agent. In such a scenario, the generated tokens represent the state and the action is the next token to be generated… The reward can be calculated by the discriminator on a complete sequence via Monte Carlo search.”, Paragraph 0002, “Approaches have emerged, such as computational screening, inverse design, generative modeling, reinforcement learning as ways to accelerate the design of polymer materials.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kimura’s teaching of integrating a neuro-symbolic Logical Neural Network (LNN) framework with a reinforcement learning environment for decision making with Ristoski’s teaching of providing a user interface for entering polymer-related constraints in a system in order to apply the reinforcement learning and LNN to the discovery of new polymer materials, thereby improving the efficiency and guidance of polymer discovery (Paragraph 0002 of Ristoski).
Kimura and Ristoski do not teach rules defined by a subject matter expert (SME) and train[ing] on… the SME-defined constraining rules.
Triplet, in the same field of endeavor, teaches rules defined by a subject matter expert (SME) (Paragraph 0097 of Triplet, “The methods may detect compliance violations in real-time and predict remediation actions to non-compliant events, such as by using a) rules manually defined by SME to map each policy to a remediation template, b) collaborative filtering algorithms to predict which template can best remediate a given violation based on historical data, c) supervised ML to predict which template can best remediate a given violation, d) reinforcement learning algorithms for more complex scenarios, to predict a sequence of remediations actions on one or several devices, among others.”)
train[ing] on… the SME-defined constraining rules (Paragraph 0097, “The methods may detect compliance violations in real-time and predict remediation actions to non-compliant events, such as by using a) rules manually defined by SME to map each policy to a remediation template… c) supervised ML to predict which template can best remediate a given violation, d) reinforcement learning algorithms for more complex scenarios, to predict a sequence of remediations actions on one or several devices, among others.”, Paragraph 0041, “In some embodiments, AI may be used to discover these rules and policies and/or generate the rules to improve some area of rules compliance. Configuring policies and rules may be part of an ML training process.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kimura and Ristoski’s teaching with Triplet’s teaching of SME-defined constraining rules in order to allow an SME to establish policy constraints that guardrail machine learning decisions toward goal-advancing outcomes (Paragraph 0002 of Triplet).
Claim 7 is an apparatus claim that recites identical limitations to method claim 3. Therefore, claim 7 is rejected using the same rationale as claim 3.
Claim 8 is an apparatus claim that recites identical limitations to method claim 4. Therefore, claim 8 is rejected using the same rationale as claim 4.
Regarding claim 9,
Kimura does not teach the SME reviews the decisions by the RL agent to eliminate decisions that do not advance the goal.
Ristoski, in the same field of endeavor, teaches the SME reviews the decisions by the RL agent to eliminate decisions that do not advance the goal (Paragraph 0002 of Ristoski, “Approaches have emerged, such as computational screening, inverse design, generative modeling, reinforcement learning as ways to accelerate the design of polymer materials. The drawback of these approaches is that they generate a large number of candidates for new molecules, which then need to be manually reviewed by subject matter experts who select only a dozen for further investigation.”, Paragraph 0063, “SeqGAN is a Sequence Generative Adversarial Network using reinforcement learning. The system includes generator and discriminator, where the generator is treated as reinforcement learning agent… the generated tokens represent the state and the action is the next token to be generated.”, Paragraph 0004, “Subject matter experts (SMEs) such as polymer chemists and synthetic organic chemists review many pages of candidates and select the ones that appear viable for further testing. With the very large number of candidates, and the fact that only a few will be selected for experiment manually”
The reinforcement learning agent generates material candidates, and an SME reviews the agent’s output, i.e., its decisions, and selects certain candidates for further testing).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kimura’s reinforcement learning system employing neuro-symbolic policy reasoning with Ristoski’s teaching of subject matter expert review of generated material candidates from the reinforcement learning agent in order to apply Ristoski’s SME to evaluate and eliminate agent decisions that do not advance the goal, thus improving the quality and relevance of selected outcomes (Paragraph 0032 of Ristoski).
Claim 14 is a method claim that recites identical limitations to method claim 5. Therefore, claim 14 is rejected using the same rationale as claim 5.
Regarding claim 15,
Kimura teaches [a] computer-implemented method comprising (Page 2 Paragraph 2 in Introduction, “In this paper, we propose an action knowledge acquisition method featuring a neuro-symbolic LNN framework for the RL algorithm.”):
programming a reinforcement learning (RL) algorithm to act as a reward-based RL agent within an RL computing environment (See Algorithm 1 RL by FOL-LNN,
[Image: Algorithm 1 of Kimura, “RL by FOL-LNN”]
),
wherein the RL agent makes decisions in furtherance of a goal… (Pg. 2 Paragraph 1 of Section 3.1, “As text-based games are sequential decision-making problems, they can naturally be applied to RL. These games are partially observable… where the observation text does not include the entire information of the environment… The objective for the agent is to maximize the expected discounted reward E”);
and integrating a logical optimal action (LOA) computing framework with the RL computing environment, wherein the LOA computing framework comprises (Page 1 Under the Figure Caption of the Introduction, “In order to train logical rules, a recent neuro-symbolic framework called the Logical Neural Network (LNN) (Riegel et al., 2020) has been proposed to simultaneously provide key properties of both the neural network (learning) and the symbolic logic (reasoning). The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss.”, Page 2 Paragraph 2 of Introduction, “In this paper, we propose an action knowledge acquisition method featuring a neuro-symbolic LNN framework for the RL algorithm.”
The specification of the instant case states that the logical optimal action refers to a neuro-symbolic LNN that is capable of optimizing the LNN. Here, the reference teaches a neuro-symbolic framework that uses a Logical Neural Network and is integrated with the reinforcement learning environment.)
and (2) a logical neural network (LNN) that is trained on the at least one dataset… and establishes LNN policy rules based on the training (Page 1 of Introduction Under Figure 1 Caption, “The LNN can train the symbolic rules with logical functions in the neural networks by having an end-to-end differentiable network minimizes a contradiction loss… At the same time, the trained LNN can extract obtained logical rules by selecting high weighted connections that represent the important rules for an action policy.”, See Figure 1,
[Image: Figure 1 of Kimura]
Here, the dataset corresponds to the observation in the Figure, the constraining rules correspond to the symbolic logical rules generated from the FOL converter, and the LNN policy rules correspond to the output of the logical neural network, which provides the rules used for the agent to make decisions. The observation and the rules are fed into the LNN to generate the policy rules for the agent to perform the actions.),
wherein the at least one dataset and the policy rules are input into the RL computing environment where the RL agent makes decisions on the information in the at least one dataset that comply with the LNN policy rules and the RL agent is rewarded for decisions that advance the goal (Page 5 Paragraph 1 Under Discussion about ethics, “The data set used in our experiment does not contain any sensitive information.”, Page 3 of Section 3.2.1, “The FOL converter converts a given natural observation text ot and observation history (ot−1, ot−2, ...) into first-order logic facts… the agent understands an opened direction from the current room. The agent then retrieves the class type c of the word meaning in propositional logic l_i,t by using ConceptNet… or the network of another word’s definition.”, Page 3 Paragraph 1 of Section 3.2.2, “The LNN training component is for obtaining an action policy from the given FOL logics.”, See Figure 1,
[Image: Figure 1 of Kimura]
The reference teaches that observation data from the environment, in the form of natural language text, constitutes a dataset that is processed and converted into first-order logical facts and input into a Logical Neural Network (LNN), as shown in Figure 1 and described in Section 3.2. The LNN learns logical policy rules that are input into the reinforcement learning computing environment and directly govern the RL agent’s action selection. The RL agent makes decisions that comply with the LNN policy rules and is rewarded based on the outcome of those decisions.).
Kimura does not teach relating to the discovery of new polymer materials, an interface for entry of (i) at least one dataset comprising information relating to existing polymers, rules… that constrain the scope of the at least one dataset to available materials and laboratory equipment, rules defined by a subject matter expert (SME), and train[ing] on… the SME-defined constraining rules.
Ristoski, in the same field of endeavor, teaches relating to the discovery of new polymer materials (Paragraph 0080, “FIG. 8 is a diagram illustrating a method of polymer discovery in one embodiment… At 802, the method can include generating a set of material candidates expected to yield materials with target properties. Examples of material candidates include, but not limited to, molecules, monomers, and/or polymer repeat units.”)
(1) an interface for entry of (i) at least one dataset comprising information relating to existing polymers (Paragraph 0091, “For instance, at least one hardware processor 1102 may receive training data set which can include candidate material for polymerization, identify one or more desired features in the candidate material, and train a machine learning model to generate a new material having one or more of the desired features.”, Paragraph 0088, “In an embodiment, e.g., optionally, the identified desired features can be presented or caused to be presented on a user interface, e.g., for user interaction or view. In an embodiment, a user (e.g., an SME) may select from the desired features, a desired feature to include in the new material being generated.”
The reference teaches an interface or GUI that allows the user to select desired features, which is considered an entry because the user is entering constraints into the model.),
and (ii) rules… that constrain the scope of the at least one dataset to available materials and laboratory equipment (Paragraph 0042, “The general model tries to learn general rules for synthesizing new materials that are shared across all labs and SMEs, e.g., each monomer suitable for polymerization must contains a polymerizable group and should not contain pendant groups that will be active under polymerization conditions.”, Paragraph 0050, “By way of example only, a training data set to train or build the classification model 106 can include a random sample from a combinatorial library of polymerizable components suitable for the preparation of polyimides, including dianhydrides, dicarboxylic acids, and diamines.”),
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kimura’s teaching of integrating a neuro-symbolic Logical Neural Network (LNN) framework with a reinforcement learning environment for decision making with Ristoski’s teaching of providing a user interface for entering polymer-related constraints in order to apply the reinforcement learning and LNN to the discovery of new polymer materials, thereby improving the efficiency and guidance of polymer discovery (Paragraph 0002 of Ristoski).
Kimura and Ristoski do not teach rules defined by a subject matter expert (SME) and train[ing] on… the SME-defined constraining rules.
Triplet, in the same field of endeavor, teaches rules defined by a subject matter expert (SME) (Paragraph 0097 of Triplet, “The methods may detect compliance violations in real-time and predict remediation actions to non-compliant events, such as by using a) rules manually defined by SME to map each policy to a remediation template, b) collaborative filtering algorithms to predict which template can best remediate a given violation based on historical data, c) supervised ML to predict which template can best remediate a given violation, d) reinforcement learning algorithms for more complex scenarios, to predict a sequence of remediations actions on one or several devices, among others.”)
train[ing] on… the SME-defined constraining rules (Paragraph 0097, “The methods may detect compliance violations in real-time and predict remediation actions to non-compliant events, such as by using a) rules manually defined by SME to map each policy to a remediation template… c) supervised ML to predict which template can best remediate a given violation, d) reinforcement learning algorithms for more complex scenarios, to predict a sequence of remediations actions on one or several devices, among others.”, Paragraph 0041, “In some embodiments, AI may be used to discover these rules and policies and/or generate the rules to improve some area of rules compliance. Configuring policies and rules may be part of an ML training process.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine Kimura and Ristoski’s teaching with Triplet’s teaching of SME-defined constraining rules in order to allow an SME to establish policy constraints that guardrail machine learning decisions toward goal-advancing outcomes (Paragraph 0002 of Triplet).
Claim 16 is a method claim that recites identical limitations to apparatus claim 7. Therefore, claim 16 is rejected using the same rationale as claim 7.
Claim 17 is a method claim that recites identical limitations to apparatus claim 8. Therefore, claim 17 is rejected using the same rationale as claim 8.
Claim 18 is a method claim that recites identical limitations to method claim 9. Therefore, claim 18 is rejected using the same rationale as claim 9.
Claim 23 is a manufacture claim that recites identical limitations to method claim 5. Therefore, claim 23 is rejected using the same rationale as claim 5.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAJD MAHER HADDAD whose telephone number is (571) 272-2265. The examiner can normally be reached Monday-Friday, 8 am-5 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/M.M.H./Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125