Last updated: April 19, 2026
Application No. 18/360,633
TEXTUALLY GUIDED CONSTRAINED POLICY OPTIMIZATION FOR SAFE REINFORCEMENT LEARNING

Status: Non-Final OA (§101, §103)
Filed: Jul 27, 2023
Examiner: SOMERS, MARC S
Art Unit: 2159
Tech Center: 2100 (Computer Architecture & Software)
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)
This examiner grants 65% of cases after interview (+34.6% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 563 resolved cases, 2023–2026
Examiner Intelligence

Examiner: SOMERS, MARC S
Career allow rate: 65% of resolved cases (364 granted / 563 resolved), +9.7% vs TC avg
Interview lift: +34.6% (resolved cases with interview)
Typical timeline: 4y 0m average prosecution; 36 currently pending
Career history: 599 total applications across all art units
Statute-Specific Performance

§101: 18.0% (-22.0% vs TC avg)
§103: 47.9% (+7.9% vs TC avg)
§102: 10.1% (-29.9% vs TC avg)
§112: 15.1% (-24.9% vs TC avg)
Based on career data from 563 resolved cases
Office Action

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because it contains an embedded hyperlink and/or other form of browser-executable code. Applicant is required to delete the embedded hyperlink and/or other form of browser-executable code; references to websites should be limited to the top-level domain name without any prefix such as http:// or other browser-executable code. See MPEP § 608.01.

Information Disclosure Statement
The information disclosure statement filed 7/27/2023 fails to comply with 37 CFR 1.98(a)(2), which requires a legible copy of each cited foreign patent document; each non-patent literature publication or that portion which caused it to be listed; and all other information or that portion which caused it to be listed.  It has been placed in the application file, but the information referred to therein has not been considered.  In particular, the listing for Foreign Patent Document 113283167 does not appear to have a corresponding copy.
The information disclosure statement filed 7/27/2023 fails to comply with 37 CFR 1.98(b)(5), which requires “[e]ach publication listed in an information disclosure statement must be identified by publisher, author (if any), title, relevant pages of the publication, date, and place of publication” (emphasis added).  All of the NPL citations (NPL reference 1 through NPL reference 8) are only listed with title and none of the other required information.  It has been placed in the application file, but the information referred to therein has not been considered.  

Claim Objections
Claim 8 is objected to because of the following informalities: the phrase “the safety system comprising;” in the second limitation of the claim uses a semicolon after “comprising” where it is believed that a colon (‘:’) should be used. Appropriate correction is required.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

With regard to claim 1:
Step 2A, Prong One:
	The claim recites the following limitations which are drawn towards an abstract idea:
the method comprising: 
based on the safety hints, using a dynamic constraint cost function for determining a constraint cost on actions taken by the RL agent in the environment (recites mental process steps of evaluation, including weighting or mathematical calculation, of the cost/benefit of doing one action over another); 
and operating the RL agent, using the safety hints and constraint cost, to determine an action to take (recites mental process steps of evaluation and judgement for determining what to do in the current situation).

	As seen from above, the identified limitations recite concepts associated with an abstract idea; thus the respective claim recites a judicial exception (see MPEP 2106.04(a)) and requires further analysis as discussed below.

Step 2A, Prong Two:
	The following limitations have been identified as being additional elements as discussed below.
	A computer-implemented method of increasing safety of a Reinforcement Learning (RL) agent operating with a text-based environment with safety constraints (recites field of use limitations of using a computer to implement the abstract idea, see MPEP 2106.05(f)), 
obtaining safety hints from analysis of a textual model of the environment (recites insignificant extrasolution activity of data gathering such as receiving data over a network, see MPEP 2106.05(g));

As seen from the above discussion, the identified limitations did not integrate the judicial exception into a practical application (see MPEP 2106.04(d)).  This judicial exception is not integrated into a practical application because the additional elements recite mere data gathering activities of obtaining information with respect to a generic computer environment. 

Step 2B:
	Below is the analysis of the claims:
A computer-implemented method of increasing safety of a Reinforcement Learning (RL) agent operating with a text-based environment with safety constraints (recites field of use limitations of using a computer to implement the abstract idea, see MPEP 2106.05(f)), 
obtaining safety hints from analysis of a textual model of the environment (recites well-understood, routine, and conventional activity of data gathering such as receiving data over a network, see MPEP 2106.05(d));

	As seen from above, the respective claim elements taken individually do not amount to significantly more than the judicial exception.  When taken as a whole (in combination), the claim also does not amount to significantly more than the abstract idea because the additional elements recite mere data gathering activities of obtaining information with respect to a generic computer environment.
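The claim 1 limitations discussed above (a constraint cost derived from safety hints, and action selection using the hints and that cost) can be illustrated with a minimal sketch. It is for illustration only and is not taken from the application or the cited references: the `SafetyHint` type, the keyword-matching rule, the `weight` field, and the `cost_limit` and `penalty` parameters are all assumptions. The cost is "dynamic" here only in the sense that it is recomputed from whatever hints are active at the current step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyHint:
    """Hypothetical hint extracted from the textual model of the environment."""
    text: str      # e.g. "avoid hot stove"
    weight: float  # assumed severity weight


STOPWORDS = {"avoid", "the", "a"}


def constraint_cost(action: str, hints: list[SafetyHint]) -> float:
    """Sum the weights of hints whose keywords all appear in the action text."""
    action_tokens = set(action.lower().split())
    cost = 0.0
    for hint in hints:
        keywords = set(hint.text.lower().split()) - STOPWORDS
        if keywords and keywords <= action_tokens:
            cost += hint.weight
    return cost


def choose_action(q_values: dict[str, float], hints: list[SafetyHint],
                  cost_limit: float = 0.0, penalty: float = 1.0) -> str:
    """Prefer the best-valued action within the cost limit; otherwise fall
    back to the action with the best cost-penalized value."""
    feasible = {a: q for a, q in q_values.items()
                if constraint_cost(a, hints) <= cost_limit}
    if feasible:
        return max(feasible, key=feasible.get)
    return max(q_values,
               key=lambda a: q_values[a] - penalty * constraint_cost(a, hints))
```

With a single hint "avoid hot stove", the action "touch hot stove" incurs a nonzero cost, so a lower-valued but unconstrained action such as "open fridge" is selected instead.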

With regard to claim 2, this claim recites wherein the RL agent performs a line search governed by the safety hints and constraint cost to determine an action to take (which recites mental process steps of evaluation/comparison of information).
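The Office Action does not describe how the claimed line search operates. As a hedged illustration only, the sketch below assumes a backtracking line search of the kind commonly used in constrained policy optimization: a proposed parameter update is shrunk until the objective improves and the constraint cost stays within a limit. The function name, the `shrink` factor, and the callables `objective` and `cost` are hypothetical.

```python
from typing import Callable, Sequence


def backtracking_line_search(
    theta: Sequence[float],
    direction: Sequence[float],
    objective: Callable[[Sequence[float]], float],
    cost: Callable[[Sequence[float]], float],
    cost_limit: float,
    max_backtracks: int = 10,
    shrink: float = 0.5,
) -> list[float]:
    """Return the first shrunken step that improves the objective while
    keeping the constraint cost within cost_limit; otherwise keep theta."""
    baseline = objective(theta)
    step = 1.0
    for _ in range(max_backtracks):
        candidate = [t + step * d for t, d in zip(theta, direction)]
        if objective(candidate) > baseline and cost(candidate) <= cost_limit:
            return candidate
        step *= shrink  # step rejected: try a smaller one
    return list(theta)  # no acceptable step found
```

In this sketch the constraint cost acts as an acceptance test on each candidate step, so a step that improves the objective is still rejected when it exceeds the limit.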

With regard to claim 3, this claim recites based on a result of the action, updating the textual model and the RL agent (recites mental process steps of evaluating what occurred based on the action taken and determining what the current state/environment is in order to make a new decision).

With regard to claim 4, this claim recites obtaining the safety hints from the textual model using a safety concept net and semantic similarities (recites insignificant extrasolution activity of receiving information which amounts to well-understood, routine, and conventional activity of receiving data over a network, see MPEP 2106.05(d)).

With regard to claim 5, this claim recites wherein obtaining the safety hints comprises: generating a Safety Concept Net Graph (SCNG) data structure (recites field of use limitations describing, at a high-level of generality, a particular data structure to be used, see MPEP 2106.05(f)) using generic safety knowledge of entities and expected safety interactions in the environment (recites insignificant extrasolution activity of storing information in memory which amounts to well-understood, routine, and conventional activity of storing information in memory, see MPEP 2106.05(d)); 
based on current state information of the environment, extract safety entities of interest using the SCNG (recites mental process steps of evaluating and selective analysis of particular information for further consideration); 
determine if a fact attribute of an entity of interest is semantically close to any node or edge in the SCNG (recites mental process step of evaluation and comparison of sets of information); 
when a fact attribute of an entity of interest is semantically close to any node or edge in the SCNG, generate a corresponding safety hint (recites mental process steps of evaluation and judgement for determining hints/suggestions based on current observations). 
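The claim 5 steps recited above (extract entities of interest using the SCNG, test whether a fact attribute is semantically close to a node or edge, and generate a hint on a match) can be sketched as follows. This is an illustrative assumption, not the applicant's implementation: the three-dimensional `TOY_EMBEDDINGS`, the `SCNG_EDGES` triples, the cosine-similarity measure, and the 0.9 threshold are all hypothetical stand-ins for whatever graph and similarity model a real system would use.

```python
import math

# Hypothetical toy word vectors standing in for a real embedding model.
TOY_EMBEDDINGS = {
    "hot": (1.0, 0.0, 0.1),
    "burning": (0.9, 0.1, 0.2),
    "sharp": (0.0, 1.0, 0.1),
    "soft": (0.1, 0.0, 1.0),
}

# Hypothetical SCNG as (entity node, relation, safety concept) triples.
SCNG_EDGES = [("stove", "is", "burning"), ("knife", "is", "sharp")]


def cosine(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def safety_hints(facts: list[tuple[str, str]], threshold: float = 0.9) -> list[str]:
    """For each (entity, attribute) fact, emit a hint when the entity is in
    the graph and its attribute is close to a connected safety concept."""
    hints = []
    for entity, attribute in facts:
        for node, relation, concept in SCNG_EDGES:
            if entity != node:
                continue  # not an entity of interest for this edge
            similarity = cosine(TOY_EMBEDDINGS[attribute], TOY_EMBEDDINGS[concept])
            if similarity >= threshold:
                hints.append(f"caution: {attribute} {entity} ({node} {relation} {concept})")
    return hints
```

In this toy setup a "hot" stove matches the "burning" concept and yields one hint, while a "soft" stove or an entity absent from the graph yields none.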

With regard to claim 6, this claim recites wherein, when a fact attribute of an entity of interest is semantically close to any node or edge in the SCNG, generating a corresponding safety hint by: finding a lemma form of an antonym for the semantically close node or edge; and construct the corresponding safety hint based on the antonym (recites mental process step of evaluation and analysis of information including language parsing and understanding such as lemma form of an antonym).

With regard to claim 7, this claim recites updating a safety hint action list with all semantically close available actions in the current state of the environment (recites insignificant extrasolution activity of sorting information which amounts to well-understood, routine, and conventional activity of sorting information, see MPEP 2106.05(d)).
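The claim 7 step of updating a safety hint action list with semantically close available actions might look like the sketch below. Token-overlap (Jaccard) similarity is used purely as a simple stand-in for a semantic similarity measure, and the function names and the 0.5 threshold are assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two phrases (a lexical stand-in
    for semantic similarity)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def update_hint_action_list(hint: str, available_actions: list[str],
                            hint_actions: list[str],
                            threshold: float = 0.5) -> list[str]:
    """Append every available action close enough to the hint, skipping
    actions already on the list."""
    for action in available_actions:
        if jaccard(hint, action) >= threshold and action not in hint_actions:
            hint_actions.append(action)
    return hint_actions
```

The list is updated in place and also returned, so it can be carried across environment steps and refreshed as the set of available actions changes.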


With regard to claim 8:
Step 2A, Prong One:
	The claim recites the following limitations which are drawn towards an abstract idea:
a safety hint generator for generating safety hints based on the safety concept net and a text model of the operating environment (recites mental process steps of evaluation and analysis of an environment to determine entities/objects that are potentially unsafe or hazardous, similar to perception checks in tabletop role-playing games), 
a dynamic constraint cost calculator to determine a constraint cost based on the safety hints (recites mental process steps of evaluation, including weighting or mathematical calculation, of the cost/benefit of doing one action over another), 

	As seen from above, the identified limitations recite concepts associated with an abstract idea; thus the respective claim recites a judicial exception (see MPEP 2106.04(a)) and requires further analysis as discussed below.

Step 2A, Prong Two:
	The following limitations have been identified as being additional elements as discussed below.
	
A Reinforcement Learning (RL) system, comprising: an RL agent comprising a deep neural network, the RL agent for performing a task in an operating environment based on a policy optimized through trial and error (recites utilizing a computer as a tool to implement the judicial exception, see MPEP 2106.05(f)); 
and a safety system for increasing safety of the RL agent based on specified constraints (recites generic computer elements at a high-level of generality to implement the judicial exception, see MPEP 2106.05(f)), 
the safety system comprising; a safety concept net for entities in the operating environment (recites insignificant extrasolution activity of storing information in memory, see MPEP 2106.05(g)), 
wherein the safety system updates the RL agent based on the safety hints and constraint cost (recites insignificant extrasolution activity of transmitting information and storing information, see MPEP 2106.05(g)).

As seen from the above discussion, the identified limitations did not integrate the judicial exception into a practical application (see MPEP 2106.04(d)).  This judicial exception is not integrated into a practical application because the additional elements recite at a high-level of generality the usage of a computer as a tool to implement the judicial exception with various additional elements discussing generic functionality associated with storage and transmitting information. 

Step 2B:
	Below is the analysis of the claims:
A Reinforcement Learning (RL) system, comprising: an RL agent comprising a deep neural network, the RL agent for performing a task in an operating environment based on a policy optimized through trial and error (recites utilizing a computer as a tool to implement the judicial exception, see MPEP 2106.05(f)); 
and a safety system for increasing safety of the RL agent based on specified constraints (recites generic computer elements at a high-level of generality to implement the judicial exception, see MPEP 2106.05(f)), 
the safety system comprising; a safety concept net for entities in the operating environment (recites well-understood, routine, and conventional activity of storing information in memory, see MPEP 2106.05(d)), 
wherein the safety system updates the RL agent based on the safety hints and constraint cost (recites well-understood, routine, and conventional activity of transmitting information and storing information, see MPEP 2106.05(d)).

	As seen from above, the respective claim elements taken individually do not amount to significantly more than the judicial exception.  When taken as a whole (in combination), the claim also does not amount to significantly more than the abstract idea because the additional elements recite, at a high level of generality, the usage of a computer as a tool to implement the judicial exception, with various additional elements discussing generic functionality associated with storing and transmitting information.

With regard to claim 9, this claim recites wherein the specified constraints are in text form (recites field of use limitations describing the format of the data, see MPEP 2106.05(h)).

With regard to claim 10, this claim recites wherein the operating environment is textual (recites field of use limitations describing the format of the data, see MPEP 2106.05(h)).

With regard to claim 11, this claim recites wherein the safety hint generator comprises a semantic analyzer to generate the safety hints based on the safety concept net and a text model of the operating environment (recites mental process steps of analyzing information and forming judgements (safety hints)).

With regard to claim 12, this claim recites wherein: the safety concept net comprises a Safety Concept Net Graph (SCNG) (recites field of use limitations describing, at a high-level of generality, a particular data structure to be used, see MPEP 2106.05(f)); 
and the semantic analyzer determines semantic closeness between an entity attribute and any node or edge of the SCNG to generate a safety hint (recites mental process steps of evaluation and analysis to determine how related/similar various pieces of information are with each other).

With regard to claim 13, this claim recites wherein, when the closeness between the entity attribute and a node or edge of the SCNG is within a threshold, the semantic analyzer determines a lemma form of an antonym for the entity attribute to generate the safety hint (recites mental process step of evaluation and analysis of information including language parsing and understanding such as lemma form of an antonym).

With regard to claim 14, this claim recites wherein the semantic analyzer determines how semantically close the safety hint is to all current possible actions the RL agent may take to generate a safety hint action command (recites mental process steps of determining if a safety hint or any hint/suggestion is related to possible actions).

With regard to claim 15, this claim is substantially similar to claim 8 and is rejected for similar reasons as discussed above.

With regard to claims 16-20, these claims are substantially similar to claims 9 and 11-14 respectively and are rejected for similar reasons as discussed above.

Claims 8-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  The claims do not fall within at least one of the four categories of patent eligible subject matter because they are directed towards software per se.  Claim 8 recites “an RL agent” and “a safety system” where the safety system comprises “a safety concept net”; “a safety hint generator”; and “a dynamic constraint cost calculator” in the body of the claim.  The applicant’s specification at paragraph [0001] indicates that an “agent” is a software program, and paragraph [0062] indicates that instructions, when executed, provide the safety system (i.e., the instructions equate to program code).  Therefore, claim 8 is rejected for being directed towards software per se.  Claims 9-14 inherit the same deficiencies as claim 8 and are rejected for similar reasons as discussed above.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Hoang et al, “SCERL: A Benchmark for intersecting language and Safe Reinforcement Learning” (from IDS) in view of Yang et al, “Safe Reinforcement Learning with Natural Language Constraints” (from IDS).
With regard to claim 1, Hoang teaches a computer-implemented method 
obtaining safety hints 
Hoang does not appear to explicitly teach:
obtaining safety hints from analysis of a textual model of the environment; 
based on the safety hints, using a dynamic constraint cost function for determining a constraint cost on actions taken by the RL agent in the environment; 
and operating the RL agent, using the safety hints and constraint cost, to determine an action to take.
Yang teaches obtaining safety hints from analysis of a textual model of the environment (see section 5.1 starting on page 5 through second paragraph on page 6; the system can obtain constraints including being able to determine the presence or absence of a constraint-related entity);
based on the safety hints, using a dynamic constraint cost function for determining a constraint cost on actions taken by the RL agent in the environment (see page 6, sections (1) and (2); the system can utilize the hints/masks as part of the input for determining a cost for an action).
operating the RL agent, using the safety hints 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the knowledge-aware and commonsense agents as taught by Hoang by utilizing information from current observations to determine what safety constraints could be applicable at the current state/time as taught by Yang in order to allow the system to simplify or reduce the respective constraints into a simpler form that the respective system can easily analyze and utilize to determine what safety constraints are applicable to the current state of the environment.
Hoang in view of Yang teach operating the RL agent, using the safety hints and constraint cost, to determine an action to take (see Yang, page 6, section (2) Policy network; see Hoang, page 11, entire section 4.1; constraint costs or penalty scores can be utilized).

With regard to claim 3, Hoang in view of Yang teach based on a result of the action, updating the textual model and the RL agent (see Yang, sections (1) and (2) on page 6; the system can update the respective model and agent accordingly so that the model reflects the state after the agent performed an action with the respective agent as well as what the set of observable actions are in this new state).



Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Hoang et al, “SCERL: A Benchmark for intersecting language and Safe Reinforcement Learning” (from IDS) in view of Yang et al, “Safe Reinforcement Learning with Natural Language Constraints” (from IDS) in further view of Mannor et al [US 2022/0398283 A1].
With regard to claim 2, Hoang in view of Yang teach all the claim limitations of claim 1 as discussed above.
Hoang in view of Yang do not appear to explicitly teach:
wherein the RL agent performs a line search governed by the safety hints and constraint cost to determine an action to take.
Mannor teaches wherein the RL agent performs a line search governed by 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the knowledge-aware and commonsense agents as taught by Hoang in view of Yang by implementing a search means of future state/action predictions as taught by Mannor in order to help improve the overall benefit/reward for the system by not limiting the decision making to the immediate possible actions but rather be able to determine where the selection of an action can lead thus helping the system determine the best trajectory or path forward even if the immediate action may not be deemed the best individual choice.
Hoang in view of Yang in further view of Mannor teach wherein the RL agent performs a line search governed by the safety hints and constraint cost to determine an action to take (see Mannor, paragraph [0045]; see Yang, first paragraph on page 4 and section (2) on page 6; see Hoang, page 7 with respect to section 3.1 in Supplementary Materials; the system can receive various hints/constraints as well as associated costs and utilize that information to determine an action to take).



Claims 4, 8-11, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Hoang et al, “SCERL: A Benchmark for intersecting language and Safe Reinforcement Learning” (from IDS) in view of Yang et al, “Safe Reinforcement Learning with Natural Language Constraints” (from IDS) in further view of Murugesan et al, “Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines” (from IDS).
With regard to claim 4, Hoang in view of Yang teach all the claim limitations of claim 1 as discussed above.
Hoang in view of Yang do not appear to explicitly teach:
obtaining the safety hints from the textual model using a safety concept net and semantic similarities.
Murugesan teaches obtaining the 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the knowledge-aware and commonsense agents as taught by Hoang in view of Yang by utilizing a ConceptNet that can determine similarities or contextual connections associated with observed entities as taught by Murugesan in order to allow agents to be able to capitalize on commonsense knowledge about entities and their relationships with each other so that the agent can make decisions that are based on greater understanding of all the observed entities in the environment.
Hoang in view of Yang in further view of Murugesan teach obtaining the safety hints from the textual model using a safety concept net and semantic similarities (see Yang, section 5.1 starting on page 5 through second paragraph on page 6, and section 6.2; see Hoang, page 7 with respect to section 3.1 in Supplementary Materials; see Murugesan, section 3.3; the system can utilize a ConceptNet with respect to safety considerations for determining safety hints/warnings).



With regard to claim 8, Hoang teaches 
a Reinforcement Learning (RL) system, comprising: an RL agent 
and a safety system for increasing safety of the RL agent based on specified constraints (see page 7; safety constraints can be provided to the system to help increase the safety of the RL agent).
Hoang does not appear to explicitly teach:
an RL agent comprising a deep neural network,
the safety system comprising; a safety concept net for entities in the operating environment, 
a safety hint generator for generating safety hints based on the safety concept net and a text model of the operating environment, 
a dynamic constraint cost calculator to determine a constraint cost based on the safety hints, 
wherein the safety system updates the RL agent based on the safety hints and constraint cost.
Yang teaches an RL agent comprising a deep neural network (see section 5.1 on page 5; the agent comprises a deep neural network),
a safety hint generator for generating safety hints based through second paragraph on page 6; the system can obtain constraints including being able to determine the presence or absence of a constraint-related entity);
a dynamic constraint cost calculator to determine a constraint cost based on the safety hints (see page 6, sections (1) and (2); the system can utilize the hints/masks as part of the input for determining a cost for an action).
wherein the safety system updates the RL agent based on the safety hints and constraint cost (see Yang, sections (1) and (2) on page 6; the system can update the respective model and agent accordingly so that the model reflects the state after the agent performed an action with the respective agent as well as what the set of observable actions are in this new state).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the knowledge-aware and commonsense agents as taught by Hoang by utilizing information from current observations to determine what safety constraints could be applicable at the current state/time as taught by Yang in order to allow the system to simplify or reduce the respective constraints into a simpler form that the respective system can easily analyze and utilize to determine what safety constraints are applicable to the current state of the environment.
Hoang in view of Yang do not appear to explicitly teach:
the safety system comprising; a safety concept net for entities in the operating environment, 
a safety hint generator for generating safety hints based on the safety concept net and a text model of the operating environment.
Murugesan teaches a 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the knowledge-aware and commonsense agents as taught by Hoang in view of Yang by utilizing a ConceptNet that can determine similarities or contextual connections associated with observed entities as taught by Murugesan in order to allow agents to be able to capitalize on commonsense knowledge about entities and their relationships with each other so that the agent can make decisions that are based on greater understanding of all the observed entities in the environment.
Hoang in view of Yang and Murugesan teach the safety system comprising; a safety concept net for entities in the operating environment, a safety hint generator for generating safety hints based on the safety concept net and a text model of the operating environment (see Yang, section 5.1 starting on page 5 through second paragraph on page 6, and section 6.2; see Hoang, page 7 with respect to section 3.1 in Supplementary Materials and section 4.1 on page 3; see Murugesan, section 3.3 and section 4.1 Commonsense-based agents; the system can utilize a ConceptNet with respect to safety considerations for determining safety hints/warnings and be able to link entities contextually to determine expected interactions).

With regard to claim 9, Hoang in view of Yang and Murugesan teach wherein the specified constraints are in text form (see Hoang, section 3.1 on page 2; see Yang, section (1); the constraints can be written via text).

With regard to claim 10, Hoang in view of Yang and Murugesan teach wherein the operating environment is textual (see Hoang, Abstract and section 3.1 on page 2 and section 5 indicating the agent explores a text-based domain; see Murugesan, section 2; the environment is textual).

With regard to claim 11, Hoang in view of Yang and Murugesan teach wherein the safety hint generator comprises a semantic analyzer to generate the safety hints based on the safety concept net and a text model of the operating environment (see Yang, section 5.1 starting on page 5 through second paragraph on page 6, and section 6.2; see Hoang, page 7 with respect to section 3.1 in Supplementary Materials; see Murugesan, section 3.3; the system can utilize a ConceptNet with respect to safety considerations for determining safety hints/warnings).

With regard to claims 15-17, these claims are substantially similar to claims 8, 9 and 11 and are rejected for similar reasons as discussed above.



Claims 5, 7, 12, 14, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Hoang et al, “SCERL: A Benchmark for intersecting language and Safe Reinforcement Learning” (from IDS) in view of Yang et al, “Safe Reinforcement Learning with Natural Language Constraints” (from IDS) in further view of Murugesan et al, “Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines” (from IDS) and Chaudhury et al [US 2021/0390387 A1].
With regard to claim 5, Hoang in view of Yang teach all the claim limitations of claim 1 as discussed above.
Hoang in view of Yang do not appear to explicitly teach:
wherein obtaining the safety hints comprises: generating a Safety Concept Net Graph (SCNG) data structure using generic safety knowledge of entities and expected safety interactions in the environment; 
based on current state information of the environment, extract safety entities of interest using the SCNG; 
determine if a fact attribute of an entity of interest is semantically close to any node or edge in the SCNG; 
when a fact attribute of an entity of interest is semantically close to any node or edge in the SCNG, generate a corresponding safety hint.
Murugesan teaches wherein obtaining the  interactions in the environment (see section 3.3; the system can utilize a conceptNet to determine hints or commonsense knowledge about various entities via their similarities to one another based on the current observation).  
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the knowledge-aware and commonsense agents as taught by Hoang in view of Yang by utilizing a ConceptNet that can determine similarities or contextual connections associated with observed entities as taught by Murugesan in order to allow agents to be able to capitalize on commonsense knowledge about entities and their relationships with each other so that the agent can make decisions that are based on greater understanding of all the observed entities in the environment.
Hoang in view of Yang in further view of Murugesan teach wherein obtaining the safety hints comprises: generating a Safety Concept Net Graph (SCNG) data structure using generic safety knowledge of entities and expected safety interactions in the environment (see Yang, section 5.1 starting on page 5 through second paragraph on page 6, and section 6.2; see Hoang, page 7 with respect to section 3.1 in Supplementary Materials and section 4.1 on page 3; see Murugesan, section 3.3 and section 4.1 Commonsense-based agents; the system can utilize a ConceptNet with respect to safety considerations for determining safety hints/warnings and be able to link entities contextually to determine expected interactions);
based on current state information of the environment, extract safety entities of interest using the SCNG (see Hoang, page 7 with respect to section 3.1 in Supplementary Materials and section 4.1 on page 3; see Yang, section (1) on page 6; see Murugesan, section 3.3 on page 4; the system can utilize the observations to determine entities that are relevant or of interest at that particular observation and time).
Hoang in view of Yang in further view of Murugesan teach that other entities can be related to each other (see Murugesan section (3) on page 5) but do not appear to explicitly teach:
determine if a fact attribute of an entity of interest is semantically close to any node or edge in the SCNG; 
when a fact attribute of an entity of interest is semantically close to any node or edge in the SCNG, generate a corresponding safety hint.
Chaudhury teaches determine if 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the knowledge-aware and commonsense agents as taught by Hoang in view of Yang in further view of Murugesan by being able to restrict what related concepts/entities should also be considered by the system/agent as taught by Chaudhury in order to be able to allow for neighborhood entities/concepts to be considered by the system while reducing the amount of information that the agent has to consider by not extracting or utilizing respective concepts/entities that are of low semantic/related/relevant value to the current state thus reducing overall computations/analysis that the respective system/agent has to evaluate.
Hoang in view of Yang in further view of Murugesan and Chaudhury teach determine if a fact attribute of an entity of interest is semantically close to any node or edge in the SCNG; when a fact attribute of an entity of interest is semantically close to any node or edge in the SCNG, generate a corresponding safety hint (see Chaudhury, paragraph [0044]; see Murugesan section (3) on page 5; see Yang, sections (1) and (2) on page 6; the system can determine entities/concepts that are similar to observed entities/concepts at a particular time and be able to determine respective hints/warnings/constraints associated with those observed entities/concepts).
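
For orientation only, the claimed step of comparing a fact attribute against the nodes and edges of an SCNG and emitting a hint when they are close can be sketched as follows. This is a minimal illustration, not the applicant's implementation and not the method of any cited reference: the toy graph, the function names, the hint wording, the 0.5 threshold, and the use of token-overlap (Jaccard) similarity as a stand-in for semantic closeness are all assumptions.

```python
from typing import List, Tuple

# Hypothetical toy SCNG stored as (head node, edge label, tail node) triples.
# These entries are invented for illustration and do not come from the record.
SCNG: List[Tuple[str, str, str]] = [
    ("knife", "is", "sharp"),
    ("stove", "is", "hot"),
    ("floor", "can be", "wet"),
]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for semantic closeness."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def safety_hints(entity: str, fact_attribute: str,
                 scng: List[Tuple[str, str, str]],
                 threshold: float = 0.5) -> List[str]:
    """Return one hint per triple whose node or edge is close to the attribute."""
    hints = []
    for head, relation, tail in scng:
        # Compare the fact attribute with both nodes and with the edge label.
        closeness = max(jaccard(fact_attribute, part)
                        for part in (head, relation, tail))
        if closeness >= threshold:
            hints.append(f"caution: {entity} relates to '{head} {relation} {tail}'")
    return hints


print(safety_hints("stove", "hot", SCNG))
```

A real system would presumably replace the token-overlap measure with an embedding-based or knowledge-graph-based similarity; the sketch only shows where a threshold test on closeness would sit.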

With regard to claim 7, Hoang in view of Yang in further view of Murugesan and Chaudhury teach updating a safety hint action list with all semantically close available actions in the current state of the environment (see Murugesan, section 3.3; see section 4.1 Commonsense-based agents; see Yang, sections (1) and (2) on page 6; the system can update the respective actions including their respective hints/costs/penalties for the actions based on the current state/observations).

With regard to claim 12, Hoang in view of Yang and Murugesan teach all the claim limitations of claims 8 and 11 as discussed above.
Hoang in view of Yang and Murugesan teach wherein: the safety concept net comprises a Safety Concept Net Graph (SCNG) (see Yang, section 5.1 starting on page 5 through the second paragraph on page 6, and section 6.2; see Hoang, page 7 with respect to section 3.1 in Supplementary Materials and section 4.1 on page 3; see Murugesan, section 3.3 and section 4.1, Commonsense-based agents; the system can utilize a ConceptNet with respect to safety considerations for determining safety hints/warnings and can link entities contextually to determine expected interactions).
Hoang in view of Yang and Murugesan do not appear to explicitly teach:
the semantic analyzer determines semantic closeness between an entity attribute and any node or edge of the SCNG to generate a safety hint.
Chaudhury teaches determines 
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the knowledge-aware and commonsense agents as taught by Hoang in view of Yang and Murugesan by being able to restrict what related concepts/entities should also be considered by the system/agent as taught by Chaudhury in order to be able to allow for neighborhood entities/concepts to be considered by the system while reducing the amount of information that the agent has to consider by not extracting or utilizing respective concepts/entities that are of low semantic/related/relevant value to the current state thus reducing overall computations/analysis that the respective system/agent has to evaluate.
Hoang in view of Yang and Murugesan in further view of Chaudhury teach the semantic analyzer determines semantic closeness between an entity attribute and any node or edge of the SCNG to generate a safety hint (see Chaudhury, paragraph [0044]; see Murugesan section (3) on page 5; and section TextWorld Commonsense (TWC) on page 2; see Yang, sections (1) and (2) on page 6; the system can determine entities/concepts that are similar to observed entities/concepts at a particular time and be able to determine respective hints/warnings/constraints associated with those observed entities/concepts where the entities have attributes).

With regard to claim 14, Hoang in view of Yang and Murugesan in further view of Chaudhury teach wherein the semantic analyzer determines how semantically close the safety hint is to all current possible actions the RL agent may take to generate a safety hint action command (see Murugesan, section 3.3; see section 4.1 Commonsense-based agents; see Yang, sections (1) and (2) on page 6; the system can update the respective actions including their respective hints/costs/penalties for the actions based on the current state/observations).

With regard to claims 18 and 20, these claims are substantially similar to claims 12 and 14 and are rejected for similar reasons as discussed above.



Claims 6, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Hoang et al, “SCERL: A Benchmark for intersecting language and Safe Reinforcement Learning” (from IDS) in view of Yang et al, “Safe Reinforcement Learning with Natural Language Constraints” (from IDS) in further view of Murugesan et al, “Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines” (from IDS) and Chaudhury et al [US 2021/0390387 A1] and in further view of Tian et al [US 2023/0196023 A1].
With regard to claim 6, Hoang in view of Yang in further view of Murugesan and Chaudhury teach all the claim limitations of claims 1 and 5 as discussed above.
Hoang in view of Yang in further view of Murugesan and Chaudhury do not appear to explicitly teach:
wherein, when a fact attribute of an entity of interest is semantically close to any node or edge in the SCNG, generating a corresponding safety hint by: finding a lemma form of an antonym for the semantically close node or edge; and construct the corresponding safety hint based on the antonym.
Tian teaches finding a lemma form of an antonym (see paragraph [0046]; the system can utilize word analysis of various language forms such as being able to lemmatize noun forms of synonyms and antonyms).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the knowledge-aware and commonsense agents as taught by Hoang in view of Yang in further view of Murugesan and Chaudhury by being able to utilize language analysis techniques, including lemmatizing noun forms of antonyms, as taught by Tian in order to structure the various natural language texts into a standardized form, including making sure that antonyms are in a more understandable form, so that the system can accurately interpret and understand the respective natural language text without forcing the natural language understanding portion of the system (and/or agents) to be larger and trained to understand all possible natural language input, when training understanding on the base forms of the words/phrases can achieve high-quality results while allowing the model/understanding module not to be overly bloated.
Hoang in view of Yang in further view of Murugesan and Chaudhury and in further view of Tian teach wherein, when a fact attribute of an entity of interest is semantically close to any node or edge in the SCNG, generating a corresponding safety hint by: finding a lemma form of an antonym for the semantically close node or edge; and construct the corresponding safety hint based on the antonym (see Tian, paragraph [0046]; see Hoang, page 7; see Chaudhury, paragraph [0044]; see Murugesan section (3) on page 5; see Yang, sections (1) and (2) on page 6; the system has various constraints that can be utilized to determine if they are applicable to the current observations including by being able to change respective textual input into a lemma form).
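
As a purely illustrative aid, the limitation of finding a lemma form of an antonym and constructing a hint from it can be sketched as below. The lookup tables, the function name, and the hint wording are hypothetical; neither the application nor Tian is relied upon for this particular construction, and a practical system would likely draw antonyms and lemmas from a lexical resource rather than hand-written dictionaries.

```python
from typing import Optional

# Hypothetical antonym and lemma tables, invented for illustration only.
ANTONYMS = {"opened": "closed", "dirty": "cleaned", "hot": "cooled"}
LEMMAS = {"closed": "close", "cleaned": "clean", "cooled": "cool"}


def antonym_hint(entity: str, close_node: str) -> Optional[str]:
    """Build a hint from the lemma form of the antonym of a matched node."""
    antonym = ANTONYMS.get(close_node)
    if antonym is None:
        return None  # no known antonym, so no hint is constructed
    # Fall back to the antonym itself when no lemma entry exists.
    lemma = LEMMAS.get(antonym, antonym)
    return f"{lemma} the {entity}"


print(antonym_hint("oven", "opened"))  # close the oven
```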

With regard to claim 13, Hoang in view of Yang and Murugesan in further view of Chaudhury teach all the claim limitations of claims 8, 11, and 12 as discussed above.
Hoang in view of Yang and Murugesan in further view of Chaudhury teach when the closeness between the entity attribute and a node or edge of the SCNG is within a threshold (see Chaudhury, paragraph [0044]; see Murugesan section (3) on page 5; see Yang, sections (1) and (2) on page 6; the system can determine entities/concepts that are similar to observed entities/concepts at a particular time and be able to determine respective hints/warnings/constraints associated with those observed entities/concepts).
Hoang in view of Yang and Murugesan in further view of Chaudhury do not appear to explicitly teach:
the semantic analyzer determines a lemma form of an antonym for the entity attribute to generate the safety hint.
Tian teaches finding a lemma form of an antonym (see paragraph [0046]; the system can utilize word analysis of various language forms such as being able to lemmatize noun forms of synonyms and antonyms).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the knowledge-aware and commonsense agents as taught by Hoang in view of Yang and Murugesan in further view of Chaudhury by being able to utilize language analysis techniques, including lemmatizing noun forms of antonyms, as taught by Tian in order to structure the various natural language texts into a standardized form, including making sure that antonyms are in a more understandable form, so that the system can accurately interpret and understand the respective natural language text without forcing the natural language understanding portion of the system (and/or agents) to be larger and trained to understand all possible natural language input, when training understanding on the base forms of the words/phrases can achieve high-quality results while allowing the model/understanding module not to be overly bloated.
Hoang in view of Yang and Murugesan in further view of Chaudhury and in further view of Tian teach the semantic analyzer determines a lemma form of an antonym for the entity attribute to generate the safety hint (see Tian, paragraph [0046]; see Hoang, page 7; see Chaudhury, paragraph [0044]; see Murugesan section (3) on page 5; see Yang, sections (1) and (2) on page 6; the system has various constraints that can be utilized to determine if they are applicable to the current observations including by being able to change respective textual input into a lemma form).

With regard to claim 19, this claim is substantially similar to claim 13 and is rejected for similar reasons as discussed above.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Das et al [US 2021/0173395 A1], which teaches the use of safety specifications and constraints to infer potential actions associated with images (see paragraph 19).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARC S SOMERS whose telephone number is (571)270-3567. The examiner can normally be reached M-F 11-8 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MARC S SOMERS/
Primary Examiner, Art Unit 2159
3/5/2026
Prosecution Timeline

Jul 27, 2023: Application Filed
Mar 05, 2026: Non-Final Rejection, §101 and §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/824,014 (Patent 12579099): CONTROL LEVEL TAGGING METHOD AND SYSTEM. 2y 5m to grant; granted Mar 17, 2026.
17/813,218 (Patent 12561288): METHOD AND APPARATUS TO VERIFY FILE METADATA IN A DEDUPLICATION FILESYSTEM. 2y 5m to grant; granted Feb 24, 2026.
18/172,315 (Patent 12554681): SYSTEM AND METHOD OF UNDOING DATA BASED ON DATA FLOW MANAGEMENT. 2y 5m to grant; granted Feb 17, 2026.
15/062,791 (Patent 12541502): METHODS AND APPARATUSES FOR IMPROVING PROCESSING EFFICIENCY IN A DISTRIBUTED SYSTEM. 2y 5m to grant; granted Feb 03, 2026.
18/587,613 (Patent 12530365): SYSTEMS AND METHODS FOR A MACHINE LEARNING FRAMEWORK. 2y 5m to grant; granted Jan 20, 2026.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 65%
With Interview (+34.6%): 99%
Median Time to Grant: 4y 0m
PTA Risk: Low
Based on 563 resolved cases by this examiner. Grant probability derived from career allow rate.