Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 3-15, and 17-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1,
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 1 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“instantiate a reinforcement learning agent that generates, via a function approximation representation, learned outputs governing its decision-making”
“for a given past input of the plurality of past inputs and a given group of plurality of groups of the state variables: generate data reflective of a perturbed input by altering a value of subset of the state variable in the given group in the given past input, said perturbed input corresponding to a default value based on correlations between said subset of state variables”
“generate a distance metric reflective of a magnitude of difference between the perturbed learned output and the past learned output”
“generate a graphical representation including the distance metric”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply an exception (see MPEP 2106.05(f)) and insignificant extra-solution activity (see MPEP 2106.05(g)).
The limitations:
“A computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, the system comprising: at least one processor; memory in communication with the at least one processor; software code stored in the memory, which when executed at the at least one processor causes the system to”
“each of the past learned outputs generated by the reinforcement learning agent when presented with a corresponding one of the past inputs”
“present the data reflective of the perturbed input to the reinforcement learning agent to obtain a perturbed learned output generated by the reinforcement learning agent”
As drafted, are additional elements that amount to no more than mere instructions to apply the judicial exception. See MPEP 2106.05(f).
The limitations:
“store data records of a plurality of past inputs presented to the reinforcement learning agent, each of the past inputs including values of a plurality of state variables, and data records of a plurality of past learned outputs”
“receive …a group definition data structure defining a plurality of groups of the state variables, wherein the group definition data structure is generated by a state variable grouper subsystem based on one or more correlations between the state variables in the group definition data structure”
“store a group definition data structure defining a plurality of groups of the state variables, wherein the group definition data structure is generated by a state variable grouper subsystem based on one or more correlations between the state variables in the group definition data structure”
“display said graphical representation in a graphical user interface, said graphical user interface comprising a plurality of panels arranged based on a relevancy score determined by the at least one processor”
As drafted, are additional elements that amount to no more than insignificant extra-solution activity. See MPEP 2106.05(g).
Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply” and “insignificant extra-solution activity”. Specifically, the storing and displaying limitations recite the well-understood, routine, and conventional activity of storing and retrieving information in memory. MPEP 2106.05(d)(II); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015) (storing and retrieving information in memory). In addition, the receiving limitation recites the well-understood, routine, and conventional activity of receiving and transmitting data over a network. MPEP 2106.05(d)(II); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network). Mere instructions to apply an exception and insignificant extra-solution activity cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 3,
Claim 3 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 3 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“evaluate a condition associated with one or more of the groups of state variables”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are additional details that do not apply the exception in a meaningful way (see MPEP 2106.05(e)).
The limitation:
“wherein the graphical representation is based in part on the evaluated condition”
As drafted, is an additional element that amounts to no more than additional details that do not apply the exception in a meaningful way. See MPEP 2106.05(e).
Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are additional details that do not apply the exception in a meaningful way. Additional details that do not apply the exception in a meaningful way cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 4,
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 4 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“generate a human-understandable description of an importance of a given group based on the distance metric”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: See corresponding analysis of claim 1.
Step 2B Analysis: See corresponding analysis of claim 1.
Regarding Claim 5,
Claim 5 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 5 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“present a generated insight regarding a behaviour of the reinforcement learning agent”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: See corresponding analysis of claim 1.
Step 2B Analysis: See corresponding analysis of claim 1.
Regarding Claim 6,
Claim 6 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 6 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“repeat said generating of the distance metric for each of the plurality of past inputs”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: See corresponding analysis of claim 1.
Step 2B Analysis: See corresponding analysis of claim 1.
Regarding Claim 7,
Claim 7 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 7 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“repeat said generating of the distance metric for each of the groups of the state variables”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: See corresponding analysis of claim 1.
Step 2B Analysis: See corresponding analysis of claim 1.
Regarding Claim 8,
Claim 8 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 8 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“generate the group definition data structure upon calculating at least one correlation between the state variables”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: See corresponding analysis of claim 1.
Step 2B Analysis: See corresponding analysis of claim 1.
Regarding Claim 9,
Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 9 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“generate a metric reflective of a magnitude in change of aggressiveness of the reinforcement learning agent, upon processing the distance metric”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: See corresponding analysis of claim 1.
Step 2B Analysis: See corresponding analysis of claim 1.
Regarding Claim 10,
Claim 10 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 10 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the generating the distance metric includes calculating an alpha-divergence”
As drafted, under their broadest reasonable interpretations, cover mathematical concepts, i.e., mathematical relationships, mathematical formulas or equations, and mathematical calculations. The above limitations, in the context of this claim, encompass mathematical calculations.
Step 2A Prong Two Analysis: See corresponding analysis of claim 1.
Step 2B Analysis: See corresponding analysis of claim 1.
Regarding Claim 11,
Claim 11 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 11 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: See corresponding analysis of claim 1.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites an additional element that is a mere instruction to apply an exception (see MPEP 2106.05(f)).
The limitation:
“wherein the function approximation representation includes at least one of a neural network, a tabular function approximation representation and a tile-coding function approximation representation”
As drafted, is an additional element that amounts to no more than mere instructions to apply the judicial exception. See MPEP 2106.05(f).
Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply”. Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 12,
Claim 12 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 12 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: See corresponding analysis of claim 1.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites an additional element that is an additional detail that does not apply the exception in a meaningful way (see MPEP 2106.05(e)).
The limitation:
“wherein said plurality of past learned outputs includes a plurality of policies”
As drafted, is an additional element that amounts to no more than additional details that do not apply the exception in a meaningful way. See MPEP 2106.05(e).
Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are additional details that do not apply the exception in a meaningful way. Additional details that do not apply the exception in a meaningful way cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 13,
Claim 13 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 13 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: See corresponding analysis of claim 1.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites an additional element that is an additional detail that does not apply the exception in a meaningful way (see MPEP 2106.05(e)).
The limitation:
“wherein said plurality of past learned outputs includes a plurality of value function outputs”
As drafted, is an additional element that amounts to no more than additional details that do not apply the exception in a meaningful way. See MPEP 2106.05(e).
Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are additional details that do not apply the exception in a meaningful way. Additional details that do not apply the exception in a meaningful way cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 14,
Claim 14 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 14 is directed to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein said altering includes altering the value of the at least one state variable to a default value”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: See corresponding analysis of claim 1.
Step 2B Analysis: See corresponding analysis of claim 1.
Regarding Claim 15,
Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 15 is directed to a computer-implemented method for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“instantiating a reinforcement learning agent that generates, via a function approximation representation, learned outputs governing its decision-making”
“for a given past input of the plurality of past inputs and a given group of plurality of groups of the state variables: generating data reflective of a perturbed input by altering a value of a subset of the state variable in the given group in the given past input, said perturbed input corresponding to a default value based on correlations between said subset of state variables in the given group”
“generating a distance metric reflective of a magnitude of difference between the perturbed learned output and the past learned output”
“generating a graphical representation including the distance metric”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are mere instructions to apply an exception (see MPEP 2106.05(f)) and insignificant extra-solution activity (see MPEP 2106.05(g)).
The limitations:
“A computer-implemented method for facilitating explainability of decision-making by reinforcement learning agents”
“each of the past learned outputs generated by the reinforcement learning agent when presented with a corresponding one of the past inputs”
“presenting the data reflective of the perturbed input to the reinforcement learning agent to obtain a perturbed learned output generated by the reinforcement learning agent”
As drafted, are additional elements that amount to no more than mere instructions to apply the judicial exception. See MPEP 2106.05(f).
The limitations:
“storing data records of a plurality of past inputs presented to the reinforcement learning agent, each of the past inputs including values of a plurality of state variables, and data records of a plurality of past learned outputs”
“receiving … a group definition data structure defining a plurality of groups of the state variables, wherein the group definition data structure is generated by a state variable grouper subsystem based on one or more correlations between the state variables in the group definition data structure”
“storing a group definition data structure defining a plurality of groups of the state variables, wherein the group definition data structure is generated by a state variable grouper subsystem based on one or more correlations between the state variables in the group definition data structure”
“displaying said graphical representation in a graphical user interface, said graphical user interface comprising a plurality of panels arranged based on a relevancy score determined by at least one processor.”
As drafted, are additional elements that amount to no more than insignificant extra-solution activity. See MPEP 2106.05(g).
Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply” and “insignificant extra-solution activity”. Specifically, the storing and displaying limitations recite the well-understood, routine, and conventional activity of storing and retrieving information in memory. MPEP 2106.05(d)(II); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015) (storing and retrieving information in memory). In addition, the receiving limitation recites the well-understood, routine, and conventional activity of receiving and transmitting data over a network. MPEP 2106.05(d)(II); OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network). Mere instructions to apply an exception and insignificant extra-solution activity cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 17,
Claim 17 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 17 is directed to a computer-implemented method for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“repeating the generating the distance metric for each of the plurality of past inputs”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: See corresponding analysis of claim 15.
Step 2B Analysis: See corresponding analysis of claim 15.
Regarding Claim 18,
Claim 18 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 18 is directed to a computer-implemented method for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“repeating the generating the distance metric for each of the groups of the state variables”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: See corresponding analysis of claim 15.
Step 2B Analysis: See corresponding analysis of claim 15.
Regarding Claim 19,
Claim 19 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 19 is directed to a computer-implemented method for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“generating the group definition data structure upon calculating at least one correlation between the state variables”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: See corresponding analysis of claim 15.
Step 2B Analysis: See corresponding analysis of claim 15.
Regarding Claim 20,
Claim 20 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 20 is directed to a computer-implemented method for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“generating a metric reflective of a magnitude in change of aggressiveness of the reinforcement learning agent, upon processing the distance metric”
As drafted, under their broadest reasonable interpretations, cover mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, or opinion). The above limitations, in the context of this claim, correspond to mental processes, e.g., evaluation and judgment performed with the assistance of pen and paper.
Step 2A Prong Two Analysis: See corresponding analysis of claim 15.
Step 2B Analysis: See corresponding analysis of claim 15.
Regarding Claim 21,
Claim 21 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 21 is directed to a computer-implemented method for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The limitations:
“wherein the generating the distance metric includes calculating an alpha-divergence”
As drafted, under their broadest reasonable interpretations, cover mathematical concepts, i.e., mathematical relationships, mathematical formulas or equations, and mathematical calculations. The above limitations, in the context of this claim, encompass mathematical calculations.
Step 2A Prong Two Analysis: See corresponding analysis of claim 15.
Step 2B Analysis: See corresponding analysis of claim 15.
Regarding Claim 22,
Claim 22 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 22 is directed to a computer-implemented method for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: See corresponding analysis of claim 15.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites an additional element that is a mere instruction to apply an exception (see MPEP 2106.05(f)).
The limitation:
“wherein the function approximation representation includes at least one of a neural network, a tabular function approximation representation and a tile-coding function approximation representation”
As drafted, is an additional element that amounts to no more than mere instructions to apply the judicial exception. See MPEP 2106.05(f).
Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are “mere instructions to apply”. Mere instructions to apply an exception cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 23,
Claim 23 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 23 is directed to a computer-implemented method for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: See corresponding analysis of claim 15.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recited additional elements that are additional details that don’t apply the exception in a meaningful way (See MPEP 2106.05(e)).
The limitations:
“wherein said plurality of past learned outputs includes a plurality of policies”
As drafted, this limitation is an additional element that amounts to no more than additional details that do not apply the exception in a meaningful way. See MPEP 2106.05(e).
Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are additional details that do not apply the exception in a meaningful way. Additional details that do not apply the exception in a meaningful way cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 24,
Claim 24 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1 Analysis: Claim 24 is directed to a computer-implemented method for facilitating explainability of decision-making by reinforcement learning agents, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: See corresponding analysis of claim 15.
Step 2A Prong Two Analysis: The judicial exceptions are not integrated into a practical application. In particular, the claim recites additional elements that are additional details that do not apply the exception in a meaningful way (See MPEP 2106.05(e)).
The limitations:
“wherein said plurality of past learned outputs includes a plurality of value function outputs”
As drafted, this limitation is an additional element that amounts to no more than additional details that do not apply the exception in a meaningful way. See MPEP 2106.05(e).
Therefore, the additional elements do not integrate the abstract ideas into a practical application.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, all of the additional elements are additional details that do not apply the exception in a meaningful way. Additional details that do not apply the exception in a meaningful way cannot provide an inventive concept. The claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3-9, 11-15, 17-20, and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Nagaraja (U.S. Patent No. 9,754,221) (“Nagaraja”) in view of Ismailsheriff et al. (U.S. Patent No. 10,873,533) (“Ismailsheriff”) in further view of Zhang et al. (Robust Deep Reinforcement Learning against Adversarial Perturbations on Observations) (“Zhang”).
Regarding claim 1, Nagaraja teaches a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents, the system comprising: at least one processor (Nagaraja Col 3 lines 46-49 “In order to overcome the drawbacks discussed hitherto, the present disclosure envisages processor architecture specifically designed to implement reinforcement learning operations.” Nagaraja provides a computer-implemented system comprising a processor for implementing reinforcement learning operations corresponding to a computer-implemented system for facilitating explainability of decision-making by reinforcement learning agents comprising at least one processor.); memory in communication with the at least one processor (Nagaraja Col 3 lines 51-55 “The processor architecture includes a first processor (host processor), a first memory module (IRAM), a Complex Instruction fetch and decode (CISFD) unit, a second processor (Reinforcement learning processor), and a second memory module.” Nagaraja provides a memory in communication with a processor.); software code stored in the memory (Nagaraja Col 7 lines 39-43 “The first memory module stores the application-specific instruction set (ASI), which incorporates the SIMA instructions (referred to as ‘instructions’ hereafter) for performing predetermined reinforcement learning tasks.” Nagaraja provides instruction sets stored in memory corresponding to software code stored in memory.), which when executed at the at least one processor causes the system to: instantiate a reinforcement learning agent that generates, via a function approximation representation, learned outputs governing its decision-making (Nagaraja Col 2 lines 26-28 “Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.”; Col 3 lines 13-16 “Yet another object of the present disclosure is to provide an application domain specific instruction set capable 
of performing value function approximation and reward function approximation, by the way of training a neural network”; Col 5 lines 2-6 “Further, the neural network also programs the reinforcement learning agent with a specific reward function, which dictates the actions to be performed by the reinforcement learning agent to obtain the maximum possible reward.”; Col 6 lines 61-67, Col 7 lines 1-2 “Further, the SIMA type instructions also enable the reinforcement learning agent to iteratively exploit the learnings deduced from the previous interactions, in any of the subsequent interactions with the reinforcement learning environment. Further, the SIMA type instructions also provide for construction of a Markov Decision Process (MDP) and a Semi-Markov Decision Process (SMDP) based on the interaction between the reinforcement learning agent and the corresponding reinforcement learning environment.” Nagaraja provides instantiating a reinforcement learning agent which uses function approximation to govern decision making.); store data records of a plurality of past inputs presented to the reinforcement learning agent (Nagaraja Col 4 lines 45-51 “In accordance with the present disclosure, the second memory module is partitioned into an ‘a-memory module’, a ‘v-memory module’, a ‘q-memory module’, and an ‘r-memory module’. 
The ‘a-memory module’ stores information corresponding to the action(s) performed by the reinforcement learning agent at every state, during an interaction with the reinforcement learning environment.” Nagaraja provides storing information corresponding to actions performed by the reinforcement learning agent at every state, which corresponds to store data records of a plurality of past inputs presented to the reinforcement learning agent.), each of the past inputs including values of a plurality of state variables (Nagaraja Col 4 lines 51-54 “Further, the ‘v-memory module’ stores the ‘state-value functions’ which represent the value associated with the reinforcement learning agent at every state thereof.” Nagaraja provides state-value-functions, which contain state variables and thus corresponds to each of the past inputs including values of a plurality of state variables.), and data records of a plurality of past learned outputs, each of the past learned outputs generated by the reinforcement learning agent when presented with a corresponding one of the past inputs (Nagaraja Col 4 lines 54-58 “Further, the ‘q-memory module’ stores ‘Q-values’ which are generated using a state-action function indicative of the action(s) performed by the reinforcement learning agent at every corresponding state.” Nagaraja provides storing generated values by the reinforcement learning agent which corresponds to data records of a plurality of past learned outputs when presented with a past input.); receive and store a group definition data structure defining a plurality of groups of the state variables (Nagaraja Col 9 lines 45-55 “Subsequently, the reinforcement learning processor 14 selectively retrieves the ‘state-value functions’, ‘actions’, ‘Q-values’ and ‘rewards’ corresponding to the reinforcement learning agent (and indicative of the interaction between the reinforcement learning agent and the reinforcement learning environment) from the ‘a-memory module’ 16A, ‘v-memory 
module’ 16B, ‘q-memory module’ 16C, and ‘r-memory module’ 16D respectively, and transmits the retrieved ‘state-value functions’, ‘actions’, ‘Q-values’ and ‘rewards’ to a neural network (illustrated in FIG. 7A) via a corresponding neural network data path 18”; Col 4 lines 45-51 “In accordance with the present disclosure, the second memory module is partitioned into an ‘a-memory module’, a ‘v-memory module’, a ‘q-memory module’, and an ‘r-memory module’. The ‘a-memory module’ stores information corresponding to the action(s) performed by the reinforcement learning agent at every state, during an interaction with the reinforcement learning environment.” Nagaraja provides using memory to receive information about state variables/state value functions, and a memory which corresponds to receive and store a group definition data structure defining a plurality of groups of the state variables, wherein each state value function corresponds to a group definition data structure defining a plurality of groups of the state variables.), wherein the group definition data structure is generated by a state variable grouper subsystem based on one or more correlations between the state variables in the group definition data structure (Nagaraja Col 8 lines 29-34 “In accordance with the present disclosure, the ‘v-thread’ upon execution determines the ‘state-value functions’ corresponding to each state of the reinforcement learning agent. The ‘state-value functions’ indicate the ‘value’ associated with each of the states of the reinforcement learning agent.” Nagaraja provides a v-thread corresponding to a state variable grouper subsystem which determines the state-value-functions, which are group definition data structures containing correlations between state variables, corresponding to the group definition data structure is generated by a state variable grouper subsystem based on one or more correlations between the state variables in the group definition data structure.);
Nagaraja fails to teach and for a given past input of the plurality of past inputs and a given group of plurality of groups of the state variables: generate data reflective of a perturbed input by altering a value of a subset of the state variable in the given group in the given past input, said perturbed input corresponding to a default value based on correlations between said subset of state variables; present the data reflective of the perturbed input to the reinforcement learning agent to obtain a perturbed learned output generated by the reinforcement learning agent; and generate a distance metric reflective of a magnitude of difference between the perturbed learned output and the past learned output, generate a graphical representation including the distance metric; display said graphical representation in a graphical user interface, said graphical user interface comprising a plurality of panels arranged based on a relevancy score determined by the at least one processor.
However, Ismailsheriff teaches …present the data reflective of the perturbed input to the reinforcement learning agent to obtain a perturbed learned output generated by the reinforcement learning agent (Ismailsheriff Col 27 lines 52-58 “As a result of the action a.sub.t, the environment changes its state from s.sub.t to some s.sub.t+1∈S according to the state transition probabilities given by P: the probability of ending up in state s.sub.t+1 given that action a.sub.t is performed at state action s.sub.t is P(s.sub.t, a.sub.t, s.sub.t+1). The agent receives a scalar reward r.sub.t+1∈custom character, according to the reward function R: r.sub.t+1=R(s.sub.t, a.sub.t, s.sub.t+1).” Ismailsheriff provides using the altered state to determine a new state with the agent receiving scalar rewards according to the reward function of said state, which corresponds to present the data reflective of the perturbed input to the reinforcement learning agent to obtain a perturbed learned output generated by the reinforcement learning agent.); and generate a distance metric reflective of a magnitude of difference between the perturbed learned output and the past learned output (Ismailsheriff Col 25 lines 25-28 “Other similarity (or distance) measures that can also be used include the cosine similarity, Jaccard coefficient, the Pearson correlation coefficient, and the averaged Kullback-Leibler divergence, among others.” Ismailsheriff provides generating distance metrics for the disclosed methods including the magnitude of difference between the perturbed learned output and the past learned output.), generate a graphical representation including the distance metric (Ismailsheriff Col 25 lines 25-38 “Other similarity (or distance) measures that can also be used include the cosine similarity, Jaccard coefficient, the Pearson correlation coefficient, and the averaged Kullback-Leibler divergence, among others. 
Some embodiments may also use various indexing structures or techniques for efficiently searching the feature set space, including multi-dimensional hashing, which can map features into fix-sized bins or buckets based on some function applied to each feature; locality sensitive hashing, which can use unions of independently computed hashing functions to index features; or multi-dimensional search trees, such as k-d trees, which can divide the multi-dimensional feature space along alternating axis-aligned hyper-planes to maximize search tree balance; among other approaches.” Ismailsheriff provides distance metrics to be implemented in tree data structures, corresponding to generate a graphical representation including the distance metric.); display said graphical representation in a graphical user interface, said graphical user interface comprising a plurality of panels arranged based on a relevancy score determined by the at least one processor (Ismailsheriff Col 54 lines 51-63 “The processor 855 can communicate with a chipset 860 that can control input to and output from the processor 855. In this example, the chipset 860 can output information to an output device 865, such as a display, and can read and write information to storage device 870, which can include magnetic media, solid state media, and other suitable storage media”; Col 25 lines 25-38 “Other similarity (or distance) measures that can also be used include the cosine similarity, Jaccard coefficient, the Pearson correlation coefficient, and the averaged Kullback-Leibler divergence, among others. 
Some embodiments may also use various indexing structures or techniques for efficiently searching the feature set space, including multi-dimensional hashing, which can map features into fix-sized bins or buckets based on some function applied to each feature; locality sensitive hashing, which can use unions of independently computed hashing functions to index features; or multi-dimensional search trees, such as k-d trees, which can divide the multi-dimensional feature space along alternating axis-aligned hyper-planes to maximize search tree balance; among other approaches.” Ismailsheriff provides outputting information from the disclosed embodiments to a display, which includes the calculated distance metrics and corresponding graphical representations, which may be indexed for efficient searching, corresponding to display said graphical representation in a graphical user interface, said graphical user interface comprising a plurality of panels arranged based on a relevancy score determined by the at least one processor).
Nagaraja and Ismailsheriff are both considered to be analogous to the claimed invention because they are in the same field of artificial intelligence and more specifically reinforcement learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagaraja with the above teachings of Ismailsheriff. Doing so would allow for a more efficient and more robust approach to noise (Ismailsheriff Col 24 lines 66-67, Col 25 lines 1-3 “Such an approach may be more efficient and more robust to noise. Other variations may use different similarity (or distance) functions, such as the Minkowski distance or the Mahalanobis distance.”).
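By way of illustration only, and not as part of any cited disclosure, the kinds of similarity or distance measures Ismailsheriff enumerates (Col 25) could be computed along the following lines; the function names and sample vectors below are the editor's own hypothetical choices, sketched in plain Python using only the standard library:

```python
import math

def euclidean_distance(past_output, perturbed_output):
    # A distance metric reflective of the magnitude of difference
    # between a past learned output and a perturbed learned output.
    return math.sqrt(sum((p - q) ** 2
                         for p, q in zip(past_output, perturbed_output)))

def cosine_similarity(past_output, perturbed_output):
    # One alternative similarity measure named in Ismailsheriff (Col 25):
    # the cosine of the angle between the two output vectors.
    dot = sum(p * q for p, q in zip(past_output, perturbed_output))
    norm_p = math.sqrt(sum(p * p for p in past_output))
    norm_q = math.sqrt(sum(q * q for q in perturbed_output))
    return dot / (norm_p * norm_q)

# Hypothetical example: action probabilities before and after a perturbation.
past = [0.2, 0.5, 0.3]
perturbed = [0.1, 0.6, 0.3]
print(euclidean_distance(past, perturbed))
print(cosine_similarity(past, perturbed))
```

Either quantity could serve as the claimed "distance metric"; the Euclidean form directly reflects a magnitude of difference, while cosine similarity instead reflects directional agreement between the two outputs.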
Further, Zhang teaches for a given past input of the plurality of past inputs and a given group of plurality of groups of the state variables: generate data reflective of a perturbed input by altering values of a subset of the state variables in the given group in the given past input (Zhang 5.1 Evaluation of robustly trained DQN (SA-DQN) “For Acrobot, since there does not exist a limit on the state value and each state feature has different range, we first run a well-trained natural agent and collect the standard deviation (std) of each state feature value over 100 episodes. Then perturbation range on each state feature is determined individually, depending on the standard deviation of that feature. We choose ε = 0.2·std as our budget. The same value is used for both training and attack at test time”; Model performance under state adversarial attacks “We consider a norm-like bounded perturbation, where each state variable is perturbed individually within a predefined ±ε range. Since each state variable can have greatly different range (e.g., the range of position and velocity variables can be quite different), we rescale by the standard deviations of each state variable.” Zhang provides perturbing state variables based on collected data and a per-feature standard deviation, wherein the collected data corresponds to the past input and perturbing the state variables corresponds to altering values of the state variables.), said perturbed input corresponding to a default value based on correlations between said subset of state variables (Zhang 5.1 Evaluation of robustly trained DQN (SA-DQN) “For Acrobot, since there does not exist a limit on the state value and each state feature has different range, we first run a well-trained natural agent and collect the standard deviation (std) of each state feature value over 100 episodes. Then perturbation range on each state feature is determined individually, depending on the standard deviation of that feature.
We choose ε = 0.2·std as our budget. The same value is used for both training and attack at test time”; Model performance under state adversarial attacks “We consider a norm-like bounded perturbation, where each state variable is perturbed individually within a predefined ±ε range. Since each state variable can have greatly different range (e.g., the range of position and velocity variables can be quite different), we rescale by the standard deviations of each state variable.” Zhang provides perturbing state variables using the standard deviation of the state variables, wherein the standard deviation of the state variables corresponds to correlations between state variables and also supplies a default value for the perturbation (i.e., ε = 0.2·std).).
Nagaraja, Ismailsheriff, and Zhang are all considered to be analogous to the claimed invention because they are in the same field of artificial intelligence and more specifically reinforcement learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagaraja in view of Ismailsheriff with the above teachings of Zhang. Doing so would improve the robustness of reinforcement learning agents under attacks on observations (Zhang Abstract “We demonstrate that our proposed training procedure significantly improves the robustness of DQN and DDPG agents under a suite of strong white box attacks on observations, including a few novel attacks we specifically craft.”).
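For illustration only (not part of the cited Zhang reference; variable and function names are the editor's own hypothetical choices), the per-feature perturbation scheme Zhang describes, in which each state variable is perturbed within a budget of ε = 0.2·std of that feature, could be sketched as follows:

```python
import math
import random

def feature_stds(past_inputs):
    # Per-feature (population) standard deviation over collected past
    # inputs, as Zhang collects over 100 episodes of a trained agent.
    n = len(past_inputs)
    dims = len(past_inputs[0])
    stds = []
    for d in range(dims):
        vals = [x[d] for x in past_inputs]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        stds.append(math.sqrt(var))
    return stds

def perturb(state, stds, budget=0.2):
    # Each state variable is perturbed individually within a range
    # scaled by that feature's standard deviation (ε = budget · std).
    return [s + random.uniform(-budget * sd, budget * sd)
            for s, sd in zip(state, stds)]
```

This scaling accounts for the point in the quoted passage that position and velocity variables can have very different ranges, so a single absolute perturbation bound would not treat the features comparably.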
Regarding claim 3, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1 as discussed above in the rejection of claim 1, wherein the software code, when executed at the at least one processor, further causes the system to: evaluate a condition associated with one or more of the groups of state variables (Ismailsheriff Col 28 lines 54-62 “As discussed above, a training data set for the traffic class-specific congestion signatures 426 can comprise a collection of flows labeled as corresponding to a predetermined traffic class and predetermined congestion state (and, in some cases, labeled as not corresponding to the predetermined traffic class and predetermined congestion state) along with the minimum, maximum, and CV of RTTs sampled during RTT sampling periods for the flows.” Ismailsheriff provides evaluating a predetermined congestion state corresponding to evaluate a condition associated with one or more of the groups of state variables.); and wherein the graphical representation is based in part on the evaluated condition (Ismailsheriff Col 28 lines 62-67, Col 29 line 1, “The machine learning model generator 412 can provide the training data set as input to a classification algorithm (e.g., Naïve Bayes classifiers, logistic regression, K-NN, SVM, decision tree, random forest, boosting, neural network, etc.) to identify a function or a mapping based on the minimum, maximum, and CV of RTTs to the predetermined congestion state” Ismailsheriff provides implementing classification algorithms including tree-based algorithms for the predetermined congestion state corresponding to and wherein the graphical representation is based in part on the evaluated condition.).
Nagaraja, Ismailsheriff and Zhang are all considered to be analogous to the claimed invention because they are in the same field of artificial intelligence and more specifically reinforcement learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagaraja in view of Ismailsheriff in further view of Zhang with the above teachings of Ismailsheriff. Doing so would allow one to measure similarity between different datasets (Ismailsheriff Col 24 lines 64-67, Col 25 lines 1-3 “For example, small clusters can be determined from the instances of each class, and the centroid of each cluster may be used as a new instance. Such an approach may be more efficient and more robust to noise. Other variations may use different similarity (or distance) functions, such as the Minkowski distance or the Mahalanobis distance.”).
Regarding claim 4, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1 as discussed above in the rejection of claim 1, wherein the software code, when executed at the at least one processor, further causes the system to generate a human-understandable description of an importance of a given group based on the distance metric (Ismailsheriff Col 24 lines 57-67, Col 25 lines 1-3 “In a nearest neighbor classification or regression, the top K nearest neighbors to an unlabeled data point can be identified from the training data. The class label or continuous value with the largest presence among the K nearest neighbors can be designated as the class label or continuous value for the unlabeled data point. In some embodiments, training data points may be aggregated for improving classification. For example, small clusters can be determined from the instances of each class, and the centroid of each cluster may be used as a new instance. Such an approach may be more efficient and more robust to noise. Other variations may use different similarity (or distance) functions, such as the Minkowski distance or the Mahalanobis distance.” Ismailsheriff provides generating a distance metric to determine similarity and clustering between different sets of data, which corresponds to generate a human-understandable description of an importance of a given group based on the distance metric since measuring similarity of one group to another is a human-understandable description of a relative importance for a particular classification.).
Nagaraja, Ismailsheriff and Zhang are all considered to be analogous to the claimed invention because they are in the same field of artificial intelligence and more specifically reinforcement learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagaraja in view of Ismailsheriff in further view of Zhang with the above teachings of Ismailsheriff. Doing so would allow one to measure similarity between different datasets (Ismailsheriff Col 24 lines 64-67, Col 25 lines 1-3 “For example, small clusters can be determined from the instances of each class, and the centroid of each cluster may be used as a new instance. Such an approach may be more efficient and more robust to noise. Other variations may use different similarity (or distance) functions, such as the Minkowski distance or the Mahalanobis distance.”).
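For context only (not part of the cited record; the function name and sample data are the editor's own hypothetical choices), the nearest-neighbor classification Ismailsheriff describes at Col 24, in which the label with the largest presence among the K nearest neighbors is assigned to an unlabeled point, could be sketched as:

```python
def k_nearest_label(training, query, k=3):
    # `training` is a list of (feature_vector, label) pairs. The label
    # with the largest presence among the K nearest training points
    # (by Euclidean distance) is designated for the query point.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    neighbors = sorted(training, key=lambda t: dist(t[0], query))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)
```

A distance function of this kind is what the examiner maps to the claimed distance metric; as the quoted passage notes, other measures such as the Minkowski or Mahalanobis distance could be substituted.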
Regarding claim 5, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1, as discussed above in the rejection of claim 1, wherein the software code, when executed at the at least one processor, further causes the system to present a generated insight regarding a behaviour of the reinforcement learning agent (Nagaraja Col 11 lines 45-49 “The term ‘inferencing context’ represents the manner in which the reinforcement learning agent behaves (i.e., performs actions) subsequent to learning from the interaction with the reinforcement learning environment.” Nagaraja provides inferencing context for a reinforcement learning agent corresponding to present a generated insight regarding a behaviour of the reinforcement learning agent.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons as disclosed in claim 1 above.
Regarding claim 6, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1 as discussed above in the rejection of claim 1, wherein the software code, when executed at the at least one processor, further causes the system to: repeat said generating of the distance metric for each of the plurality of past inputs (Ismailsheriff Col 26 lines 7-12 “At each node, a number M of the features can be selected at random from the set of all features. The feature that provides the best split can be used to do a binary split on that node. At the next node, another number M of the features can be selected at random and the process can be repeated.” Ismailsheriff provides repeating node selections for the tree algorithm, which include generating the distance metric and corresponds to repeat said generating of the distance metric for each of the plurality of past inputs.).
Nagaraja, Ismailsheriff and Zhang are all considered to be analogous to the claimed invention because they are in the same field of artificial intelligence and more specifically reinforcement learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagaraja in view of Ismailsheriff in further view of Zhang with the above teachings of Ismailsheriff. Doing so would allow one to measure similarity between different datasets (Ismailsheriff Col 24 lines 64-67, Col 25 lines 1-3 “For example, small clusters can be determined from the instances of each class, and the centroid of each cluster may be used as a new instance. Such an approach may be more efficient and more robust to noise. Other variations may use different similarity (or distance) functions, such as the Minkowski distance or the Mahalanobis distance.”).
Regarding claim 7, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1 as discussed above in the rejection of claim 1, wherein the software code, when executed at the at least one processor, further causes the system to: repeat said generating of the distance metric for each of the groups of the state variables (Ismailsheriff Col 26 lines 7-12 “At each node, a number M of the features can be selected at random from the set of all features. The feature that provides the best split can be used to do a binary split on that node. At the next node, another number M of the features can be selected at random and the process can be repeated.” Ismailsheriff provides repeating node selections for the tree algorithm, which include generating the distance metric and corresponds to repeat said generating of the distance metric for each of the groups of the state variables.).
Nagaraja, Ismailsheriff and Zhang are all considered to be analogous to the claimed invention because they are in the same field of artificial intelligence and more specifically reinforcement learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagaraja in view of Ismailsheriff in further view of Zhang with the above teachings of Ismailsheriff. Doing so would allow one to measure similarity between different datasets (Ismailsheriff Col 24 lines 64-67, Col 25 lines 1-3 “For example, small clusters can be determined from the instances of each class, and the centroid of each cluster may be used as a new instance. Such an approach may be more efficient and more robust to noise. Other variations may use different similarity (or distance) functions, such as the Minkowski distance or the Mahalanobis distance.”).
Regarding claim 8, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1 as discussed above in the rejection of claim 1, wherein the software code, when executed at the at least one processor, further causes the system to: generate the group definition data structure upon calculating at least one correlation between the state variables (Nagaraja Col 9 lines 31-36 “Further, the ‘q-memory module’ 16C stores ‘Q-values’ which are generated using a state-action function representative of a correlation between the actions performed by the reinforcement learning agent at every state and under a predetermined policy.” Nagaraja provides calculating a correlation between actions by the reinforcement learning agent using a state-action function, which corresponds to calculating at least one correlation between the state variables whereupon a group definition data structure is generated.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons as disclosed in the rejection of claim 1 above.
Regarding claim 9, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1 as discussed above in the rejection of claim 1, wherein Ismailsheriff teaches that the software code, when executed at the at least one processor, further causes the system to: generate a metric reflective of a magnitude in change of aggressiveness of the reinforcement learning agent upon processing the distance metric (Ismailsheriff Col 27 lines 60-61 “The behavior of the agent can be described by its policy π, which is typically a stochastic function” Ismailsheriff provides describing the behavior of a reinforcement learning agent according to a policy, which occurs subsequent to computing a distance metric and corresponds to generating a metric reflective of a magnitude in change of aggressiveness of the reinforcement learning agent upon processing the distance metric.).
Nagaraja, Ismailsheriff and Zhang are all considered to be analogous to the claimed invention because they are in the same field of artificial intelligence and more specifically reinforcement learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagaraja in view of Ismailsheriff in further view of Zhang with the above teachings of Ismailsheriff. Doing so would allow one to measure similarity between different datasets (Ismailsheriff Col 24 lines 64-67, Col 25 lines 1-3 “For example, small clusters can be determined from the instances of each class, and the centroid of each cluster may be used as a new instance. Such an approach may be more efficient and more robust to noise. Other variations may use different similarity (or distance) functions, such as the Minkowski distance or the Mahalanobis distance.”).
Regarding claim 11, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1, wherein the function approximation representation includes at least one of a neural network, a tabular function approximation representation, and a tile-coding function approximation representation (Nagaraja Col 3 lines 13-16 “Yet another object of the present disclosure is to provide an application domain specific instruction set capable of performing value function approximation and reward function approximation, by the way of training a neural network” Nagaraja provides the function approximation representation including a neural network.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons as disclosed in the rejection of claim 1 above.
Regarding claim 12, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1 as discussed above in the rejection of claim 1, wherein said plurality of past learned outputs includes a plurality of policies (Nagaraja Col 9 lines 28-36 “The ‘v-memory module’ 16B also stores the ‘optimal state-value functions’ indicative of an optimal state-value associated with the reinforcement learning agent under an optimal policy. Further, the ‘q-memory module’ 16C stores ‘Q-values’ which are generated using a state-action function representative of a correlation between the actions performed by the reinforcement learning agent at every state and under a predetermined policy.” Nagaraja provides an optimal policy and predetermined policy for generated values of the reinforcement learning agent, which corresponds to wherein said plurality of past learned outputs includes a plurality of policies.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons as disclosed in the rejection of claim 1 above.
Regarding claim 13, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1, wherein said plurality of past learned outputs includes a plurality of value function outputs (Nagaraja Col 9 lines 24-28 “The ‘v-memory module’ 16B stores the ‘state-value functions’ indicative of the value associated with every state of the reinforcement learning agent (identified by the reinforcement learning agent ID) while the reinforcement learning agent follows a predetermined policy.” Nagaraja provides stored state-value functions whose values correspond to the claimed plurality of past learned outputs, i.e., a plurality of value function outputs.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons as disclosed in the rejection of claim 1 above.
Regarding claim 14, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1, wherein said altering includes altering the value of the at least one state variable to a default value (Ismailsheriff Col 27 lines 44-52 “Some reinforcement learning agents can be modeled as Markov Decision Processes (MDPs). The MDP is a discrete time stochastic control process that can be defined by a tuple {S, A, P, R}, where S is a set of possible states of the environment, A is a set of possible actions, P:S×A×S.fwdarw.[0,1] is the state transition probability function, and R: S×A×S.fwdarw.custom character is the reward function. At each time step t, s.sub.t∈S describes the state of the environment. The agent can alter the state at each time step by taking actions a.sub.t∈A.” Ismailsheriff provides altering state values according to a set of possible states of the environment, which corresponds to altering the value of the at least one state variable to a default value.).
Nagaraja, Ismailsheriff and Zhang are all considered to be analogous to the claimed invention because they are in the same field of artificial intelligence and more specifically reinforcement learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagaraja in view of Ismailsheriff in further view of Zhang with the above teachings of Ismailsheriff. Doing so would allow one to measure similarity between different datasets (Ismailsheriff Col 24 lines 64-67, Col 25 lines 1-3 “For example, small clusters can be determined from the instances of each class, and the centroid of each cluster may be used as a new instance. Such an approach may be more efficient and more robust to noise. Other variations may use different similarity (or distance) functions, such as the Minkowski distance or the Mahalanobis distance.”).
Regarding claim 15, it is the method embodiment of claim 1, recites similar limitations, and is rejected using the same reasoning found above in the rejection of claim 1.
Regarding claim 17, the rejection of claim 15 is incorporated herein. Further, the limitations in this claim are taught by Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons disclosed above in the rejection of claim 6.
Regarding claim 18, the rejection of claim 15 is incorporated herein. Further, the limitations in this claim are taught by Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons disclosed above in the rejection of claim 7.
Regarding claim 19, the rejection of claim 15 is incorporated herein. Further, the limitations in this claim are taught by Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons disclosed above in the rejection of claim 8.
Regarding claim 20, the rejection of claim 15 is incorporated herein. Further, the limitations in this claim are taught by Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons disclosed above in the rejection of claim 9.
Regarding claim 22, the rejection of claim 15 is incorporated herein. Further, the limitations in this claim are taught by Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons disclosed above in the rejection of claim 11.
Regarding claim 23, the rejection of claim 15 is incorporated herein. Further, the limitations in this claim are taught by Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons disclosed above in the rejection of claim 12.
Regarding claim 24, the rejection of claim 15 is incorporated herein. Further, the limitations in this claim are taught by Nagaraja in view of Ismailsheriff in further view of Zhang for the same reasons disclosed above in the rejection of claim 13.
Claims 10 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Nagaraja (U.S. Patent No. 9,754,221) (“Nagaraja”) in view of Ismailsheriff et al. (U.S. Patent No. 10,873,533) (“Ismailsheriff”) in further view of Zhang et al. (Robust Deep Reinforcement Learning against Adversarial Perturbations on Observations) (“Zhang”) and Honkala et al. (U.S. Patent No. 10,891,524) (“Honkala”).
Regarding claim 10, Nagaraja in view of Ismailsheriff in further view of Zhang teaches the computer-implemented system of claim 1 as discussed above in the rejection of claim 1, but fails to teach wherein the generating the distance metric includes calculating an alpha-divergence.
However, Honkala teaches wherein the generating the distance metric (Honkala Col 10 lines 1-4 “It is appreciated that any number of distribution moments can be used and any other distance metric between distributions can be used instead of KL divergence over Gaussian distributions.” Honkala provides generating a distance metric.) includes calculating an alpha-divergence (Honkala Col 10 lines 16-19 “Approximations of the Earth Mover distance; Other statistical test and metrics, such as Cramér-von Mises, Kuiper, Shapiro-Wilk test, Anderson-Darling test, Rényi's divergence” Honkala provides calculating a Rényi's divergence for the distance metric, which corresponds to calculating an alpha-divergence.).
Nagaraja, Ismailsheriff, Zhang and Honkala are all considered to be analogous to the claimed invention because they are in the same field of artificial intelligence and more specifically reinforcement learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagaraja in view of Ismailsheriff in further view of Zhang with the above teachings of Honkala. Doing so would allow one to compute and use a distance calculation between activations in different feature maps and/or spatial and temporal locations (Honkala Col 10 lines 4-7 “In addition, it is possible to compute, and use a distance, covariances and cross-correlations between activations in different feature maps and/or spatial and temporal locations.”).
Regarding claim 21, the rejection of claim 15 is incorporated herein. Further, the limitations in this claim are taught by Nagaraja in view of Ismailsheriff in further view of Zhang and Honkala for the same reasons disclosed above in the rejection of claim 10.
Response to Arguments
Regarding the rejection applied under 35 U.S.C. 101, Applicant first asserts that the claims relate to improving the explainability of reinforcement learning agents, which Applicant asserts is not a mental process or abstract idea and is instead an improvement to the technological field of reinforcement learning (“Remarks”, Page 6). Applicant further asserts similarities to Desjardins, and that the claims are directed to improvements to the explainability of reinforcement learning agents and are therefore integrated into a practical application (“Remarks”, Page 6).
However, as discussed in the 35 U.S.C. 101 rejection of claim 1 above, the claims contain at least the abstract idea of generating a distance metric reflective of a difference between perturbed and past data and generating a graphical representation including the distance metric. Therefore, even if the claims did recite an improvement, as written, it would still be in the abstract idea of generating a graphical representation including the distance metric. Regarding the alleged improvement achieved through enhanced explainability, the explainability is achieved through the generation of the graphical representation, which includes the calculated metrics; thus any improvement remains within the abstract idea itself. The MPEP notes that it is important to keep in mind that an improvement in the abstract idea itself is not an improvement in technology. MPEP 2106.05(a)(II). Further, Desjardins provided a specific training strategy that allows a model to preserve performance on earlier tasks even as it learns new ones, whereas the current claims recite generating perturbed data for reinforcement learning, and, as discussed above, any alleged improvement is in the abstract idea.
Regarding the rejection applied under 35 U.S.C. 103, Applicant’s arguments with respect to the claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KURT NICHOLAS PRESSLY whose telephone number is (703)756-4639. The examiner can normally be reached M-F 8-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KURT NICHOLAS PRESSLY/Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125