Last updated: May 29, 2026
Application No. 17/687,979
REINFORCEMENT LEARNING AGENT TO EVALUATE MONITORING SYSTEM STRENGTH

Non-Final OA: §103, §OTHER
Filed: Mar 07, 2022
Examiner: BARRETT, RYAN S
Art Unit: 2148
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Oracle International Corporation
OA Round: 3 (Non-Final)
Interview Optional

This examiner has a 65% grant rate with a +43.2% interview lift. An interview has already been conducted in this application's prosecution history, so a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 413 resolved cases, 2023–2026
Examiner Intelligence

BARRETT, RYAN S
Career Allowance Rate: 65% of resolved cases (267 granted / 413 resolved), +9.6% vs TC avg
Interview Lift: +43.2% (resolved cases with interview)
Avg Prosecution: 3y 3m (23 currently pending)
Total Applications: 437 across all art units
Statute-Specific Performance

§101: 0.3% (-39.7% vs TC avg)
§103: 38.7% (-1.3% vs TC avg)
§102: 1.1% (-38.9% vs TC avg)
§112: 0.5% (-39.5% vs TC avg)
Based on career data from 413 resolved cases
Office Action

DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the Request for Continued Examination filed on 3/23/2026.  Claims 1-20 are pending in the case.  Claims 1, 8, and 15 are independent claims.

Response to Arguments
Applicant’s prior art arguments have been fully considered but they are not persuasive.
Applicant argues that “the Office does not identify any specific ‘known software development methods’ … Rather, the rejection relies on a generalized assertion that the elements could be combined using routine programming techniques” (page 14).  Examiner agrees.  A person of ordinary skill in the art is a software developer able to use any routine programming technique(s) to implement what they can envision.  Such a person, having the cited references before them, would find it obvious that a single program may incorporate features of the multiple references.  Once the combination is envisioned, implementation is routine and not inventive.  See, e.g., Santiago (“With LEGOs, I threw aside the instructions, and started to build anything and everything that I could cook up in my mind.  With coding, it was much the same.  I’d think of an idea, and start plucking away on my laptop until I had something working,” page 3 paragraph 2 lines 1-4).  Similarly, “[a] person of ordinary skill in the art is also a person of ordinary creativity, not an automaton.”  KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 U.S.P.Q.2d 1385 (2007).
Applicant argues that “it is ‘never appropriate to rely solely on 'common knowledge' in the art without evidentiary support’” (page 14).  Examiner agrees.  The cited references are the evidentiary support.  “Known software development methods” are, by definition, possessed by a person of ordinary skill in the art of software development.  MPEP § 2141.03(I).
Applicant argues that “the references are not merely different in application, but are technically incompatible in purpose and implementation, undermining any assertion that their combination would have been obvious” (page 15).  Examiner respectfully disagrees.  “The test for obviousness is not whether the features of a secondary reference may be bodily incorporated into the structure of the primary reference ... Rather, the test is what the combined teachings of those references would have suggested to those of ordinary skill in the art.”  In re Keller, 642 F.2d 413, 425, 208 U.S.P.Q. 871, 881 (C.C.P.A. 1981).  See also In re Sneed, 710 F.2d 1544, 1550, 218 U.S.P.Q. 385, 389 (Fed. Cir. 1983) (“[I]t is not necessary that the inventions of the references be physically combinable to render obvious the invention under review.”); and In re Nievelt, 482 F.2d 965, 179 U.S.P.Q. 224, 226 (C.C.P.A. 1973) (“Combining the teachings of references does not involve an ability to combine their specific structures.”).  The general concept of “automatically determining and deploying threshold value sets that increase scenario strength” may be used as an inspiration to modify Anderson without reference to the details of Shen’s implementation.
Therefore, Examiner respectfully asserts that the cited art sufficiently teaches the limitations recited in the previous claims.

Applicant’s remaining prior art arguments have been considered but are moot because the new grounds of rejection presented below do not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the arguments.

Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA 35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.

Claims 1-2, 4-5, 7-9, 11-12, 15-16, and 18-19 are rejected under 35 U.S.C. § 103 as being unpatentable over Anderson et al. (“Evading Machine Learning Malware Detection,” 2017, https://api.semanticscholar.org/CorpusID:26406954, hereinafter Anderson) in view of Shen et al. (“Deep Q-Network-based Adaptive Alert Threshold Selection Policy for Payment Fraud Systems in Retail Banking,” 21 October 2020, https://arxiv.org/abs/2010.11062, hereinafter Shen) and Lorenz et al. (US 2022/0138754 A1, hereinafter Lorenz).

As to independent claim 1, Anderson teaches a computer-implemented method, comprising:
configuring an environment to simulate a monitored system for a reinforcement learning agent (“We investigate a more general framework for attacking static PE anti-malware engines based on reinforcement learning, which models more realistic attacker conditions, and subsequently has provides much more modest evasion rates. A reinforcement learning (RL) agent is equipped with a set of functionality-preserving operations that it may perform on the PE file. It learns through a series of games played against the anti-malware engine which sequence of operations is most likely to result in evasion for a given malware sample,” page 1 section “ABSTRACT” lines 16-25);
training the reinforcement learning agent over one or more training episodes to learn a policy that evades scenarios of the simulated monitored system while completing a task, wherein the scenarios are rules for detecting activity that is suspicious (“We investigate a more general framework for attacking static PE anti-malware engines based on reinforcement learning, which models more realistic attacker conditions, and subsequently has provides much more modest evasion rates. A reinforcement learning (RL) agent is equipped with a set of functionality-preserving operations that it may perform on the PE file. It learns through a series of games played against the anti-malware engine which sequence of operations is most likely to result in evasion for a given malware sample,” page 1 section “ABSTRACT” lines 16-25);
recording an episode of steps taken by the reinforcement learning agent, result states (“2. The attacker has the ability to retrieve a malicious/benign label (or score, if reported) for an arbitrary PE file submitted to the anti-malware engine,” page 1 section “1. INTRODUCTION” paragraph 4 bullet 2 lines 1-3, emphasis added), and triggered alerts (“2. The attacker has the ability to retrieve a malicious/benign label (or score, if reported) for an arbitrary PE file submitted to the anti-malware engine,” page 1 section “1. INTRODUCTION” paragraph 4 bullet 2 lines 1-3, emphasis added);
determining strength of monitoring of the simulated monitored system based on the recorded episode, wherein counts of triggered alerts, training time, or number of steps in a training episode serve as proxy metrics for strength of the alerting system or effectiveness of individual scenarios (“For the attack with continuous score, an immediate reward is given by initialscore - reportedscore, and provide a reward of 10.0 if the agent successfully bypasses the model. Note that this can result in negative rewards should the mutations actually increase the original score,” page 5 section “4. EXPERIMENTS” paragraph 4 lines 1-5 – the reward of 10.0 corresponds to a count of 0 triggered alerts); and
automatically modifying the scenarios in the monitored system in response to the determined strength (“evasive variants generated by the agent may be used to harden machine learning anti-malware engine via adversarial training,” page 1 section “ABSTRACT” lines 28-30).
Anderson does not appear to expressly teach a method comprising automatically determining and deploying threshold value sets that increase scenario strength.
Shen teaches a method comprising automatically determining and deploying threshold value sets that increase scenario strength (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” page 3 column left lines 3-6; “we use 5 descriptive attributes to construct the state set for each hour … 4) CC - Utilized processing capacity, which measures the processing capacity constraints for the system indirectly. As the system will drop all alerts after hitting the maximum capacity, the proposed algorithm can leverage this information to better balance the hourly thresholds (5) T - Score Threshold,” page 4 column left lines 2-11).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the automatic modifying of Anderson to comprise the automatic determining and deploying threshold value sets of Shen.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely automatically determining and deploying threshold value sets (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” Shen page 3 column left lines 3-6).  Therefore, the rationale to support a conclusion that the claim would have been obvious is the combining of prior art elements according to known methods to yield predictable results to one of ordinary skill in the art.  See MPEP § 2143(I)(A).
Anderson/Shen does not appear to expressly teach a method comprising [determining and deploying threshold value sets] based on cumulative alerts measured over one or more reinforcement learning training episodes and adjusting the threshold value sets to maintain the cumulative alerts within a pre-determined range.
Lorenz teaches a method comprising [determining and deploying threshold value sets] based on cumulative alerts measured over one or more reinforcement learning training episodes and adjusting the threshold value sets to maintain the cumulative alerts within a pre-determined range (“Tuning is a process done by the compliance team or delegates to evaluate which thresholds provide the most productivity for alert generation. Typically, this requires a minimum of three iterations, which include tweaking thresholds set above and below the standard baseline and testing to compare the results between current thresholds and new ones. If a rule is found to produce a high number of false positive alerts, consideration must be made to adjust thresholds. Conversely, if a rule is found not to yield any meaningful alerts, thresholds are reconsidered for modification or, the rule is either replaced or retired. In some embodiments, tuning may be performed at least in part by SAD computing device 102 itself,” paragraph 0054 lines 1-16).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the automatic determining and deploying threshold value sets of Anderson/Shen to comprise the pre-determined range of Lorenz.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely maintaining alerts within a pre-determined range (“Tuning is a process done by the compliance team or delegates to evaluate which thresholds provide the most productivity for alert generation. Typically, this requires a minimum of three iterations, which include tweaking thresholds set above and below the standard baseline and testing to compare the results between current thresholds and new ones. If a rule is found to produce a high number of false positive alerts, consideration must be made to adjust thresholds. Conversely, if a rule is found not to yield any meaningful alerts, thresholds are reconsidered for modification or, the rule is either replaced or retired. In some embodiments, tuning may be performed at least in part by SAD computing device 102 itself,” Lorenz paragraph 0054 lines 1-16).  Therefore, the rationale to support a conclusion that the claim would have been obvious is the combining of prior art elements according to known methods to yield predictable results to one of ordinary skill in the art.  See MPEP § 2143(I)(A).
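For illustration only, the limitation at issue in the rejection of claim 1 (adjusting a threshold so that cumulative alerts stay within a pre-determined range) might be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Anderson, Shen, Lorenz, or the application: the names `count_alerts` and `tune_threshold`, the fixed step size, and the iteration cap are all hypothetical.

```python
from typing import Sequence


def count_alerts(scores: Sequence[float], threshold: float) -> int:
    """Count the scores that meet or exceed the alert threshold."""
    return sum(1 for s in scores if s >= threshold)


def tune_threshold(
    scores: Sequence[float],
    threshold: float,
    low: int,
    high: int,
    step: float = 0.05,   # hypothetical adjustment step
    max_iters: int = 100,  # hypothetical safety cap on iterations
) -> float:
    """Nudge the threshold until the alert count falls inside [low, high]."""
    for _ in range(max_iters):
        alerts = count_alerts(scores, threshold)
        if alerts > high:
            threshold += step  # too many alerts: make the rule stricter
        elif alerts < low:
            threshold -= step  # too few alerts: make the rule looser
        else:
            break              # alert count is within the target range
    return threshold
```

In this sketch the scores would come from recorded episodes; a fixed step is the simplest choice, and a real tuner could just as well use bisection or a learned policy.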

As to dependent claim 2, the rejection of claim 1 is incorporated.  Anderson/Shen/Lorenz further teaches a method wherein the automatic modification of the scenarios further comprises:
adjusting a threshold of an existing scenario (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” Shen page 3 column left lines 3-6) based on strength of the adjusted scenario and a number of cumulative alerts resulting from the adjusted scenario (“we use 5 descriptive attributes to construct the state set for each hour … 4) CC - Utilized processing capacity, which measures the processing capacity constraints for the system indirectly. As the system will drop all alerts after hitting the maximum capacity, the proposed algorithm can leverage this information to better balance the hourly thresholds (5) T - Score Threshold,” Shen page 4 column left lines 2-11); and
deploying the adjusted scenario into the monitored system (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” Shen page 3 column left lines 3-6).

As to dependent claim 4, the rejection of claim 1 is incorporated.  Anderson/Shen/Lorenz further teaches a method comprising training the reinforcement learning agent through an additional training episode until a reward function for an episode converges on a maximum, wherein the reward function is based on (i) rewards for completing a task (“rewards of 10.0 / 0.0 are provided for evasion / failed-evasion, respectively,” Anderson page 5 section “4. EXPERIMENTS” paragraph 3 lines 1-2), (ii) penalties for steps taken to complete the task (“at each time point 𝑡, a reward is actually discounted by a factor of 𝛾,” Shen page 3 column right lines 3-4), and (iii) penalties for triggering alerts (“rewards of 10.0 / 0.0 are provided for evasion / failed-evasion, respectively,” Anderson page 5 section “4. EXPERIMENTS” paragraph 3 lines 1-2).
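As a hedged illustration of the "until a reward function for an episode converges on a maximum" language in claim 4, one possible stopping test compares the mean episode return over two adjacent windows. The function name `has_converged`, the window size, and the tolerance are hypothetical, and neither the application nor the cited references is known to use this particular test.

```python
from typing import Sequence


def has_converged(
    returns: Sequence[float],
    window: int = 5,    # hypothetical number of episodes per window
    tol: float = 1e-3,  # hypothetical tolerance on the change in mean return
) -> bool:
    """Report whether the mean episode return has stopped changing."""
    if len(returns) < 2 * window:
        return False  # not enough episodes to compare two windows
    recent = sum(returns[-window:]) / window
    previous = sum(returns[-2 * window:-window]) / window
    return abs(recent - previous) <= tol
```

A plateau test of this kind only shows that the return has stopped changing; it does not by itself show that the plateau is the maximum.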

As to dependent claim 5, the rejection of claim 1 is incorporated.  Anderson/Shen/Lorenz further teaches a method wherein an episode of training of the reinforcement learning agent further comprises:
for a set of steps by the reinforcement learning agent,
(i) reward the reinforcement learning agent with a reward where a step taken causes a result state in which the task is complete (“rewards of 10.0 / 0.0 are provided for evasion / failed-evasion, respectively,” Anderson page 5 section “4. EXPERIMENTS” paragraph 3 lines 1-2),
(ii) penalize the reinforcement learning agent with a small penalty less than the size of the reward where the step taken causes a result state in which the task is not complete and which does not trigger one of the scenarios (“at each time point 𝑡, a reward is actually discounted by a factor of 𝛾,” Shen page 3 column right lines 3-4), and
(iii) penalize the reinforcement learning agent with a large penalty larger than the reward where the action taken causes a result state that triggers one of the scenarios (“rewards of 10.0 / 0.0 are provided for evasion / failed-evasion, respectively,” Anderson page 5 section “4. EXPERIMENTS” paragraph 3 lines 1-2).
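The three-part reward structure recited in claim 5 can be pictured with the following sketch, which follows the claim wording rather than any cited reference. All names and magnitudes are assumptions: the task reward of 10.0 merely echoes the evasion reward quoted from Anderson, while the step penalty of 0.1 and the alert penalty of 50.0 are hypothetical values chosen so that the small penalty is less than the reward and the large penalty exceeds it.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class RewardConfig:
    task_reward: float = 10.0    # assumption: echoes the 10.0 quoted from Anderson
    step_penalty: float = 0.1    # hypothetical small penalty, less than task_reward
    alert_penalty: float = 50.0  # hypothetical large penalty, greater than task_reward


def step_reward(task_complete: bool, alert_triggered: bool,
                cfg: RewardConfig = RewardConfig()) -> float:
    """Return the reward for one step under the three-case scheme."""
    if alert_triggered:
        # Assumption: an alert dominates even if the task also completes.
        return -cfg.alert_penalty
    if task_complete:
        return cfg.task_reward
    return -cfg.step_penalty


def episode_return(steps: Iterable[Tuple[bool, bool]],
                   cfg: RewardConfig = RewardConfig()) -> float:
    """Sum step rewards over (task_complete, alert_triggered) pairs."""
    return sum(step_reward(done, alert, cfg) for done, alert in steps)
```

With these values, an agent that reaches the goal in few steps without triggering a scenario earns close to the full task reward, whereas a single alert outweighs it.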

As to dependent claim 7, the rejection of claim 1 is incorporated.  Anderson/Shen/Lorenz further teaches a method wherein the recorded episode of steps taken, result states, and triggered alerts is either (i) one of the training episodes or (ii) a simulated episode sampled from a policy learned by the trained reinforcement learning agent (“2. The attacker has the ability to retrieve a malicious/benign label (or score, if reported) for an arbitrary PE file submitted to the anti-malware engine,” Anderson page 1 section “1. INTRODUCTION” paragraph 4 bullet 2 lines 1-3).
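To make the recorded episode of claim 7 and the proxy metrics recited in claim 1 concrete, a recording structure might look like the sketch below. It is illustrative only: the class names `StepRecord` and `EpisodeRecord`, the string-typed actions and states, and the metric keys are hypothetical and do not appear in the application or the cited references.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class StepRecord:
    action: str        # step taken by the agent
    result_state: str  # state reached after the step
    alerts: List[str] = field(default_factory=list)  # scenarios triggered


@dataclass
class EpisodeRecord:
    steps: List[StepRecord] = field(default_factory=list)

    def record(self, action: str, result_state: str,
               alerts: Iterable[str] = ()) -> None:
        """Append one step, its result state, and any triggered alerts."""
        self.steps.append(StepRecord(action, result_state, list(alerts)))

    def proxy_metrics(self) -> Dict[str, int]:
        """Summarize the episode as step count and triggered-alert count."""
        return {
            "num_steps": len(self.steps),
            "alert_count": sum(len(s.alerts) for s in self.steps),
        }
```

For example, an episode with two recorded steps, one of which triggers a single hypothetical scenario, yields `{"num_steps": 2, "alert_count": 1}`.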

As to independent claim 8, Anderson teaches a computing system comprising:
a processor (“Windows,” page 1 section “ABSTRACT” lines 13-14);
a memory operably connected to the processor (“Windows,” page 1 section “ABSTRACT” lines 13-14);
a non-transitory computer-readable medium operably connected to the processor and memory and storing computer-executable instructions (“Windows,” page 1 section “ABSTRACT” lines 13-14) that when executed by at least a processor of the computing system cause the computing system to:
configure an environment to simulate a monitored system for a reinforcement learning agent (“We investigate a more general framework for attacking static PE anti-malware engines based on reinforcement learning, which models more realistic attacker conditions, and subsequently has provides much more modest evasion rates. A reinforcement learning (RL) agent is equipped with a set of functionality-preserving operations that it may perform on the PE file. It learns through a series of games played against the anti-malware engine which sequence of operations is most likely to result in evasion for a given malware sample,” page 1 section “ABSTRACT” lines 16-25);
train the reinforcement learning agent over one or more training episodes to learn a policy that evades scenarios of the simulated monitored system while completing a task, wherein the scenarios are rules for detecting activity that is suspicious (“We investigate a more general framework for attacking static PE anti-malware engines based on reinforcement learning, which models more realistic attacker conditions, and subsequently has provides much more modest evasion rates. A reinforcement learning (RL) agent is equipped with a set of functionality-preserving operations that it may perform on the PE file. It learns through a series of games played against the anti-malware engine which sequence of operations is most likely to result in evasion for a given malware sample,” page 1 section “ABSTRACT” lines 16-25);
record steps taken by the reinforcement learning agent, result states (“2. The attacker has the ability to retrieve a malicious/benign label (or score, if reported) for an arbitrary PE file submitted to the anti-malware engine,” page 1 section “1. INTRODUCTION” paragraph 4 bullet 2 lines 1-3, emphasis added), and triggered alerts for the training episodes (“2. The attacker has the ability to retrieve a malicious/benign label (or score, if reported) for an arbitrary PE file submitted to the anti-malware engine,” page 1 section “1. INTRODUCTION” paragraph 4 bullet 2 lines 1-3, emphasis added);
determine strength of monitoring of the simulated monitored system based on the recorded training episodes, wherein counts of triggered alerts, training time, or number of steps in a training episode serve as proxy metrics for strength of the alerting system or effectiveness of individual scenarios (“For the attack with continuous score, an immediate reward is given by initialscore - reportedscore, and provide a reward of 10.0 if the agent successfully bypasses the model. Note that this can result in negative rewards should the mutations actually increase the original score,” page 5 section “4. EXPERIMENTS” paragraph 4 lines 1-5 – the reward of 10.0 corresponds to a count of 0 triggered alerts); and
automatically modify the scenarios in the monitored system in response to the determined strength (“evasive variants generated by the agent may be used to harden machine learning anti-malware engine via adversarial training,” page 1 section “ABSTRACT” lines 28-30).
Anderson does not appear to expressly teach a system comprising instructions for automatically determining and deploying threshold value sets that increase scenario strength.
Shen teaches a system comprising instructions for automatically determining and deploying threshold value sets that increase scenario strength (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” page 3 column left lines 3-6; “we use 5 descriptive attributes to construct the state set for each hour … 4) CC - Utilized processing capacity, which measures the processing capacity constraints for the system indirectly. As the system will drop all alerts after hitting the maximum capacity, the proposed algorithm can leverage this information to better balance the hourly thresholds (5) T - Score Threshold,” page 4 column left lines 2-11).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the automatic modifying of Anderson to comprise the automatic determining and deploying threshold value sets of Shen.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely automatically determining and deploying threshold value sets (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” Shen page 3 column left lines 3-6).  Therefore, the rationale to support a conclusion that the claim would have been obvious is the combining of prior art elements according to known methods to yield predictable results to one of ordinary skill in the art.  See MPEP § 2143(I)(A).
Anderson/Shen does not appear to expressly teach a system comprising instructions for [determining and deploying threshold value sets] based on cumulative alerts measured over one or more reinforcement learning training episodes and adjusting the threshold value sets to maintain the cumulative alerts within a pre-determined range.
Lorenz teaches a system comprising instructions for [determining and deploying threshold value sets] based on cumulative alerts measured over one or more reinforcement learning training episodes and adjusting the threshold value sets to maintain the cumulative alerts within a pre-determined range (“Tuning is a process done by the compliance team or delegates to evaluate which thresholds provide the most productivity for alert generation. Typically, this requires a minimum of three iterations, which include tweaking thresholds set above and below the standard baseline and testing to compare the results between current thresholds and new ones. If a rule is found to produce a high number of false positive alerts, consideration must be made to adjust thresholds. Conversely, if a rule is found not to yield any meaningful alerts, thresholds are reconsidered for modification or, the rule is either replaced or retired. In some embodiments, tuning may be performed at least in part by SAD computing device 102 itself,” paragraph 0054 lines 1-16).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the automatic determining and deploying threshold value sets of Anderson/Shen to comprise the pre-determined range of Lorenz.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely maintaining alerts within a pre-determined range (“Tuning is a process done by the compliance team or delegates to evaluate which thresholds provide the most productivity for alert generation. Typically, this requires a minimum of three iterations, which include tweaking thresholds set above and below the standard baseline and testing to compare the results between current thresholds and new ones. If a rule is found to produce a high number of false positive alerts, consideration must be made to adjust thresholds. Conversely, if a rule is found not to yield any meaningful alerts, thresholds are reconsidered for modification or, the rule is either replaced or retired. In some embodiments, tuning may be performed at least in part by SAD computing device 102 itself,” Lorenz paragraph 0054 lines 1-16).  Therefore, the rationale to support a conclusion that the claim would have been obvious is the combining of prior art elements according to known methods to yield predictable results to one of ordinary skill in the art.  See MPEP § 2143(I)(A).

As to dependent claim 9, the rejection of claim 8 is incorporated.  Anderson/Shen/Lorenz further teaches a system wherein the instructions for automatic modification of the scenarios further cause the computing system to:
adjust a threshold of an existing scenario (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” Shen page 3 column left lines 3-6) based on strength of the adjusted scenario and a number of cumulative alerts resulting from the adjusted scenario (“we use 5 descriptive attributes to construct the state set for each hour … 4) CC - Utilized processing capacity, which measures the processing capacity constraints for the system indirectly. As the system will drop all alerts after hitting the maximum capacity, the proposed algorithm can leverage this information to better balance the hourly thresholds (5) T - Score Threshold,” Shen page 4 column left lines 2-11); and
deploy the adjusted scenario into the monitored system (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” Shen page 3 column left lines 3-6).

As to dependent claim 11, the rejection of claim 8 is incorporated.  Anderson/Shen/Lorenz further teaches a system wherein the instructions further cause the computing system to train the reinforcement learning agent through an additional training episode until a reward function for an episode converges on a maximum, wherein the reward function is based on (i) rewards for completing a task (“rewards of 10.0 / 0.0 are provided for evasion / failed-evasion, respectively,” Anderson page 5 section “4. EXPERIMENTS” paragraph 3 lines 1-2), (ii) penalties for steps taken to complete the task (“at each time point 𝑡, a reward is actually discounted by a factor of 𝛾,” Shen page 3 column right lines 3-4), and (iii) penalties for triggering alerts (“rewards of 10.0 / 0.0 are provided for evasion / failed-evasion, respectively,” Anderson page 5 section “4. EXPERIMENTS” paragraph 3 lines 1-2).

As to dependent claim 12, the rejection of claim 8 is incorporated.  Anderson/Shen/Lorenz further teaches a system wherein the instructions for performing an episode of training by the reinforcement learning agent further cause the computing system to:
for a set of steps by the reinforcement learning agent,
(i) reward the reinforcement learning agent with a reward where a step taken causes a result state in which the task is complete (“rewards of 10.0 / 0.0 are provided for evasion / failed-evasion, respectively,” Anderson page 5 section “4. EXPERIMENTS” paragraph 3 lines 1-2),
(ii) penalize the reinforcement learning agent with a small penalty less than the size of the reward where the step taken causes a result state in which the task is not complete and which does not trigger one of the scenarios (“at each time point 𝑡, a reward is actually discounted by a factor of 𝛾,” Shen page 3 column right lines 3-4), and
(iii) penalize the reinforcement learning agent with a large penalty larger than the reward where the action taken causes a result state that triggers one of the scenarios (“rewards of 10.0 / 0.0 are provided for evasion / failed-evasion, respectively,” Anderson page 5 section “4. EXPERIMENTS” paragraph 3 lines 1-2).
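For illustration only, the three-part reward structure recited above can be sketched as a single per-step function. This is a minimal sketch under stated assumptions, not the method of Anderson, Shen, or the application: the name `step_reward` and the two penalty magnitudes are hypothetical, and only the 10.0 completion reward is taken from the quoted Anderson passage.

```python
def step_reward(task_complete: bool,
                scenario_triggered: bool,
                completion_reward: float = 10.0,  # value from the quoted Anderson passage
                step_penalty: float = 0.1,        # assumed: small, less than the reward
                alert_penalty: float = 20.0) -> float:  # assumed: larger than the reward
    """Return the reward for one step taken by the agent."""
    if scenario_triggered:
        # (iii) the result state triggers a scenario: large penalty
        return -alert_penalty
    if task_complete:
        # (i) the result state completes the task: reward
        return completion_reward
    # (ii) task not complete and no scenario triggered: small penalty
    return -step_penalty
```

Checking the triggered case first reflects one possible design choice, namely that an alert outweighs task completion when both occur in the same step.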

As to independent claim 15, Anderson teaches a non-transitory computer-readable medium that included stored thereon computer-executable instructions (“Windows,” page 1 section “ABSTRACT” lines 13-14) that, when executed by a processor accessing memory of a computer cause the computer to:
configure an environment to simulate a monitored system for a reinforcement learning agent (“We investigate a more general framework for attacking static PE anti-malware engines based on reinforcement learning, which models more realistic attacker conditions, and subsequently has provides much more modest evasion rates. A reinforcement learning (RL) agent is equipped with a set of functionality-preserving operations that it may perform on the PE file. It learns through a series of games played against the anti-malware engine which sequence of operations is most likely to result in evasion for a given malware sample,” page 1 section “ABSTRACT” lines 16-25);
train the reinforcement learning agent over one or more training episodes to learn a policy that evades scenarios of the simulated monitored system while completing a task, wherein the scenarios are rules for detecting activity that is suspicious (“We investigate a more general framework for attacking static PE anti-malware engines based on reinforcement learning, which models more realistic attacker conditions, and subsequently has provides much more modest evasion rates. A reinforcement learning (RL) agent is equipped with a set of functionality-preserving operations that it may perform on the PE file. It learns through a series of games played against the anti-malware engine which sequence of operations is most likely to result in evasion for a given malware sample,” page 1 section “ABSTRACT” lines 16-25);
record steps taken by the reinforcement learning agent, result states (“2. The attacker has the ability to retrieve a malicious/benign label (or score, if reported) for an arbitrary PE file submitted to the anti-malware engine,” page 1 section “1. INTRODUCTION” paragraph 4 bullet 2 lines 1-3, emphasis added), and triggered alerts for the training episodes (“2. The attacker has the ability to retrieve a malicious/benign label (or score, if reported) for an arbitrary PE file submitted to the anti-malware engine,” page 1 section “1. INTRODUCTION” paragraph 4 bullet 2 lines 1-3, emphasis added);
determine strength of monitoring of the simulated monitored system based on the recorded training episodes, wherein counts of triggered alerts, training time, or number of steps in a training episode serve as proxy metrics for strength of the alerting system or effectiveness of individual scenarios (“For the attack with continuous score, an immediate reward is given by initialscore - reportedscore, and provide a reward of 10.0 if the agent successfully bypasses the model. Note that this can result in negative rewards should the mutations actually increase the original score,” page 5 section “4. EXPERIMENTS” paragraph 4 lines 1-5 – the reward of 10.0 corresponds to a count of 0 triggered alerts); and
automatically modify the scenarios in the monitored system in response to the determined strength (“evasive variants generated by the agent may be used to harden machine learning anti-malware engine via adversarial training,” page 1 section “ABSTRACT” lines 28-30).
Anderson does not appear to expressly teach a medium comprising instructions for automatically determining and deploying threshold value sets that increase scenario strength.
Shen teaches a medium comprising instructions for automatically determining and deploying threshold value sets that increase scenario strength (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” page 3 column left lines 3-6; “we use 5 descriptive attributes to construct the state set for each hour … 4) CC - Utilized processing capacity, which measures the processing capacity constraints for the system indirectly. As the system will drop all alerts after hitting the maximum capacity, the proposed algorithm can leverage this information to better balance the hourly thresholds (5) T - Score Threshold,” page 4 column left lines 2-11).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the automatic modifying of Anderson to comprise the automatic determining and deploying threshold value sets of Shen.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely automatically determining and deploying threshold value sets (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” Shen page 3 column left lines 3-6).  Therefore, the rationale to support a conclusion that the claim would have been obvious is that the combining prior art elements according to known methods to yield predictable results to one of ordinary skill in the art. See MPEP § 2143(I)(A).
Anderson/Shen does not appear to expressly teach a medium comprising instructions for [determining and deploying threshold value sets] based on cumulative alerts measured over one or more reinforcement learning training episodes and adjusting the threshold value sets to maintain the cumulative alerts within a pre-determined range.
Lorenz teaches a medium comprising instructions for [determining and deploying threshold value sets] based on cumulative alerts measured over one or more reinforcement learning training episodes and adjusting the threshold value sets to maintain the cumulative alerts within a pre-determined range (“Tuning is a process done by the compliance team or delegates to evaluate which thresholds provide the most productivity for alert generation. Typically, this requires a minimum of three iterations, which include tweaking thresholds set above and below the standard baseline and testing to compare the results between current thresholds and new ones. If a rule is found to produce a high number of false positive alerts, consideration must be made to adjust thresholds. Conversely, if a rule is found not to yield any meaningful alerts, thresholds are reconsidered for modification or, the rule is either replaced or retired. In some embodiments, tuning may be performed at least in part by SAD computing device 102 itself,” paragraph 0054 lines 1-16).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the automatic determining and deploying threshold value sets of Anderson/Shen to comprise the pre-determined range of Lorenz.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely maintaining alerts within a pre-determined range (“Tuning is a process done by the compliance team or delegates to evaluate which thresholds provide the most productivity for alert generation. Typically, this requires a minimum of three iterations, which include tweaking thresholds set above and below the standard baseline and testing to compare the results between current thresholds and new ones. If a rule is found to produce a high number of false positive alerts, consideration must be made to adjust thresholds. Conversely, if a rule is found not to yield any meaningful alerts, thresholds are reconsidered for modification or, the rule is either replaced or retired. In some embodiments, tuning may be performed at least in part by SAD computing device 102 itself,” Lorenz paragraph 0054 lines 1-16).  Therefore, the rationale to support a conclusion that the claim would have been obvious is that the combining prior art elements according to known methods to yield predictable results to one of ordinary skill in the art. See MPEP § 2143(I)(A).
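For illustration only, adjusting a threshold so that cumulative alerts stay within a pre-determined range can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of Shen, Lorenz, or the application: the name `adjust_threshold`, the multiplicative `step`, and the premise that a higher threshold yields fewer alerts are all hypothetical.

```python
def adjust_threshold(threshold: float,
                     cumulative_alerts: int,
                     min_alerts: int,
                     max_alerts: int,
                     step: float = 0.05) -> float:
    """Nudge a scenario threshold so alerts drift back into [min_alerts, max_alerts].

    Assumes a higher threshold produces fewer alerts.
    """
    if cumulative_alerts > max_alerts:
        # Too many alerts: raise the threshold to reduce alerting.
        return threshold * (1 + step)
    if cumulative_alerts < min_alerts:
        # Too few alerts: lower the threshold to catch more activity.
        return threshold * (1 - step)
    # Alerts already within the pre-determined range: leave unchanged.
    return threshold
```

In such a sketch, `cumulative_alerts` would be the alert count measured over one or more training episodes, and the function would be called once per tuning iteration.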

As to dependent claim 16, the rejection of claim 15 is incorporated.  Anderson/Shen/Lorenz further teaches a medium wherein the instructions for automatic modification of the scenarios further cause the computer to:
adjust a threshold of an existing scenario (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” Shen page 3 column left lines 3-6) based on strength of the adjusted scenario and a number of cumulative alerts resulting from the adjusted scenario (“we use 5 descriptive attributes to construct the state set for each hour … 4) CC - Utilized processing capacity, which measures the processing capacity constraints for the system indirectly. As the system will drop all alerts after hitting the maximum capacity, the proposed algorithm can leverage this information to better balance the hourly thresholds (5) T - Score Threshold,” Shen page 4 column left lines 2-11); and
deploy the adjusted scenario into the monitored system (“we use an adaptive threshold selection policy to replace the static approach. An hourly update period was chosen to demonstrate the approach through experimental analysis,” Shen page 3 column left lines 3-6).

As to dependent claim 18, the rejection of claim 15 is incorporated.  Anderson/Shen/Lorenz further teaches a medium wherein the instructions further cause the computer to train the reinforcement learning agent through an additional training iteration until the processor determines that one of the following conditions is met (a) a standard deviation of mean ‘reward per episode’ to be less than a first pre-defined value, (b) a number of training iterations are less than a second pre-defined value (“total number of training iterations not reached,” Shen page 4 column right line 5), or (c) a time taken for training is less than a third pre-defined value for the setting.
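For illustration only, one plausible reading of the three conditions is a check that decides whether another training iteration should run: training continues while the iteration and time budgets are not exhausted and the mean reward per episode has not yet stabilized. This is a minimal sketch under that assumed reading; the name `should_continue_training`, the default limits, and the `window` size are hypothetical and do not come from Shen or the application.

```python
from statistics import pstdev


def should_continue_training(mean_rewards, iteration, elapsed_s,
                             std_tol=0.1, max_iters=1000,
                             max_seconds=3600.0, window=3):
    """Return True if another training iteration should be run."""
    recent = mean_rewards[-window:]
    # (a) converged once the recent mean rewards vary by less than std_tol
    converged = len(recent) == window and pstdev(recent) < std_tol
    # (b) iteration budget and (c) time budget not yet exhausted
    within_budget = iteration < max_iters and elapsed_s < max_seconds
    return within_budget and not converged
```

A training loop would call this once per iteration with the running list of mean "reward per episode" values.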

As to dependent claim 19, the rejection of claim 15 is incorporated.  Anderson/Shen/Lorenz further teaches a medium wherein the instructions for an episode of training of the reinforcement learning agent further cause the computer to:
for a set of steps by the reinforcement learning agent,
(i) reward the reinforcement learning agent with a reward where a step taken causes a result state in which the task is complete (“rewards of 10.0 / 0.0 are provided for evasion / failed-evasion, respectively,” Anderson page 5 section “4. EXPERIMENTS” paragraph 3 lines 1-2),
(ii) penalize the reinforcement learning agent with a small penalty less than the size of the reward where the step taken causes a result state in which the task is not complete and which does not trigger one of the scenarios (“at each time point 𝑡, a reward is actually discounted by a factor of 𝛾,” Shen page 3 column right lines 3-4), and
(iii) penalize the reinforcement learning agent with a large penalty larger than the reward where the action taken causes a result state that triggers one of the scenarios (“rewards of 10.0 / 0.0 are provided for evasion / failed-evasion, respectively,” Anderson page 5 section “4. EXPERIMENTS” paragraph 3 lines 1-2).

Claims 3, 10, 13, 17, and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over Anderson in view of Shen, Lorenz, and Azvine et al. (US 2006/0282298 A1, hereinafter Azvine).

As to dependent claim 3, the rejection of claim 1 is incorporated.
Anderson/Shen/Lorenz does not appear to expressly teach a method wherein the automatic modification of the scenarios further comprises:
determining that an existing scenario in the simulated monitored system in the environment is redundant;
automatically removing the existing scenario from the monitored system in response to the determination that the existing rule is redundant.
Azvine teaches a method wherein the automatic modification of the scenarios further comprises:
determining that an existing scenario in the simulated monitored system in the environment is redundant (“The set of rules is filtered to remove those rules subsumed by other rules, and each rule is then filtered to remove any redundant elements; for example the first rule contains two literals which say the same thing, so one of these may be removed (Step 6),” paragraph 0095 lines 7-12);
automatically removing the existing scenario from the monitored system in response to the determination that the existing rule is redundant (“The set of rules is filtered to remove those rules subsumed by other rules, and each rule is then filtered to remove any redundant elements; for example the first rule contains two literals which say the same thing, so one of these may be removed (Step 6),” paragraph 0095 lines 7-12).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the scenarios of Anderson/Shen/Lorenz to comprise the deduplication of Azvine.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely scenario deduplication (“The set of rules is filtered to remove those rules subsumed by other rules, and each rule is then filtered to remove any redundant elements; for example the first rule contains two literals which say the same thing, so one of these may be removed (Step 6),” Azvine paragraph 0095 lines 7-12).  Therefore, the rationale to support a conclusion that the claim would have been obvious is that the combining prior art elements according to known methods to yield predictable results to one of ordinary skill in the art. See MPEP § 2143(I)(A).
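For illustration only, the kind of rule filtering described in the quoted Azvine passage (removing rules subsumed by other rules and dropping repeated literals) can be sketched as follows. This is a minimal sketch under stated assumptions, not Azvine's algorithm: each scenario is modeled as a conjunction of condition strings, a scenario is treated as redundant when another scenario's conditions are a subset of its own, and the name `remove_redundant` and the example conditions are hypothetical.

```python
def remove_redundant(scenarios):
    """Drop scenarios subsumed by a more general (or identical) scenario.

    `scenarios` maps a name to an iterable of condition strings that must
    all hold for the scenario to trigger. Repeated conditions collapse
    because each conjunction is stored as a set.
    """
    normalized = {name: frozenset(conds) for name, conds in scenarios.items()}
    kept = {}
    for name, conds in normalized.items():
        subsumed = any(
            # A strictly smaller condition set triggers whenever this one does;
            # an identical set is a duplicate if it has already been kept.
            other < conds or (other == conds and other_name in kept)
            for other_name, other in normalized.items()
            if other_name != name
        )
        if not subsumed:
            kept[name] = conds
    return kept


example = {
    "a": ["amount>10000"],
    "b": ["amount>10000", "country=high_risk"],  # subsumed by "a"
    "c": ["amount>10000", "amount>10000"],       # duplicate literal; same as "a"
    "d": ["cash>5000"],
}
remaining = sorted(remove_redundant(example))
```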

As to dependent claim 10, the rejection of claim 8 is incorporated.
Anderson/Shen/Lorenz does not appear to expressly teach a system wherein the instructions for automatic modification of the scenarios further cause the computing system to:
determine that an existing scenario in the simulated monitored system in the environment is redundant;
automatically remove the existing scenario from the monitored system in response to the determination that the existing rule is redundant.
Azvine teaches a system wherein the instructions for automatic modification of the scenarios further cause the computing system to:
determine that an existing scenario in the simulated monitored system in the environment is redundant (“The set of rules is filtered to remove those rules subsumed by other rules, and each rule is then filtered to remove any redundant elements; for example the first rule contains two literals which say the same thing, so one of these may be removed (Step 6),” paragraph 0095 lines 7-12);
automatically remove the existing scenario from the monitored system in response to the determination that the existing rule is redundant (“The set of rules is filtered to remove those rules subsumed by other rules, and each rule is then filtered to remove any redundant elements; for example the first rule contains two literals which say the same thing, so one of these may be removed (Step 6),” paragraph 0095 lines 7-12).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the scenarios of Anderson/Shen/Lorenz to comprise the deduplication of Azvine.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely scenario deduplication (“The set of rules is filtered to remove those rules subsumed by other rules, and each rule is then filtered to remove any redundant elements; for example the first rule contains two literals which say the same thing, so one of these may be removed (Step 6),” Azvine paragraph 0095 lines 7-12).  Therefore, the rationale to support a conclusion that the claim would have been obvious is that the combining prior art elements according to known methods to yield predictable results to one of ordinary skill in the art. See MPEP § 2143(I)(A).

As to dependent claim 13, the rejection of claim 8 is incorporated.
Anderson/Shen/Lorenz does not appear to expressly teach a system wherein the instructions further cause the computing system to introduce at least one of (i) a new account type; (ii) a new transaction channel; and (iii) an additional scenario to the monitored system in the environment.
Azvine teaches a system wherein the instructions further cause the computing system to introduce at least one of (i) a new account type; (ii) a new transaction channel; and (iii) an additional scenario to the monitored system in the environment (“The results of each learning problem are then combined to produce a set of rules, each of which may contain range-restrictions for every attribute within each data item. Each set is then added into a database to produce the overall user model,” paragraph 0032 lines 9-13).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the detection of Anderson/Shen/Lorenz to comprise the additional scenario of Azvine.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely adding detection scenarios (“The results of each learning problem are then combined to produce a set of rules, each of which may contain range-restrictions for every attribute within each data item. Each set is then added into a database to produce the overall user model,” Azvine paragraph 0032 lines 9-13).  Therefore, the rationale to support a conclusion that the claim would have been obvious is that the combining prior art elements according to known methods to yield predictable results to one of ordinary skill in the art. See MPEP § 2143(I)(A).

As to dependent claim 17, the rejection of claim 15 is incorporated.
Anderson/Shen/Lorenz does not appear to expressly teach a medium wherein the instructions for automatic modification of the scenarios further cause the computer to:
determine that an existing scenario in the simulated monitored system in the environment is redundant;
automatically remove the existing scenario from the monitored system in response to the determination that the existing rule is redundant.
Azvine teaches a medium wherein the instructions for automatic modification of the scenarios further cause the computer to:
determine that an existing scenario in the simulated monitored system in the environment is redundant (“The set of rules is filtered to remove those rules subsumed by other rules, and each rule is then filtered to remove any redundant elements; for example the first rule contains two literals which say the same thing, so one of these may be removed (Step 6),” paragraph 0095 lines 7-12);
automatically remove the existing scenario from the monitored system in response to the determination that the existing rule is redundant (“The set of rules is filtered to remove those rules subsumed by other rules, and each rule is then filtered to remove any redundant elements; for example the first rule contains two literals which say the same thing, so one of these may be removed (Step 6),” paragraph 0095 lines 7-12).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the scenarios of Anderson/Shen/Lorenz to comprise the deduplication of Azvine.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely scenario deduplication (“The set of rules is filtered to remove those rules subsumed by other rules, and each rule is then filtered to remove any redundant elements; for example the first rule contains two literals which say the same thing, so one of these may be removed (Step 6),” Azvine paragraph 0095 lines 7-12).  Therefore, the rationale to support a conclusion that the claim would have been obvious is that the combining prior art elements according to known methods to yield predictable results to one of ordinary skill in the art. See MPEP § 2143(I)(A).

As to dependent claim 20, the rejection of claim 15 is incorporated.
Anderson/Shen/Lorenz does not appear to expressly teach a medium wherein the instructions further cause the computer to introduce at least one of (i) a new account type; (ii) a new transaction channel; and (iii) an additional scenario to the monitored system in the environment.
Azvine teaches a medium wherein the instructions further cause the computer to introduce at least one of (i) a new account type; (ii) a new transaction channel; and (iii) an additional scenario to the monitored system in the environment (“The results of each learning problem are then combined to produce a set of rules, each of which may contain range-restrictions for every attribute within each data item. Each set is then added into a database to produce the overall user model,” paragraph 0032 lines 9-13).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the detection of Anderson/Shen/Lorenz to comprise the additional scenario of Azvine.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely adding detection scenarios (“The results of each learning problem are then combined to produce a set of rules, each of which may contain range-restrictions for every attribute within each data item. Each set is then added into a database to produce the overall user model,” Azvine paragraph 0032 lines 9-13).  Therefore, the rationale to support a conclusion that the claim would have been obvious is that the combining prior art elements according to known methods to yield predictable results to one of ordinary skill in the art. See MPEP § 2143(I)(A).

Claims 6 and 14 are rejected under 35 U.S.C. § 103 as being unpatentable over Anderson in view of Shen, Lorenz, and Kala et al. (US 2021/0081948 A1, hereinafter Kala).

As to dependent claim 6, the rejection of claim 1 is incorporated.
Anderson/Shen/Lorenz does not appear to expressly teach a method comprising automatically tuning transaction constraints for account types or transaction channels.
Kala teaches a method comprising automatically tuning transaction constraints for account types or transaction channels (“As shown in the above exemplary algorithm, individual transactions may be marked corresponding to different levels of fraud likeliness, such as Fraud Match Level 1, Fraud Match Level 2, Fraud Match Level 3, Fraud Match Level 4, or Fraud Match Level 5. The fraud levels may be found corresponding to any of a number of transaction parameters. For example, the algorithm above uses the transaction amount (TransAmount) parameter, but the full algorithm would take many or all of the potential transaction parameters into account, assigning each a fraud level as appropriate. In the above example, upper and lower bounds for transaction amounts may be set to determine which fraud level to assign. Specifically, TranAmt Level 2 may result in Fraud Match Level 2, TransAmt Level 3 may result in Fraud Match Level 3, etc, where higher transaction amounts may indicate higher risk of fraud. In some embodiments, the specific upper and lower bounds for each transaction amount level may be automatically set by the risk management system, or in some embodiments may be selected by a user via the risk management GUI fraud rules selection. Similar processes of fraud rule selection may be used for additional transaction parameters, particularly those fraud parameters that are determined to be most likely associated with fraudulent transactions. In such embodiments, particular issuers may customize their fraud detection through the risk management GUI and the particular fraud parameters found to be the most likely indicators of fraudulent transactions. For example, the risk management GUI may indicate that transaction amount is a fraud parameter often corresponding with fraudulent transactions,” paragraph 0029 lines 1-30).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the detection of Anderson/Shen/Lorenz to comprise the automatic tuning of Kala.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely automatically tuning the detection (“As shown in the above exemplary algorithm, individual transactions may be marked corresponding to different levels of fraud likeliness, such as Fraud Match Level 1, Fraud Match Level 2, Fraud Match Level 3, Fraud Match Level 4, or Fraud Match Level 5. The fraud levels may be found corresponding to any of a number of transaction parameters. For example, the algorithm above uses the transaction amount (TransAmount) parameter, but the full algorithm would take many or all of the potential transaction parameters into account, assigning each a fraud level as appropriate. In the above example, upper and lower bounds for transaction amounts may be set to determine which fraud level to assign. Specifically, TranAmt Level 2 may result in Fraud Match Level 2, TransAmt Level 3 may result in Fraud Match Level 3, etc, where higher transaction amounts may indicate higher risk of fraud. In some embodiments, the specific upper and lower bounds for each transaction amount level may be automatically set by the risk management system, or in some embodiments may be selected by a user via the risk management GUI fraud rules selection. Similar processes of fraud rule selection may be used for additional transaction parameters, particularly those fraud parameters that are determined to be most likely associated with fraudulent transactions. In such embodiments, particular issuers may customize their fraud detection through the risk management GUI and the particular fraud parameters found to be the most likely indicators of fraudulent transactions. For example, the risk management GUI may indicate that transaction amount is a fraud parameter often corresponding with fraudulent transactions,” Kala paragraph 0029 lines 1-30).  Therefore, the rationale to support a conclusion that the claim would have been obvious is that the combining prior art elements according to known methods to yield predictable results to one of ordinary skill in the art. See MPEP § 2143(I)(A).
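For illustration only, assigning a fraud match level from upper and lower bounds on a transaction amount, as the quoted Kala passage describes, can be sketched as follows. This is a minimal sketch under stated assumptions, not Kala's algorithm: the name `fraud_match_level` and the bound values are hypothetical, and in practice the bounds would be set by the risk management system or a user.

```python
from bisect import bisect_right


def fraud_match_level(amount: float,
                      upper_bounds=(100.0, 500.0, 2000.0, 10000.0)) -> int:
    """Map a transaction amount to a fraud match level from 1 to 5.

    `upper_bounds` are assumed, sorted cut-offs: amounts below the first
    bound are level 1, and amounts at or above the last bound are level 5.
    """
    return bisect_right(upper_bounds, amount) + 1
```

Tuning the constraint then amounts to replacing `upper_bounds` with a new sorted tuple, for example one bound set per account type or transaction channel.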

As to dependent claim 14, the rejection of claim 8 is incorporated.
Anderson/Shen/Lorenz does not appear to expressly teach a system wherein the monitored system is a financial transaction system and the task is transferring funds into a particular account, wherein the instructions further cause the computing system to evaluate whether the result state triggers one or more of a rapid movement of funds, high-risk geography, significant cash, or ATM anomaly scenario after a step taken by the reinforcement learning agent.
Kala teaches a system wherein the monitored system is a financial transaction system and the task is transferring funds into a particular account, wherein the instructions further cause the computing system to evaluate whether the result state triggers one or more of a rapid movement of funds, high-risk geography, significant cash, or ATM anomaly scenario after a step taken by the reinforcement learning agent (“As shown in the above exemplary algorithm, individual transactions may be marked corresponding to different levels of fraud likeliness, such as Fraud Match Level 1, Fraud Match Level 2, Fraud Match Level 3, Fraud Match Level 4, or Fraud Match Level 5. The fraud levels may be found corresponding to any of a number of transaction parameters. For example, the algorithm above uses the transaction amount (TransAmount) parameter, but the full algorithm would take many or all of the potential transaction parameters into account, assigning each a fraud level as appropriate. In the above example, upper and lower bounds for transaction amounts may be set to determine which fraud level to assign. Specifically, TranAmt Level 2 may result in Fraud Match Level 2, TransAmt Level 3 may result in Fraud Match Level 3, etc, where higher transaction amounts may indicate higher risk of fraud. In some embodiments, the specific upper and lower bounds for each transaction amount level may be automatically set by the risk management system, or in some embodiments may be selected by a user via the risk management GUI fraud rules selection. Similar processes of fraud rule selection may be used for additional transaction parameters, particularly those fraud parameters that are determined to be most likely associated with fraudulent transactions. In such embodiments, particular issuers may customize their fraud detection through the risk management GUI and the particular fraud parameters found to be the most likely indicators of fraudulent transactions. For example, the risk management GUI may indicate that transaction amount is a fraud parameter often corresponding with fraudulent transactions,” paragraph 0029 lines 1-30).
Accordingly, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the detection of Anderson/Shen/Lorenz to comprise the financial transactions of Kala.  (1) The Examiner finds that the prior art included each claim element listed above, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  (2) The Examiner finds that one of ordinary skill in the art could have combined the elements as claimed by known software development methods, and that in combination, each element merely performs the same function as it does separately.  (3) The Examiner finds that one of ordinary skill in the art would have recognized that the results of the combination were predictable, namely detecting within financial transactions (“As shown in the above exemplary algorithm, individual transactions may be marked corresponding to different levels of fraud likeliness, such as Fraud Match Level 1, Fraud Match Level 2, Fraud Match Level 3, Fraud Match Level 4, or Fraud Match Level 5. The fraud levels may be found corresponding to any of a number of transaction parameters. For example, the algorithm above uses the transaction amount (TransAmount) parameter, but the full algorithm would take many or all of the potential transaction parameters into account, assigning each a fraud level as appropriate. In the above example, upper and lower bounds for transaction amounts may be set to determine which fraud level to assign. Specifically, TranAmt Level 2 may result in Fraud Match Level 2, TransAmt Level 3 may result in Fraud Match Level 3, etc, where higher transaction amounts may indicate higher risk of fraud. In some embodiments, the specific upper and lower bounds for each transaction amount level may be automatically set by the risk management system, or in some embodiments may be selected by a user via the risk management GUI fraud rules selection. Similar processes of fraud rule selection may be used for additional transaction parameters, particularly those fraud parameters that are determined to be most likely associated with fraudulent transactions. In such embodiments, particular issuers may customize their fraud detection through the risk management GUI and the particular fraud parameters found to be the most likely indicators of fraudulent transactions. For example, the risk management GUI may indicate that transaction amount is a fraud parameter often corresponding with fraudulent transactions,” Kala paragraph 0029 lines 1-30).  Therefore, the rationale to support a conclusion that the claim would have been obvious is combining prior art elements according to known methods to yield predictable results to one of ordinary skill in the art.  See MPEP § 2143(I)(A).
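For orientation only, the banding that the quoted Kala passage describes (transaction amount levels mapped to Fraud Match Levels 1 through 5, with bounds either set automatically or selected by a user) might be sketched as below. This is a minimal illustration under stated assumptions, not Kala's algorithm and not the claimed method. Kala, as quoted, gives no numeric bounds and does not say how bounds are set automatically, so the numeric values, the quantile-based `auto_bounds` helper, and all function names are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical lower bounds for transaction amount Levels 1..5.
# The quoted passage discloses no numeric values; these are illustrative only.
DEFAULT_LOWER_BOUNDS = (0.0, 500.0, 2_000.0, 10_000.0, 50_000.0)


def fraud_match_level(trans_amount, lower_bounds=DEFAULT_LOWER_BOUNDS):
    """Map a transaction amount to a Fraud Match Level (1..len(lower_bounds)).

    Level k is assigned when the amount is at or above the k-th lower bound
    and below the next one, so higher amounts yield higher levels.
    """
    if trans_amount < lower_bounds[0]:
        raise ValueError("transaction amount is below the lowest bound")
    # bisect_right counts how many lower bounds are <= trans_amount.
    return bisect_right(lower_bounds, trans_amount)


def auto_bounds(historical_amounts, levels=5):
    """Assumed stand-in for 'automatically set' bounds: use quantile cut
    points of past amounts. The quoted passage does not specify a method."""
    cuts = quantiles(historical_amounts, n=levels)  # levels - 1 cut points
    return (0.0, *cuts)


if __name__ == "__main__":
    # User-selected (default) bounds.
    print(fraud_match_level(750.0))
    # Bounds derived from hypothetical historical data.
    history = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    print(fraud_match_level(95.0, auto_bounds(history)))
```

Passing a different `lower_bounds` tuple corresponds to the user-selected case in the quoted passage, while `auto_bounds` stands in for the automatically set case.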

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure:
US 2021/0182385 A1 disclosing reinforcement learning with automatic configuration adjustment
Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way.  A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art.  In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).
In the interests of compact prosecution, Applicant is invited to contact the examiner via electronic media pursuant to USPTO policy outlined in MPEP § 502.03.  All electronic communication must be authorized in writing.  Applicant may wish to file an Internet Communications Authorization Form PTO/SB/439.  Applicant may wish to request an interview using the Interview Practice website: http://www.uspto.gov/patent/laws-and-regulations/interview-practice.
Applicant is reminded that Internet e-mail may not be used for communication for matters under 35 U.S.C. § 132 or which otherwise require a signature.  A reply to an Office action may NOT be communicated by Applicant to the USPTO via Internet e-mail.  If such a reply is submitted by Applicant via Internet e-mail, a paper copy will be placed in the appropriate patent application file with an indication that the reply is NOT ENTERED.  See MPEP § 502.03(II).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Ryan Barrett whose telephone number is 571 270 3311.  The examiner can normally be reached 9:00am to 5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.  To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor Michelle Bechtold can be reached at 571 431 0762.  The fax phone number for the organization where this application or proceeding is assigned is 571 273 8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Ryan Barrett/
Primary Examiner, Art Unit 2148
Prosecution Timeline

Sep 16, 2025: Non-Final Rejection mailed (§103, §OTHER)
Dec 15, 2025: Applicant Interview (Telephonic)
Dec 15, 2025: Examiner Interview Summary
Dec 15, 2025: Response Filed
Dec 23, 2025: Final Rejection mailed (§103, §OTHER)
Mar 23, 2026: Request for Continued Examination
Mar 25, 2026: Response after Non-Final Action
Apr 01, 2026: Non-Final Rejection mailed (§103, §OTHER) (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/813,708 (Patent 12634413): TWO-DIMENSIONAL VIEW OF A PRESENTATION IN A THREE-DIMENSIONAL VIDEOCONFERENCING ENVIRONMENT. 3y 10m to grant; granted May 19, 2026.
17/974,545 (Patent 12632784): SYSTEM, METHOD, AND COMPUTER-READABLE STORAGE MEDIUM FOR FEDERATED LEARNING OF LOCAL MODEL BASED ON LEARNING DIRECTION OF GLOBAL MODEL. 3y 6m to grant; granted May 19, 2026.
18/086,781 (Patent 12626133): STRUCTURAL OBFUSCATION FOR PROTECTING DEEP LEARNING MODELS ON EDGE DEVICES. 3y 4m to grant; granted May 12, 2026.
17/899,519 (Patent 12602612): INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM PRODUCT. 3y 7m to grant; granted Apr 14, 2026.
17/535,844 (Patent 12585525): BUSINESS LANGUAGE PROCESSING USING LoQoS AND rb-LSTM. 4y 3m to grant; granted Mar 24, 2026.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 65%
With Interview (+43.2%): 99%
Median Time to Grant: 3y 3m (~0m remaining)
PTA Risk: High
Based on 413 resolved cases by this examiner. Grant probability derived from career allowance rate.