Last updated: April 19, 2026

Application No. 17/411,636

SYSTEMS AND METHODS FOR REINFORCEMENT LEARNING WITH LOCAL STATE AND REWARD DATA

Final Rejection: §101, §102, §103, §Other

Filed

Aug 25, 2021

Examiner

KEATON, SHERROD L

Art Unit

2148

Tech Center

2100 — Computer Architecture & Software

Assignee

Royal Bank Of Canada

OA Round

2 (Final)

This examiner grants 52% of cases after interview

— +36.1% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 563 resolved cases, 2023–2026

Examiner Intelligence

KEATON, SHERROD L

Grants 52% of resolved cases

Career Allow Rate

295 granted / 563 resolved

-2.6% vs TC avg

Strong +36% interview lift

+36.1%

Interview Lift

resolved cases with interview

Typical timeline

4y 6m

Avg Prosecution

32 currently pending

Career history

595

Total Applications

across all art units

Statute-Specific Performance

§101

14.9%

-25.1% vs TC avg

§103

62.0%

+22.0% vs TC avg

§102

11.1%

-28.9% vs TC avg

§112

8.0%

-32.0% vs TC avg

Tech Center averages are estimates. Based on career data from 563 resolved cases.

Office Action

§101 §102 §103 §Other

DETAILED ACTION
This action is in response to the filing of 11-21-2025. Claims 1-20 are pending and have been considered below:

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


The rejections of claims 1-20 under 35 U.S.C. 101 have been withdrawn.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claim(s) 1-3, 6-11, 14-18 and 20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Burhani et al. (“Burhani” 20190370649 A1).

Claim 1: Burhani discloses a computer-implemented system for training an automated agent, the system comprising: 
a communication interface; at least one processor; memory in communication with said at least one processor; software code stored in said memory, which when executed at said at least one processor  (Paragraph 26, processor and memory) causes said system to: 
instantiate a first automated agent interaction with an environment having a plurality of resources (Paragraphs 6 and 52, environment; Paragraphs 90-93, environment with a plurality of tasks), wherein the first automated agent maintains a reinforcement learning neural network and generates, according to outputs of said reinforcement learning neural network, signals for communicating resource task requests (Paragraphs 4-5, 25 and 90; network generates task requests);

receive, by way of said communication interface, current state data of a resource for a first task completed in response to a resource task request for a specific resource of the plurality of resources, the resource task request communicated by said first automated agent, the current state data generated based on task data relating to the specific resource obtained from the environment (Paragraphs 4-5 and 25, values (state) of the resource provided in response to a request; Paragraphs 90-93, resources obtained from the trade (task) environment);
receive, by way of said communication interface, historical state metrics of the resource computed based on a plurality of historical tasks related to the specific resource completed in response to a plurality of resource task requests for the specific resource (Paragraphs 4-5 and 15-16, state metrics; Paragraphs 25-26, metrics compared for a time interval (historical events); Paragraphs 90-93, completed trades (tasks) considered for a given resource);

compute normalized state data based on the current state data (Paragraphs 18-20, data computed during time frames (i.e. current));
provide the historical state metrics and the normalized state data to the reinforcement learning neural network of said first automated agent for training (Abstract, output from the model trains the agent; Paragraphs 25 and 86-88, normalized input state provided to the model);
and generate a signal for communicating resource task requests based on outputs of said reinforcement learning neural network of said trained first automated agent (Abstract, Paragraphs 4-5, 25, 42 and 52 (signal generated based on reinforcement training) and 90-93 (signal regarding task request and completion)).

Claim 2: Burhani discloses a system of claim 1, wherein the historical state metrics of the resource are stored in a database and comprise at least one of: an average historical state metric of the resource, a standard deviation of the average historical state metric, and a normalized value based on the average historical state metric and the standard deviation (Burhani: Paragraphs 14-15, standard deviation for VWAP; Paragraph 45, historical VWAP (metric data)).
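For readers unfamiliar with the terms in claim 2, the relationship between an average, a standard deviation, and a normalized value can be sketched as a z-score. This is an illustrative assumption only: neither the application nor Burhani is quoted here as using this exact formula, and the names `historical_metrics`, `normalize`, and `eps` are hypothetical.

```python
from statistics import mean, pstdev


def historical_metrics(values):
    """Return (average, population standard deviation) of historical values."""
    return mean(values), pstdev(values)


def normalize(current, avg, std, eps=1e-8):
    """Z-score of a current value against historical metrics.

    eps is a hypothetical guard against division by zero when the
    historical values are all identical.
    """
    return (current - avg) / (std + eps)


# Hypothetical historical state values for one resource.
history = [10.0, 12.0, 14.0]
avg, std = historical_metrics(history)
z_high = normalize(14.0, avg, std)  # above the historical average
z_mid = normalize(12.0, avg, std)   # exactly at the historical average
```

Under this sketch, a value equal to the historical average normalizes to zero, and values further from the average produce larger magnitudes.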

Claim 3: Burhani discloses a system of claim 1, wherein the resource is a security (Paragraph 90; security), and the historical state metrics and the normalized state data each comprises at least a respective slippage of the security (Burhani: Paragraphs 42, 62 and 92; reward attempts to minimize slippage).  
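As background for the slippage and VWAP terms cited above, one common way to express slippage is the difference between the average execution price of an order and a volume-weighted average price (VWAP) benchmark. The sketch below assumes that definition; it is not taken from the application or from Burhani, and the function names and sample trades are hypothetical.

```python
def vwap(trades):
    """Volume-weighted average price over (price, volume) pairs."""
    total_volume = sum(volume for _, volume in trades)
    if total_volume == 0:
        raise ValueError("no volume traded")
    return sum(price * volume for price, volume in trades) / total_volume


def slippage(fills, market_trades, side="buy"):
    """Signed slippage of an order's fills against the market VWAP.

    Positive means the order did worse than the benchmark:
    a buy filled above VWAP, or a sell filled below it.
    """
    diff = vwap(fills) - vwap(market_trades)
    return diff if side == "buy" else -diff


# Hypothetical market trades and order fills for one security.
market = [(100.0, 10), (102.0, 30)]
fills = [(101.0, 5), (103.0, 5)]
buy_slippage = slippage(fills, market, side="buy")
sell_slippage = slippage(fills, market, side="sell")
```

A reward that "attempts to minimize slippage" would, under this assumed definition, favor orders whose fills stay close to or better than the benchmark.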

Claim 6: Burhani discloses a system of claim 1, wherein the software code, when executed at said at least one processor, causes said system to: receive, by way of said communication interface, current reward data of the resource for the first task; receive, by way of said communication interface, historical reward metrics of the resource computed based on the plurality of historical tasks; compute normalized reward data based on the current reward data; and provide the historical reward metrics and the normalized reward data to the reinforcement learning neural network of said first automated agent for training (Burhani: Paragraphs 5-6, 9, 24 (learning based on following historical VWAP), 42-43 (reward normalization based on VWAP), 86 (state/input data)). This data is utilized to normalize inputs, which assists in normalizing rewards for training an agent.

Claim 7: Burhani discloses a system of claim 6, wherein the historical reward metrics of the resource is stored in the database and comprises at least one of: an average historical reward metric of the resource, a standard deviation of the average historical reward metric, and a normalized value based on the average historical reward metric and the standard deviation of the average historical reward metric (Burhani: Paragraphs 42-43, 67 and 85 (reward metric)).

Claim 8: Burhani discloses a system of claim 6, wherein the resource is a security, and the historical reward metrics and the normalized reward data each comprises at least a respective value determined based on a slippage of the security (Burhani: Paragraphs 42, 62 and 92; reward attempts to minimize slippage).

Claims 9 and 17 are similar in scope to claim 1 and therefore rejected under the same rationale. 
Non-transitory computer readable storage medium (Burhani: Paragraph 27)

Claim 10 is similar in scope to claim 2 and therefore rejected under the same rationale. 

Claims 11 and 18 are similar in scope to claim 3 and therefore rejected under the same rationale. 

Claims 14 and 20 are similar in scope to claim 6 and therefore rejected under the same rationale. 

Claim 15 is similar in scope to claim 7 and therefore rejected under the same rationale. 

Claim 16 is similar in scope to claim 8 and therefore rejected under the same rationale. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 4-5, 12-13 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Burhani et al. (“Burhani” 20190370649 A1) in view of Shi et al. (“Shi” 20210019125 A1). 

Claim 4: Burhani discloses a system of claim 1, wherein the software code, when executed at said at least one processor, causes said system to: instantiate a second automated agent that maintains a second reinforcement learning neural network and generates, according to outputs of said second reinforcement learning neural network, signals for communicating resource task requests (Paragraph 25, instantiate; Paragraph 104, second agent); receive, by way of said communication interface, second current state data of the resource for a second task completed in response to a resource task request communicated by said second automated agent, wherein the second task and the first task are completed concurrently (Paragraph 61; concurrent processing of orders); receive, by way of said communication interface, the historical state metrics of the resource; compute a second normalized state data based on the second current state data; and provide the historical state metrics and the second normalized state data to the second reinforcement learning neural network of said second automated agent for training (Paragraphs 25 and 86-88, normalization; Paragraph 105, first and second agents work cooperatively).
Burhani discloses a first and second agent, and further discloses features could be implemented by each agent (Paragraphs 104-105).
Shi is further provided because it discloses a reinforcement model where workers (interchangeable with agents) concurrently run tasks in order to determine an optimal solution based on reward (Figure 3 and Paragraph 137); this solution provides training (Paragraphs 71 and 133). This concurrent operation could be utilized with the first and second agents of Burhani. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a known technique to improve a similar device in the same way and provide concurrently working agents within the environment of Burhani. One would have been motivated to provide the functionality because it improves efficiency and expands reward determination for improved solution implementation.
 
Claim 5: Burhani and Shi disclose a system of claim 4, wherein the software code, when executed at said at least one processor, causes said system to: receive, by way of said communication interface, a plurality of local state metrics from said first automated agent; and compute the second normalized state data based on at least the second current state data and the plurality of local state metrics from said first automated agent (Burhani: Abstract, Paragraphs 24-25 and 104-105; first and second agents can execute the same operations to determine rewards).

Claims 12 and 19 are similar in scope to claim 4 and therefore rejected under the same rationale. 
Claim 13 is similar in scope to claim 5 and therefore rejected under the same rationale.

Response to Arguments
Applicant's arguments have been fully considered.
Regarding the § 101 rejection, the rejection has been withdrawn.
Regarding the amended claims, the previously cited and newly cited portions of Burhani still appear to capture the limitations. See additionally Paragraphs 90-93.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure:

20070038550 A1  [0106]

Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
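The reply periods described above reduce to simple month arithmetic from the mailing date. The following is a minimal sketch, assuming a mailing date of March 13, 2026 (taken from the prosecution timeline on this page; the action itself is signed 3-10-2026). The helper `add_months` is hypothetical, and the sketch does not account for weekends, federal holidays, extensions of time, or the advisory-action scenario, so it should not be relied on for docketing.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` calendar months after `d`.

    The day is clamped to the last day of the target month
    (for example, Nov 30 plus 3 months gives Feb 28 in a non-leap year).
    """
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


mailing_date = date(2026, 3, 13)              # assumption, see lead-in
two_month_date = add_months(mailing_date, 2)  # first-reply window
shortened_period_end = add_months(mailing_date, 3)
statutory_maximum = add_months(mailing_date, 6)

print(two_month_date, shortened_period_end, statutory_maximum)
```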
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).
In the interests of compact prosecution, Applicant is invited to contact the examiner via electronic media pursuant to USPTO policy outlined in MPEP § 502.03.  All electronic communication must be authorized in writing.  Applicant may wish to file an Internet Communications Authorization Form PTO/SB/439.  Applicant may wish to request an interview using the Interview Practice website: http://www.uspto.gov/patent/laws-and-regulations/interview-practice.
Applicant is reminded Internet e-mail may not be used for communication for matters under 35 U.S.C. § 132 or which otherwise require a signature.  A reply to an Office action may NOT be communicated by Applicant to the USPTO via Internet e-mail. If such a reply is submitted by Applicant via Internet e-mail, a paper copy will be placed in the appropriate patent application file with an indication that the reply is NOT ENTERED. See MPEP § 502.03(II).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHERROD KEATON whose telephone number is 571-270-1697.  The examiner can normally be reached 9:30am to 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.  To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor MICHELLE BECHTOLD can be reached at 571-431-0762.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHERROD L KEATON/     Primary Examiner, Art Unit 2148       
3-10-2026

Prosecution Timeline

Aug 25, 2021

Application Filed

May 17, 2025

Non-Final Rejection — §101, §102, §103

Nov 21, 2025

Response Filed

Mar 13, 2026

Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/188,232

Patent 12566823

SYSTEMS AND METHODS FOR INTERPOLATIVE CENTROID CONTRASTIVE LEARNING

2y 5m to grant Granted Mar 03, 2026

17/674,355

Patent 12547820

Automated Generation Of Commentator-Specific Scripts

2y 5m to grant Granted Feb 10, 2026

17/375,728

Patent 12530587

SYSTEMS AND METHODS FOR CONTRASTIVE LEARNING WITH SELF-LABELING REFINEMENT

2y 5m to grant Granted Jan 20, 2026

18/517,825

Patent 12524147

Modality Learning on Mobile Devices

2y 5m to grant Granted Jan 13, 2026

18/609,638

Patent 12524603

METHODS FOR RECOGNIZING AND INTERPRETING GRAPHIC ELEMENTS

2y 5m to grant Granted Jan 13, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4

Expected OA Rounds

52%

Grant Probability

88%

With Interview (+36.1%)

4y 6m

Median Time to Grant

Moderate

PTA Risk

Based on 563 resolved cases by this examiner. Grant probability derived from career allow rate.
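The projection figures appear to follow from the career numbers reported earlier on this page. The sketch below reproduces them under the assumption that the interview lift is added as percentage points to the rounded career allow rate; the page does not state its exact method, so the variable names and the rounding step are hypothetical.

```python
granted = 295
resolved = 563
interview_lift_points = 36.1  # reported "+36.1%" interview lift

# Career allow rate as a percentage.
allow_rate_pct = granted / resolved * 100

# Displayed grant probability, rounded to a whole percent.
grant_probability = round(allow_rate_pct)

# Assumed: lift is added in percentage points, then rounded for display.
with_interview = round(grant_probability + interview_lift_points)

print(f"{allow_rate_pct:.1f}% -> {grant_probability}% -> {with_interview}%")
```

On these inputs the sketch yields 52% without an interview and 88% with one, matching the figures shown in the projections.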