Prosecution Insights
Last updated: April 19, 2026
Application No. 17/411,636

SYSTEMS AND METHODS FOR REINFORCEMENT LEARNING WITH LOCAL STATE AND REWARD DATA

Final Rejection §101 §102 §103 §Other
Filed
Aug 25, 2021
Examiner
KEATON, SHERROD L
Art Unit
2148
Tech Center
2100 — Computer Architecture & Software
Assignee
Royal Bank Of Canada
OA Round
2 (Final)
Grant Probability
52%
Moderate
OA Rounds
3-4
To Grant
4y 6m
With Interview
88%

Examiner Intelligence

Grants 52% of resolved cases
52%
Career Allow Rate
295 granted / 563 resolved
-2.6% vs TC avg
Strong +36% interview lift
+36.1%
Interview Lift
resolved cases with interview
Typical timeline
4y 6m
Avg Prosecution
32 currently pending
Career history
595
Total Applications
across all art units
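
The headline figures in this panel can be cross-checked with simple arithmetic. The sketch below rests on three assumptions the page does not state: that the career allow rate is granted divided by resolved, that total applications equals resolved plus pending, and that the interview lift is added in percentage points. Variable names are illustrative only.

```python
# Figures copied from the examiner panel above.
granted = 295
resolved = 563
pending = 32
interview_lift_pts = 36.1  # assumed to be percentage points, not a relative change

# Assumption: allow rate = granted / resolved.
allow_rate_pct = 100 * granted / resolved

# Assumption: the with-interview figure adds the lift to the allow rate.
with_interview_pct = allow_rate_pct + interview_lift_pts

# Assumption: total applications = resolved + currently pending.
total_applications = resolved + pending

print(f"Career allow rate:  {allow_rate_pct:.1f}%")
print(f"With interview:     {with_interview_pct:.1f}%")
print(f"Total applications: {total_applications}")
```

Under these assumptions the results round to the 52%, 88%, and 595 shown on the page, which suggests (but does not prove) that the page derives its figures this way.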

Statute-Specific Performance

§101: 14.9% (-25.1% vs TC avg)
§103: 62.0% (+22.0% vs TC avg)
§102: 11.1% (-28.9% vs TC avg)
§112: 8.0% (-32.0% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 563 resolved cases
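
The Tech Center average behind each "vs TC avg" figure is not printed, but it can be backed out from the numbers above. This is a minimal sketch, assuming each difference is the examiner's rate minus the Tech Center average, in percentage points. The function name `implied_tc_average` is hypothetical.

```python
# (examiner rate %, difference vs TC average) as listed above.
stats = {
    "§101": (14.9, -25.1),
    "§103": (62.0, 22.0),
    "§102": (11.1, -28.9),
    "§112": (8.0, -32.0),
}

def implied_tc_average(rate, diff):
    """Back out the TC average, assuming diff = rate - tc_average (percentage points)."""
    return round(rate - diff, 1)

implied = {statute: implied_tc_average(rate, diff) for statute, (rate, diff) in stats.items()}

for statute, avg in implied.items():
    print(f"{statute}: implied TC average {avg}%")
```

Every statute yields the same implied value, which fits the page's description of the Tech Center line as an estimate rather than a per-statute measurement.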

Office Action

§101 §102 §103 §Other
DETAILED ACTION

This action is in response to the filing of 11-21-2025. Claims 1-20 are pending and have been considered below:

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

The rejection of claims 1-20 under 35 U.S.C. 101 has been withdrawn.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1-3, 6-11, 14-18 and 20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Burhani et al. (“Burhani” 20190370649 A1).
Claim 1: Burhani discloses a computer-implemented system for training an automated agent, the system comprising: a communication interface; at least one processor; memory in communication with said at least one processor; software code stored in said memory, which when executed at said at least one processor (Paragraph 26, processor and memory) causes said system to:

instantiate a first automated agent interaction with an environment having a plurality of resources (Paragraphs 6 and 52, environment; 90-93, environment with plurality of task), wherein the first automated agent maintains a reinforcement learning neural network and generates, according to outputs of said reinforcement learning neural network, signals for communicating resource task requests (Paragraphs 4-5, 25 and 90; network generates task request);

receive, by way of said communication interface, current state data of a resource for a first task completed in response to a resource task request for a specific resource of the plurality of resources, the resource task request communicated by said first automated agent, the current state data generated based on task data relating to the specific resource obtained from the environment (Paragraphs 4-5, 25; values (state) of resource provided in response to request and 90-93; based trade (task) environment resources obtained);

receive, by way of said communication interface, historical state metrics of the resource computed based on a plurality of historical tasks related to the specific resource completed in response to a plurality of resource task requests for the specific resource (Paragraphs 4-5, 15-16, state metrics; 25-26, metrics compared for a time interval (historical events) and 90-93, looks at completed trades (task) for a given resource);

compute normalized state data based on the current state data (Paragraphs 18-20, data computed during time frames (i.e. current));

provide the historical state metrics and the normalized state data to the reinforcement learning neural network of said first automated agent for training (abstract (output from model trains agent) Paragraphs 25, 86-88; normalize input state provide to model); and

generate a signal for communicating resource task requests based on outputs of said reinforcement learning neural network of said trained first automated agent (abstract, Paragraphs 4-5, 25, 42 and 52 (signal generated based reinforcement training) and 90-93 (signal regarding task request and completion)).

Claim 2: Burhani discloses a system of claim 1, wherein the historical state metrics of the resource are stored in a database and comprise at least one of: an average historical state metric of the resource, a standard deviation of the average historical state metric, and a normalized value based on the average historical state metric and the standard deviation (Burhani: Paragraphs 14-15; standard deviation for VWAP and 45 historical VWAP (metric data)).

Claim 3: Burhani discloses a system of claim 1, wherein the resource is a security (Paragraph 90; security), and the historical state metrics and the normalized state data each comprises at least a respective slippage of the security (Burhani: Paragraphs 42, 62 and 92; reward attempts to minimize slippage).
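
Claim 2 recites a normalized value based on an average historical state metric and its standard deviation. One common way to compute such a value is a z-score. The sketch below is an illustration under that assumption only: neither the application nor Burhani is confirmed here to use this formula, and the function names and sample numbers are hypothetical.

```python
from statistics import mean, stdev

def historical_metrics(values):
    """Summarize past observations of a metric (e.g. slippage) as mean and sample std."""
    return {"mean": mean(values), "std": stdev(values)}

def normalize(current, metrics):
    """Z-score of the current value against the historical metrics.

    Assumption: z-score normalization; the claim only requires a value
    'based on' the average and the standard deviation.
    """
    if metrics["std"] == 0:
        return 0.0  # no historical spread, so treat the value as typical
    return (current - metrics["mean"]) / metrics["std"]

# Hypothetical historical slippage values for one resource.
history = [1.0, 2.0, 3.0]
metrics = historical_metrics(history)
normalized = normalize(3.0, metrics)
print(metrics, normalized)
```

Normalizing against per-resource history in this way keeps inputs on a comparable scale across resources, which is the usual motivation for feeding normalized state or reward data to a learning model.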
Claim 6: Burhani discloses a system of claim 1, wherein the software code, when executed at said at least one processor, causes said system to: receive, by way of said communication interface, current reward data of the resource for the first task; receive, by way of said communication interface, historical reward metrics of the resource computed based on the plurality of historical tasks; compute normalized reward data based on the current reward data; and provide the historical reward metrics and the normalized reward data to the reinforcement learning neural network of said first automated agent for training (Burhani: Paragraphs 5-6, 9, 24 (learning based on following historical VWAP), 42-43 (reward normalization based VWAP), 86 (state/input data)). This data is utilized to normalize inputs which assist in normalizing rewards for training an agent.

Claim 7: Burhani discloses a system of claim 6, wherein the historical reward metrics of the resource is stored in the database and comprises at least one of: an average historical reward metric of the resource, a standard deviation of the average historical reward metric, and a normalized value based on the average historical reward metric and the standard deviation of the average historical reward metric (Burhani: Paragraphs 42-43, 67 and 85 (reward metric)).

Claim 8: Burhani discloses a system of claim 6, wherein the resource is a security, and the historical reward metrics and the normalized reward data each comprises at least a respective value determined based on a slippage of the security (Burhani: Paragraphs 42, 62 and 92; reward attempts to minimize slippage).

Claims 9 and 17 are similar in scope to claim 1 and therefore rejected under the same rationale. Non-transitory computer readable storage medium (Burhani: Paragraph 27)

Claim 10 is similar in scope to claim 2 and therefore rejected under the same rationale.

Claims 11 and 18 are similar in scope to claim 3 and therefore rejected under the same rationale.

Claims 14 and 20 are similar in scope to claim 6 and therefore rejected under the same rationale.

Claim 15 is similar in scope to claim 7 and therefore rejected under the same rationale.

Claim 16 is similar in scope to claim 8 and therefore rejected under the same rationale.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 4-5, 12-13 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Burhani et al. (“Burhani” 20190370649 A1) in view of Shi et al. (“Shi” 20210019125 A1).
Claim 4: Burhani discloses a system of claim 1, wherein the software code, when executed at said at least one processor, causes said system to: instantiate a second automated agent that maintains a second reinforcement learning neural network and generates, according to outputs of said second reinforcement learning neural network, signals for communicating resource task requests (Paragraphs 25; instantiate 104; second agent); receive, by way of said communication interface, second current state data of the resource for a second task completed in response to a resource task request communicated by said second automated agent, wherein the second task and the first task are completed concurrently (Paragraph 61; concurrent processing of orders); receive, by way of said communication interface, the historical state metrics of the resource; compute a second normalized state data based on the second current state data; and provide the historical state metrics and the second normalized state data to the second reinforcement learning neural network of said second automated agent for training (Paragraphs 25, 86-88 (normalization) 105; first and second agent work cooperatively).

Burhani discloses a first and second agent, and further discloses features could be implemented by each agent (Paragraphs 104-105). Shi is further provided because it discloses a reinforcement model where workers (interchangeable with agents) are concurrently running task in order to determine optimal solution based on reward (Figure 3 and Paragraph 137), this solution provides training (Paragraphs 71 and 133). This concurrent operation could be utilized with the first and second agent of Burhani.

Therefore it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use a known technique to improve a similar device in the same way and provide concurrently working agents within the environment of Burhani. One would have been motivated to provide the functionality because it improves efficiency and expands reward determination for improved solution implementation.

Claim 5: Burhani and Shi disclose a system of claim 4, wherein the software code, when executed at said at least one processor, causes said system to: receive, by way of said communication interface, a plurality of local state metrics from said first automated agent; and compute the second normalized state data based on at least the second current state data and the plurality of local state metrics from said first automated agent (Burhani: abstract Paragraphs 24-25, 104-105; first and second agent can execute same operations to determine rewards).

Claims 12 and 19 are similar in scope to claim 4 and therefore rejected under the same rationale.

Claim 13 is similar in scope to claim 5 and therefore rejected under the same rationale.

Response to Arguments

Applicant's arguments have been fully considered. Regarding the 101, the rejection has been withdrawn. Regarding the amended claims, Burhani previous and newly cited areas appear to still capture the limitations. See additional Paragraphs 90-93.

Conclusion

The prior art made of record and not relied upon is considered pertinent to Applicant’s disclosure: 20070038550 A1 [0106]

Applicant is required under 37 C.F.R. § 1.111(c) to consider these references fully when responding to this action.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. A reference is relevant for all it contains and may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art. In re Heck, 699 F.2d 1331, 1332-33, 216 U.S.P.Q. 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 U.S.P.Q. 275, 277 (C.C.P.A. 1968)).

In the interests of compact prosecution, Applicant is invited to contact the examiner via electronic media pursuant to USPTO policy outlined in MPEP § 502.03. All electronic communication must be authorized in writing. Applicant may wish to file an Internet Communications Authorization Form PTO/SB/439. Applicant may wish to request an interview using the Interview Practice website: http://www.uspto.gov/patent/laws-and-regulations/interview-practice.

Applicant is reminded Internet e-mail may not be used for communication for matters under 35 U.S.C. § 132 or which otherwise require a signature. A reply to an Office action may NOT be communicated by Applicant to the USPTO via Internet e-mail. If such a reply is submitted by Applicant via Internet e-mail, a paper copy will be placed in the appropriate patent application file with an indication that the reply is NOT ENTERED. See MPEP § 502.03(II).
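
The reply periods stated in the action (a two-month mark for a first reply, a three-month shortened statutory period, and a six-month outer limit) can be laid out as calendar arithmetic. The sketch below is only a rough aid: it assumes simple calendar-month addition with end-of-month clamping, and it ignores the advisory-action exception and any weekend or holiday rules. The helper names `add_months` and `reply_window` are hypothetical, and the mailing date is supplied by the caller.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Add calendar months to a date, clamping to the last day of the target month."""
    idx = d.month - 1 + months
    year = d.year + idx // 12
    month = idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def reply_window(mailing_date):
    """Key dates measured from the mailing date of a final action (simplified)."""
    return {
        "two_month_mark": add_months(mailing_date, 2),
        "shortened_period_end": add_months(mailing_date, 3),
        "statutory_limit": add_months(mailing_date, 6),
    }

# Example input only; substitute the mailing date shown on the action itself.
window = reply_window(date(2026, 3, 13))
for label, when in window.items():
    print(f"{label}: {when.isoformat()}")
```

The example date is taken from the timeline entry on this page. The action's signature block carries a different date (3-10-2026), so the actual mailing date should be confirmed before relying on any computed deadline.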
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHERROD KEATON whose telephone number is 571-270-1697. The examiner can normally be reached 9:30am to 5:00pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor MICHELLE BECHTOLD can be reached at 571-431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SHERROD L KEATON/
Primary Examiner, Art Unit 2148
3-10-2026

Prosecution Timeline

Aug 25, 2021
Application Filed
May 17, 2025
Non-Final Rejection — §101, §102, §103
Nov 21, 2025
Response Filed
Mar 13, 2026
Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12566823
SYSTEMS AND METHODS FOR INTERPOLATIVE CENTROID CONTRASTIVE LEARNING
2y 5m to grant Granted Mar 03, 2026
Patent 12547820
Automated Generation Of Commentator-Specific Scripts
2y 5m to grant Granted Feb 10, 2026
Patent 12530587
SYSTEMS AND METHODS FOR CONTRASTIVE LEARNING WITH SELF-LABELING REFINEMENT
2y 5m to grant Granted Jan 20, 2026
Patent 12524147
Modality Learning on Mobile Devices
2y 5m to grant Granted Jan 13, 2026
Patent 12524603
METHODS FOR RECOGNIZING AND INTERPRETING GRAPHIC ELEMENTS
2y 5m to grant Granted Jan 13, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
52%
Grant Probability
88%
With Interview (+36.1%)
4y 6m
Median Time to Grant
Moderate
PTA Risk
Based on 563 resolved cases by this examiner. Grant probability derived from career allow rate.
