Prosecution Insights
Last updated: April 17, 2026
Application No. 18/103,426

Systems, Methods, and Media for Selecting Actions to be Taken By a Reinforcement Learning Agent

Status: Final Rejection (§102)
Filed: Jan 30, 2023
Examiner: SANKS, SCHYLER S
Art Unit: 2129
Tech Center: 2100 — Computer Architecture & Software
Assignee: unknown
OA Round: 2 (Final)

Grant Probability: 72% (Favorable)
Expected OA Rounds: 3-4
Median Time to Grant: 2y 11m
Grant Probability With Interview: 88%

Examiner Intelligence

Career Allow Rate: 72% (above average; 362 granted / 501 resolved; +17.3% vs TC avg)
Interview Lift: +15.9% among resolved cases with an interview (strong)
Avg Prosecution: 2y 11m (40 applications currently pending)
Total Applications: 541 (across all art units)
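The headline figures above are simple ratios of the career counts. A quick sketch of the arithmetic, using the values shown; the additive interview-lift model is an assumption about how the dashboard combines them:

```python
# Reproduce the examiner's headline statistics from the raw career counts.
granted = 362
resolved = 501
interview_lift = 0.159  # observed lift among resolved cases with an interview

allow_rate = granted / resolved               # career allowance rate
with_interview = allow_rate + interview_lift  # assumed additive combination

print(f"Career allow rate: {allow_rate:.0%}")                     # 72%
print(f"Grant probability with interview: {with_interview:.0%}")  # 88%
```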

Statute-Specific Performance

§101: 2.6% (-37.4% vs TC avg)
§103: 46.7% (+6.7% vs TC avg)
§102: 17.1% (-22.9% vs TC avg)
§112: 32.2% (-7.8% vs TC avg)
Tech Center average shown for comparison is an estimate • Based on career data from 501 resolved cases

Office Action

§102
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 U.S.C. § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Da Silva (US20210073912A1).

Regarding claim 1, Da Silva teaches a system for selecting an action to be taken by a reinforcement learning agent in an environment (Fig. 3A, Algorithm 1), comprising a memory (Claim 1) and a hardware processor coupled to the memory and configured to at least (Claim 1): determine a first variance for a first state of the environment (Fig. 3A, Algorithm 1, line 4), wherein the first variance is based on a state of the environment before the first state (Fig. 3A: when t ≥ 3, let the first state be the state when t ≥ 3; any variance determined in line 4 then implicitly relies on variances determined before it, because "a" leads to s', which becomes the new "s" in a subsequent iteration); determine that the first variance meets a threshold (Fig. 3A, Algorithm 1, line 5); in response to determining that the first variance meets the threshold: request an identification of a first action to be taken by the agent from a human (Fig. 3A, Algorithm 1, line 6; see ¶31 and ¶41, "A demonstrator 30 may also be a human"); and receive the identification of the first action (Fig. 3A, Algorithm 1, line 6); and cause the first action to be taken by the agent (Fig. 3A, Algorithm 1, line 10).

Regarding claim 2, Da Silva teaches all of the limitations of claim 1, wherein the hardware processor is also configured to: determine a second variance for a second state of the environment, wherein the second variance is based on a state of the environment before the first state (Fig. 3A: under the same analysis as for claim 1, the second variance can be the variance determined when t is one less than that for the first variance); determine that the second variance does not meet the threshold (Fig. 3A, Algorithm 1, line 7); in response to determining that the second variance does not meet the threshold: select a second action to be taken by the agent based on a reinforcement learning policy; and cause the second action to be taken by the agent (Fig. 3A, Algorithm 1, lines 8 and 10).

Regarding claim 3, Da Silva teaches all of the limitations of claim 1, wherein the agent is an autonomous vehicle (¶29).

Regarding claim 4, Da Silva teaches all of the limitations of claim 1, wherein the agent is a robot (¶29).

Regarding claim 5, Da Silva teaches a system for selecting an action to be taken by a reinforcement learning agent in an environment (Fig. 3A, Algorithm 1), comprising a memory (Claim 1) and a hardware processor coupled to the memory and configured to at least (Claim 1): select a first action to be taken by the agent based on a reinforcement learning policy (Fig. 3A, Algorithm 1, line 6); determine that the first action is to request an action selection from a human (Fig. 3A, Algorithm 1, line 6); in response to determining that the first action is to request an action selection from a human: request an identification of a new first action to be taken by the agent from a human (Fig. 3A, Algorithm 1, line 6) based on a first variance for a first state of the environment, wherein the first variance is based on a state of the environment before the first state (Fig. 3A: same analysis as for claim 1); and receive the identification of the new first action; and cause the new first action to be taken by the agent (Fig. 3A, Algorithm 1, line 10).

Regarding claim 6, Da Silva teaches all of the limitations of claim 5, wherein the hardware processor is also configured to: select a second action to be taken by the agent based on the reinforcement learning policy (Fig. 3A, Algorithm 1, line 8); determine that the second action is not to request an action selection from a human (Fig. 3A, Algorithm 1, line 8); in response to determining that the second action is not to request an action selection from a human: cause the second action to be taken by the agent (Fig. 3A, Algorithm 1, lines 8 and 10).

Regarding claim 7, Da Silva teaches all of the limitations of claim 5, wherein the agent is one of an autonomous vehicle and a robot (¶29).

Regarding claims 8-11, the system of Da Silva according to claims 1-4 performs the method of claims 8-11 under normal operation.
Regarding claims 12-14, the system of Da Silva according to claims 5-7 performs the method of claims 12-14 under normal operation.

Regarding claims 15-18, Da Silva according to claims 1-4 covers the instructions of claims 15-18.

Regarding claims 19-21, Da Silva according to claims 5-7 covers the instructions of claims 19-21.

Response to Arguments

Applicant's remarks filed 12/30/2025 have been fully considered. As shown herein, Da Silva can be interpreted to cover the newly amended limitations.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCHYLER S SANKS, whose telephone number is (571) 272-6125. The examiner can normally be reached 06:30-15:30 Central Time, M-F. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michael Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SCHYLER S SANKS/
Primary Examiner, Art Unit 2129
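The claim limitations mapped above reduce to a variance-gated, human-in-the-loop action-selection loop: estimate the variance for the current state, ask a human for the action when the variance meets a threshold, and otherwise act from the reinforcement learning policy. A minimal Python sketch of that control flow; the variance estimator, policy, threshold, and all names are illustrative placeholders, not taken from Da Silva or the application:

```python
def select_action(state, policy, estimate_variance, threshold, ask_human):
    """Variance-gated action selection: defer to a human demonstrator when
    the estimated variance for the current state meets the threshold."""
    variance = estimate_variance(state)  # "first variance" for the state
    if variance >= threshold:            # variance meets the threshold
        return ask_human(state)          # request/receive the human-identified action
    return policy(state)                 # otherwise fall back to the RL policy

# Toy usage with placeholder components.
policy = lambda s: "policy_action"
ask_human = lambda s: "human_action"
estimate_variance = lambda s: s["uncertainty"]

print(select_action({"uncertainty": 0.9}, policy, estimate_variance, 0.5, ask_human))  # human_action
print(select_action({"uncertainty": 0.1}, policy, estimate_variance, 0.5, ask_human))  # policy_action
```

Claim 5's variant differs only in where the gate sits: the policy itself first emits a "request human input" action, which then triggers the same request/receive/cause steps.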

Prosecution Timeline

Jan 30, 2023: Application Filed
Jul 07, 2023: Response after Non-Final Action
Sep 26, 2025: Non-Final Rejection — §102
Dec 30, 2025: Response Filed
Jan 22, 2026: Final Rejection — §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602588: NEURAL NETWORK MODEL OPTIMIZATION METHOD BASED ON ANNEALING PROCESS FOR STAINLESS STEEL ULTRA-THIN STRIP
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12578694: INTELLIGENT MONITORING METHOD AND APPARATUS FOR ABNORMAL WORKING CONDITIONS IN HEAVY METAL WASTEWATER TREATMENT PROCESS BASED ON TRANSFER LEARNING AND STORAGE MEDIUM
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12578103: HUMIDIFIER FOR PREVENTING POLLUTION OF HUMIDIFYING WATER
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12571549: DESICCANT ENHANCED EVAPORATIVE COOLING SYSTEMS AND METHODS
Granted Mar 10, 2026 (2y 5m to grant)

Patent 12553629: HEAT PUMP AND METHOD FOR INSTALLING THE SAME
Granted Feb 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 72%
With Interview: 88% (+15.9%)
Median Time to Grant: 2y 11m
PTA Risk: Moderate

Based on 501 resolved cases by this examiner. Grant probability derived from career allow rate.
