Prosecution Insights
Last updated: April 17, 2026
Application No. 18/103,426

Systems, Methods, and Media for Selecting Actions to be Taken By a Reinforcement Learning Agent

Status: Final Rejection (§102)
Filed: Jan 30, 2023
Examiner: SANKS, SCHYLER S
Art Unit: 2129
Tech Center: 2100 — Computer Architecture & Software
Assignee: unknown
OA Round: 2 (Final)

Grant Probability: 72% (Favorable)
Expected OA Rounds: 3-4
Median Time to Grant: 2y 11m
Grant Probability With Interview: 88%

Examiner Intelligence

Career Allow Rate: 72% (above average; 362 granted / 501 resolved; +17.3% vs TC avg)
Interview Lift: +15.9% among resolved cases with an interview (strong)
Avg Prosecution: 2y 11m (40 applications currently pending)
Total Applications: 541 (across all art units)
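The headline figures above are simple ratios of the career counts. A quick sketch of the arithmetic, using the values shown; the additive interview-lift model is an assumption about how the dashboard combines them:

```python
# Reproduce the examiner's headline statistics from the raw career counts.
granted = 362
resolved = 501
interview_lift = 0.159  # observed lift among resolved cases with an interview

allow_rate = granted / resolved               # career allowance rate
with_interview = allow_rate + interview_lift  # assumed additive combination

print(f"Career allow rate: {allow_rate:.0%}")                     # 72%
print(f"Grant probability with interview: {with_interview:.0%}")  # 88%
```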

Statute-Specific Performance

§101: 2.6% (-37.4% vs TC avg)
§103: 46.7% (+6.7% vs TC avg)
§102: 17.1% (-22.9% vs TC avg)
§112: 32.2% (-7.8% vs TC avg)
Tech Center average shown for comparison is an estimate • Based on career data from 501 resolved cases

Office Action

§102
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 U.S.C. § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-21 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Da Silva (US20210073912A1).

Regarding claim 1, Da Silva teaches a system for selecting an action to be taken by a reinforcement learning agent in an environment (Fig. 3A, Algorithm 1), comprising a memory (Claim 1) and a hardware processor coupled to the memory and configured to at least (Claim 1): determine a first variance for a first state of the environment (Fig. 3A, Algorithm 1, line 4), wherein the first variance is based on a state of the environment before the first state (Fig. 3A: when t ≥ 3, let the first state be the state when t ≥ 3; any variance determined in line 4 then implicitly relies on variances determined before it, because "a" leads to s', which becomes the new "s" in a subsequent iteration); determine that the first variance meets a threshold (Fig. 3A, Algorithm 1, line 5); in response to determining that the first variance meets the threshold: request an identification of a first action to be taken by the agent from a human (Fig. 3A, Algorithm 1, line 6; see ¶31 and ¶41, "A demonstrator 30 may also be a human"); and receive the identification of the first action (Fig. 3A, Algorithm 1, line 6); and cause the first action to be taken by the agent (Fig. 3A, Algorithm 1, line 10).

Regarding claim 2, Da Silva teaches all of the limitations of claim 1, wherein the hardware processor is also configured to: determine a second variance for a second state of the environment, wherein the second variance is based on a state of the environment before the first state (Fig. 3A: under the same analysis as for claim 1, the second variance can be the variance determined when t is one less than that for the first variance); determine that the second variance does not meet the threshold (Fig. 3A, Algorithm 1, line 7); in response to determining that the second variance does not meet the threshold: select a second action to be taken by the agent based on a reinforcement learning policy; and cause the second action to be taken by the agent (Fig. 3A, Algorithm 1, lines 8 and 10).

Regarding claim 3, Da Silva teaches all of the limitations of claim 1, wherein the agent is an autonomous vehicle (¶29).

Regarding claim 4, Da Silva teaches all of the limitations of claim 1, wherein the agent is a robot (¶29).

Regarding claim 5, Da Silva teaches a system for selecting an action to be taken by a reinforcement learning agent in an environment (Fig. 3A, Algorithm 1), comprising a memory (Claim 1) and a hardware processor coupled to the memory and configured to at least (Claim 1): select a first action to be taken by the agent based on a reinforcement learning policy (Fig. 3A, Algorithm 1, line 6); determine that the first action is to request an action selection from a human (Fig. 3A, Algorithm 1, line 6); in response to determining that the first action is to request an action selection from a human: request an identification of a new first action to be taken by the agent from a human (Fig. 3A, Algorithm 1, line 6) based on a first variance for a first state of the environment, wherein the first variance is based on a state of the environment before the first state (Fig. 3A: same analysis as for claim 1); and receive the identification of the new first action; and cause the new first action to be taken by the agent (Fig. 3A, Algorithm 1, line 10).

Regarding claim 6, Da Silva teaches all of the limitations of claim 5, wherein the hardware processor is also configured to: select a second action to be taken by the agent based on the reinforcement learning policy (Fig. 3A, Algorithm 1, line 8); determine that the second action is not to request an action selection from a human (Fig. 3A, Algorithm 1, line 8); in response to determining that the second action is not to request an action selection from a human: cause the second action to be taken by the agent (Fig. 3A, Algorithm 1, lines 8 and 10).

Regarding claim 7, Da Silva teaches all of the limitations of claim 5, wherein the agent is one of an autonomous vehicle and a robot (¶29).

Regarding claims 8-11, the system of Da Silva according to claims 1-4 performs the method of claims 8-11 under normal operation.
Regarding claims 12-14, the system of Da Silva according to claims 5-7 performs the method of claims 12-14 under normal operation.

Regarding claims 15-18, Da Silva according to claims 1-4 covers the instructions of claims 15-18.

Regarding claims 19-21, Da Silva according to claims 5-7 covers the instructions of claims 19-21.

Response to Arguments

Applicant's remarks filed 12/30/2025 have been fully considered. As shown herein, Da Silva can be interpreted to cover the newly amended limitations.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCHYLER S SANKS, whose telephone number is (571) 272-6125. The examiner can normally be reached 06:30-15:30 Central Time, M-F. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michael Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SCHYLER S SANKS/
Primary Examiner, Art Unit 2129
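The claim limitations mapped above reduce to a variance-gated, human-in-the-loop action-selection loop: estimate the variance for the current state, ask a human for the action when the variance meets a threshold, and otherwise act from the reinforcement learning policy. A minimal Python sketch of that control flow; the variance estimator, policy, threshold, and all names are illustrative placeholders, not taken from Da Silva or the application:

```python
def select_action(state, policy, estimate_variance, threshold, ask_human):
    """Variance-gated action selection: defer to a human demonstrator when
    the estimated variance for the current state meets the threshold."""
    variance = estimate_variance(state)  # "first variance" for the state
    if variance >= threshold:            # variance meets the threshold
        return ask_human(state)          # request/receive the human-identified action
    return policy(state)                 # otherwise fall back to the RL policy

# Toy usage with placeholder components.
policy = lambda s: "policy_action"
ask_human = lambda s: "human_action"
estimate_variance = lambda s: s["uncertainty"]

print(select_action({"uncertainty": 0.9}, policy, estimate_variance, 0.5, ask_human))  # human_action
print(select_action({"uncertainty": 0.1}, policy, estimate_variance, 0.5, ask_human))  # policy_action
```

Claim 5's variant differs only in where the gate sits: the policy itself first emits a "request human input" action, which then triggers the same request/receive/cause steps.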

Prosecution Timeline

Jan 30, 2023: Application Filed
Jul 07, 2023: Response after Non-Final Action
Sep 26, 2025: Non-Final Rejection — §102
Dec 30, 2025: Response Filed
Jan 22, 2026: Final Rejection — §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602588: NEURAL NETWORK MODEL OPTIMIZATION METHOD BASED ON ANNEALING PROCESS FOR STAINLESS STEEL ULTRA-THIN STRIP
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12578694: INTELLIGENT MONITORING METHOD AND APPARATUS FOR ABNORMAL WORKING CONDITIONS IN HEAVY METAL WASTEWATER TREATMENT PROCESS BASED ON TRANSFER LEARNING AND STORAGE MEDIUM
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12578103: HUMIDIFIER FOR PREVENTING POLLUTION OF HUMIDIFYING WATER
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12571549: DESICCANT ENHANCED EVAPORATIVE COOLING SYSTEMS AND METHODS
Granted Mar 10, 2026 (2y 5m to grant)

Patent 12553629: HEAT PUMP AND METHOD FOR INSTALLING THE SAME
Granted Feb 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 72%
With Interview: 88% (+15.9%)
Median Time to Grant: 2y 11m
PTA Risk: Moderate

Based on 501 resolved cases by this examiner. Grant probability derived from career allow rate.
