Last updated: May 29, 2026

Application No. 18/455,056

REINFORCEMENT LEARNING (RL) POLICY WITH GUIDED META RL

Non-Final OA §101§103

Filed

Aug 24, 2023

Examiner

TSUI, WILSON W

Art Unit

2172

Tech Center

2100 — Computer Architecture & Software

Assignee

The Board Of Trustees Of The Leland Stanford Junior University

OA Round

3 (Non-Final)

This examiner grants 62% of resolved cases, with a +57.8% interview lift.

A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 598 resolved cases, 2023–2026

Examiner Intelligence

TSUI, WILSON W

Grants 62% of resolved cases

Career Allowance Rate

368 granted / 598 resolved

+6.5% vs TC avg

Strong +58% interview lift

+57.8%

Interview Lift

resolved cases with interview

Typical timeline

4y 0m

Avg Prosecution

22 currently pending

Career history

640

Total Applications

across all art units

Statute-Specific Performance

§101

2.6%

-37.4% vs TC avg

§103

89.7%

+49.7% vs TC avg

§102

3.6%

-36.4% vs TC avg

§112

3.7%

-36.3% vs TC avg

Based on career data from 598 resolved cases

Office Action

§101 §103

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
With regard to claims 1, 2, 4-9, 11-12 and 14-20, these claims remain rejected under 35 U.S.C. § 101.
The following rejections are withdrawn in view of applicant’s amendments:
Claim(s) 1, 2, 4, 6-8, 10-12, 14, 16-18, and 20 rejected under 35 U.S.C. 103 as being unpatentable over Liu et al (“Learning to Navigate Intersections with Unsupervised Driver Trait Inference”, published Mar. 2020, pages 1-7) in view of Bouton et al (US Application: US 20210271988, published: Sep. 2, 2021, filed: Jul. 28, 2020).
Claim(s) 5, 9, 15 and 19 rejected under 35 U.S.C. 103 as being unpatentable over Liu et al (“Learning to Navigate Intersections with Unsupervised Driver Trait Inference”, published Mar. 2020, pages 1-7, already included in IDS filed 10/31/2023, and copy of this reference is attached to this office action with annotated page numbering) in view of Bouton et al (US Application: US 20210271988, published: Sep. 2, 2021, filed: Jul. 28, 2020) in view of Jain et al (“Generalization to New Actions in Reinforcement Learning”, pages 1-23, published: Nov. 2020).

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 04/20/2026 has been entered.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/28/2026 is being considered by the examiner.
 
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 2, 4-9, 11-12 and 14-20 remain rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1:
Step 1:
	Claim 1 falls within a statutory category.

Step 2A, Prong One:
With regard to claim 1, the claim recites the following, of which the bolded limitations recite a judicial exception for a mental process or a mathematical concept:
A system for generating a reinforcement learning (RL) policy with guided meta RL, comprising: a memory storing one or more instructions; a processor executing one or more of the instructions stored on the memory to perform: generating an initial RL policy for an ego-vehicle; generating a set of RL guiding policies for a set of social agents based on the initial RL policy, a reward function including a social agent base reward and a social agent final reward, and a set of preferences, wherein the social agent final reward is equal to a sum of the social agent base reward and an ego-vehicle base reward, weighted by the set of preferences, wherein each social agent has a respective preference being indicative of one of multiple aggressiveness levels with respect to the ego-vehicle; generating a meta-RL guided policy based on the set of RL guiding policies; and generating a RL policy with guided meta RL for the ego-vehicle based on the meta-RL guided policy, and operating one or more vehicle systems of the ego-vehicle according to the RL policy with guided meta RL.

	Of note, the examiner first points out in MPEP 2106.04(a)(2)III, the following citations: 
“the "mental processes" abstract idea grouping is defined as concepts performed in the human mind, and examples of mental processes include observations, evaluations, judgments, and opinions.” 
“The courts do not distinguish between mental processes that are performed entirely in the human mind and mental processes that require a human to use a physical aid (e.g., pen and paper or a slide rule) to perform the claim limitation. See, e.g., Benson, 409 U.S. at 67, 65, 175 USPQ at 674-75, 674 (noting that the claimed "conversion of [binary-coded decimal] numerals to pure binary numerals can be done mentally," i.e., "as a person would do it by head and hand."); Synopsys, Inc. v. Mentor Graphics Corp., 839 F.3d 1138, 1139, 120 USPQ2d 1473, 1474 (Fed. Cir. 2016) (holding that claims to a mental process of "translating a functional description of a logic circuit into a hardware component description of the logic circuit" are directed to an abstract idea, because the claims "read on an individual performing the claimed steps mentally or with pencil and paper").”

Accordingly, with respect to the above bolded limitations for “generating a reinforcement learning (RL) policy with guided meta RL, comprising … generating an initial RL policy …; generating a set of RL guiding policies for a set of social agents based on the initial RL policy and a set of preferences, each social agent having a respective preference being indicative of one of multiple aggressiveness levels with respect to the ego-vehicle; generating a meta-RL guided policy based on the set of RL guiding policies; and generating a RL policy with guided meta RL for the ego-vehicle based on the meta-RL guided policy”, these limitations are directed to a mental process or mathematical concept because of at least:
A person can mentally make a judgment to write down a reinforcement learning (RL) policy.
A person can mentally make a judgment to write down an initial RL policy.
A person can mentally evaluate the initial RL policy and evaluate a set of preferences (along with evaluating social agent preference data of aggressiveness level(s) with respect to the ego-vehicle), and make a judgment to write down a set of RL guiding policies based on the evaluation.
The limitation of ‘wherein the social agent final reward is equal to a sum of the social agent base reward and an ego-vehicle base reward, weighted by the set of preferences’ recites a mathematical formula for calculating a social agent final reward.
A person can mentally evaluate the set of RL guiding policies and make a judgment to write down a meta-RL guided policy based on the evaluation.
A person can mentally evaluate the meta-RL guided policy and make a judgment to write down a RL policy based on the evaluation.
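The claimed reward limitation characterized above as a mathematical formula is stated only in words. The following is a minimal Python sketch of one plausible reading, assuming each social agent has a single scalar preference weight and that the weight applies to the ego-vehicle base reward term only. The function names and the illustrative sign convention (a negative weight rewarding the agent when the ego-vehicle's reward falls) are hypothetical and are not taken from the claims or the application.

```python
from typing import Sequence


def social_agent_final_reward(
    social_base_reward: float,
    ego_base_reward: float,
    preference: float,
) -> float:
    """Hypothetical reading: final = social base + preference * ego base.

    Assumption: the preference weights only the ego-vehicle term. A weight
    of 0 makes the agent ignore the ego-vehicle; a negative weight (one
    illustrative way to encode aggressiveness) rewards the agent when the
    ego-vehicle's base reward decreases.
    """
    return social_base_reward + preference * ego_base_reward


def final_rewards(
    social_base_rewards: Sequence[float],
    ego_base_reward: float,
    preferences: Sequence[float],
) -> list[float]:
    """Apply each social agent's own preference to its own base reward."""
    if len(social_base_rewards) != len(preferences):
        raise ValueError("one preference is required per social agent")
    return [
        social_agent_final_reward(r, ego_base_reward, w)
        for r, w in zip(social_base_rewards, preferences)
    ]
```

For example, two agents with a base reward of 1.0 each, an ego-vehicle base reward of 2.0, and preferences of -0.5 and 0.5 receive final rewards of 0.0 and 2.0. Under the alternative reading, in which the preferences weight both terms of the sum, the function would take one weight per term instead.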

Step 2A, Prong Two
The claim recites the following additional elements:
“A system for … generating … a memory storing one or more instructions; a processor executing one or more of the instructions stored on the memory to perform …”, “… operating one or more vehicle systems of the ego-vehicle according to the RL policy with guided meta RL”. These additional element(s) is/are considered merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer (which can also be a computer system of a vehicle), or merely using a computer as a tool to perform an abstract idea, as discussed in MPEP § 2106.05(f). The courts have identified this type of limitation to be insufficient to integrate a judicial exception into a practical application.
“for an ego vehicle …”. This additional element is considered generally linking the use of a judicial exception to a particular technological environment or field of use, as discussed in MPEP § 2106.05(h). The courts have identified this type of limitation to be insufficient to integrate a judicial exception into a practical application.

Step 2B:
As discussed in step 2A, prong two, there are additional elements of:
“A system for … generating … a memory storing one or more instructions; a processor executing one or more of the instructions stored on the memory to perform …” and “… operating one or more vehicle systems of the ego-vehicle according to the RL policy with guided meta RL”. These additional element(s) is/are considered merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer (which can also be a computer system of a vehicle), or merely using a computer as a tool to perform an abstract idea. The courts have found this type of limitation to be insufficient to be ‘significantly more’ when recited in a claim with a judicial exception (see also Alice Corp., 573 U.S. at 225-26, 110 USPQ2d at 1984).
“for an ego vehicle …”. This additional element is considered generally linking the use of a judicial exception to a particular technological environment or field of use. The courts have found this type of limitation to be insufficient to be ‘significantly more’ when recited in a claim with a judicial exception (see also Bilski v. Kappos, 561 U.S. 593, 595, 95 USPQ2d 1001, 1010 (2010); Parker v. Flook, 437 U.S. 584, 588-90, 198 USPQ 193, 197-98 (1978) (limiting the use of a mathematical formula to the petrochemical and oil-refining fields); MPEP § 2106.05(h)).

Even when considered in combination, these additional elements represent mere instructions to apply an exception and generally link the use of the exception to a particular technological environment or field of use, and therefore do not provide an inventive concept.

Claims 2 and 4-9: 
	These claims (2 and 4-9) recite further operations directed to mental process(es) and/or mathematical process(es), and do not recite additional elements that would integrate their corresponding judicial exception(s) into a practical application, nor do they contain additional elements that would amount to significantly more than their corresponding recited exception(s).

Claim 11
With regard to claim 11, it is rejected under similar rationale as claim 1 (as it is broader than claim 1).

Claims 12 and 14-19
These claims (12 and 14-19) recite further operations directed to mental process(es) and/or mathematical process(es), and do not recite additional elements that would integrate their corresponding judicial exception(s) into a practical application, nor do they contain additional elements that would amount to significantly more than their corresponding recited exception(s).

Claim 20

With regard to claim 20, it is rejected under similar rationale as claim 1. It is noted that claim 20 additionally recites a “controller” and “vehicle systems”; however, as similarly explained in the rejection of claim 1, these are considered, respectively, (1) merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, and (2) generally linking the use of a judicial exception to a particular technological environment or field of use. The courts have identified these types of limitations to be insufficient to integrate a judicial exception into a practical application, and the courts have also found these types of limitations to be insufficient to be ‘significantly more’ when recited in a claim with a judicial exception.
 
Response to Arguments
Applicant's arguments filed 04/20/2026 have been fully considered but they are not persuasive.
With regard to claims 1, 11 and 20 and their corresponding 35 U.S.C. § 101 rejections, the applicant argues, with respect to the amendments and Step 2A, Prong One, that a person cannot realistically make a mental judgment to write down this reinforcement policy in real time [since] it would be time-intensive/difficult without a processor. However, this argument is not persuasive. The examiner respectfully points out that the applicant argues aspects of ‘real time’, but the claim language does not require any ‘real time’ capability; rather, the claim recites judicial exceptions and merely applies them on a computer (processor). The examiner maintains that the argued steps of determining an RL policy and calculating a final reward are mental steps and/or mathematical concepts. Furthermore, the examiner points out that the Federal Circuit has explained, "[c]ourts have examined claims that required the use of a computer and still found that the underlying, patent-ineligible invention could be performed via pen and paper or in a person’s mind." Versata Dev. Group v. SAP Am., Inc., 793 F.3d 1306, 1335, 115 USPQ2d 1681, 1702 (Fed. Cir. 2015). See also Intellectual Ventures I LLC v. Symantec Corp., 838 F.3d 1307, 1318, 120 USPQ2d 1353, 1360 (Fed. Cir. 2016) (‘‘[W]ith the exception of generic computer-implemented steps, there is nothing in the claims themselves that foreclose them from being performed by a human, mentally or with pen and paper.’’); Mortgage Grader, Inc. v. First Choice Loan Servs. Inc., 811 F.3d 1314, 1324, 117 USPQ2d 1693, 1699 (Fed. Cir. 2016) (holding that computer-implemented method for "anonymous loan shopping" was an abstract idea because it could be "performed by humans without a computer").

The applicant argues, with respect to Step 2A, Prong Two, that the claims (1, 11 and 20) integrate a recited judicial exception into a practical application because the claimed invention improves the functioning of another technology or technical field. However, this argument is not persuasive, since the claims have been explained to recite a judicial exception(s) and the additional elements are only nominally referenced/applied. Furthermore, operating a vehicle system can be interpreted as using a computer of a vehicle system to perform a judicial exception, and the claim does not recite any additional steps describing how the vehicle is improved through the claimed ‘operating’. As explained in the rejection of claims 1, 11 and 20 above, the processor and operation of vehicle systems are considered merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer (which can also be a computer system of a vehicle), or merely using a computer as a tool to perform an abstract idea. The courts have found this type of limitation to be insufficient to be ‘significantly more’ when recited in a claim with a judicial exception (see also Alice Corp., 573 U.S. at 225-26, 110 USPQ2d at 1984).
The applicant further argues that new policies are generated for social agents … to enhance the robustness of ego policies through rewards (including a social agent base reward and a social agent final reward), and thus improve the functioning of a computer or the technological field of autonomous driving by providing improved policies … with guided meta RL. However, this argument is not persuasive, since those policies are only tied to the additional elements of a ‘processor’ and a ‘vehicle system’ (which can be a vehicle computer). Merely applying a computer to run/operate a judicial exception is not sufficient to integrate the judicial exception into a practical application, because it only requires the execution/operation of computer(s)/processor(s)/vehicle system (computer).
The applicant argues, with regard to Step 2B for claims 1, 11 and 20, about ‘extra-solution activity’ in response to the rejection. However, this argument is not persuasive, since the rejection never mentioned extra-solution activity. As explained above, the additional element(s) are considered merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer (which can also be a computer system of a vehicle), or merely using a computer as a tool to perform an abstract idea. The courts have found this type of limitation to be insufficient to be ‘significantly more’ when recited in a claim with a judicial exception (see also Alice Corp., 573 U.S. at 225-26, 110 USPQ2d at 1984). Additionally, additional element(s) that generally link the use of a judicial exception to a particular technological environment or field of use have been found by the courts to be insufficient to be ‘significantly more’ when recited in a claim with a judicial exception (see also Bilski v. Kappos, 561 U.S. 593, 595, 95 USPQ2d 1001, 1010 (2010); Parker v. Flook, 437 U.S. 584, 588-90, 198 USPQ 193, 197-98 (1978) (limiting the use of a mathematical formula to the petrochemical and oil-refining fields); MPEP § 2106.05(h)).
With regard to the remaining claims that depend directly or indirectly upon claims 1, 11 and 20, the applicant argues they are subject matter eligible for at least the reasons provided by the applicant for claims 1, 11 and 20. However, this argument is not persuasive, since claims 1, 11 and 20 have been shown to be rejected above.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Wray et al (US Application: US 20190329771): This reference teaches using a partially observable stochastic game model having a reward function of the autonomous vehicle and an additional reward function for each external object.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILSON W TSUI whose telephone number is (571)272-7596. The examiner can normally be reached Monday to Friday, 9 am to 6 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Adam Queler can be reached at (571) 272-4140. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/WILSON W TSUI/Primary Examiner, Art Unit 2172


Prosecution Timeline

Aug 24, 2023

Application Filed

Jun 18, 2025

Non-Final Rejection mailed — §101, §103

Sep 15, 2025

Response Filed

Jan 21, 2026

Final Rejection mailed — §101, §103

Mar 19, 2026

Response after Non-Final Action

Apr 20, 2026

Request for Continued Examination

Apr 24, 2026

Response after Non-Final Action

May 20, 2026

Non-Final Rejection mailed — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/039,891

Patent 12626065

LIFECYCLE MANAGEMENT FOR CUSTOMIZED NATURAL LANGUAGE PROCESSING

5y 7m to grant Granted May 12, 2026

18/763,951

Patent 12602535

COMMENT DISPLAY METHOD AND APPARATUS OF A DOCUMENT, AND DEVICE AND MEDIUM

1y 9m to grant Granted Apr 14, 2026

18/088,971

Patent 12589766

AUTONOMOUS DRIVING SYSTEM AND METHOD OF CONTROLLING SAME

3y 3m to grant Granted Mar 31, 2026

18/041,489

Patent 12570284

AUTONOMOUS DRIVING METHOD AND DEVICE FOR A MOTORIZED LAND VEHICLE

3y 0m to grant Granted Mar 10, 2026

18/101,216

Patent 12552376

VEHICLE CONTROL APPARATUS

3y 0m to grant Granted Feb 17, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4

Expected OA Rounds

62%

Grant Probability

99%

With Interview (+57.8%)

4y 0m (~1y 2m remaining)

Median Time to Grant

High

PTA Risk

Based on 598 resolved cases by this examiner. Grant probability derived from career allowance rate.
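The page states that grant probability is derived from the career allowance rate. A minimal Python sketch of that arithmetic follows. The 99% cap on the with-interview projection is an assumption inferred from the displayed figures (62% plus a 57.8-point lift would otherwise exceed 100%), not a documented rule, and the function names are hypothetical.

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage of resolved cases."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


def with_interview(base_pct: float, lift_pct: float, cap_pct: float = 99.0) -> float:
    """Hypothetical projection: base rate plus interview lift, capped (assumed cap)."""
    return min(base_pct + lift_pct, cap_pct)


base = allowance_rate(368, 598)           # about 61.54
print(round(base))                        # 62
print(with_interview(round(base), 57.8))  # 99.0
```

The raw ratio 368/598 is about 61.5%, which the page rounds to the 62% shown as both the career allowance rate and the grant probability.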