Prosecution Insights
Last updated: April 19, 2026
Application No. 17/799,332

Training an Artificial Intelligence Unit for an Automated Vehicle

Final Rejection: §101, §103
Filed: Aug 12, 2022
Examiner: HWANG, MEGAN ELIZABETH
Art Unit: 2143
Tech Center: 2100 — Computer Architecture & Software
Assignee: BAYERISCHE MOTOREN WERKE AKTIENGESELLSCHAFT
OA Round: 2 (Final)

Grant Probability: 47% (Moderate)
Expected OA Rounds: 3-4
Estimated Time to Grant: 3y 0m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 47% (9 granted / 19 resolved; -7.6% vs Tech Center average)
Interview Lift: +60.2% (allow rate in resolved cases with an interview vs. without)
Average Prosecution: 3y 0m
Currently Pending: 25 applications
Total Applications: 44 (across all art units)
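
The headline figures above are simple ratios. A minimal sketch of the assumed arithmetic in Python (the tool's exact formulas are not disclosed; the with/without-interview rates below are invented solely to show how a +60.2% lift could arise):

    # Editorial sketch under assumed definitions; not the tool's actual formula.
    granted, resolved = 9, 19
    allow_rate = granted / resolved                # 0.4737... -> the 47% career allow rate

    def interview_lift(rate_with, rate_without):
        # Assumed definition: relative change in allow rate for resolved
        # cases with an examiner interview vs. those without one.
        return (rate_with - rate_without) / rate_without

    # Invented example rates that would reproduce the reported +60.2%:
    print(round(interview_lift(0.75, 0.468), 3))   # 0.603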

Statute-Specific Performance

§101: 34.9% (-5.1% vs TC avg)
§103: 41.0% (+1.0% vs TC avg)
§102: 7.4% (-32.6% vs TC avg)
§112: 15.3% (-24.7% vs TC avg)

Tech Center average is an estimate. Based on career data from 19 resolved cases.
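
The "vs TC avg" deltas read most naturally as percentage-point differences between the examiner's per-statute rate and the Tech Center average. Under that assumed reading, the implied baseline can be recovered:

    # Assumed reading: delta = examiner_rate - tech_center_average (percentage points).
    rates  = {"101": 34.9, "103": 41.0, "102": 7.4, "112": 15.3}
    deltas = {"101": -5.1, "103": 1.0, "102": -32.6, "112": -24.7}
    tc_avg = {k: round(rates[k] - deltas[k], 1) for k in rates}
    print(tc_avg)   # {'101': 40.0, '103': 40.0, '102': 40.0, '112': 40.0}

All four deltas imply the same 40.0% baseline, consistent with the single Tech Center average estimate noted above.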

Office Action

§101, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 11-14 and 16-20 are presented for examination. Claim 15 has been canceled. This Office Action is responsive to the amendment filed on 11/06/2025, which has been entered into the above identified application.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 11-14, 16-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hu et al. (US 20190266489 A1, filed 04/29/2019), hereinafter Hu, in view of Guckelsberger et al. (“Supportive and Antagonistic Behavior in Distributed Computational Creativity via Coupled Empowerment Maximisation”, published 07/01/2016), hereinafter Guckelsberger. Hu is referenced in the IDS filed 08/12/2022.

Regarding Claim 11, Hu teaches a system for training an artificial intelligence unit for an automated vehicle (Hu: “The system for interaction-aware decision making may include a communication interface transmitting the multi-goal, multi-agent, multi-stage, interaction-aware decision making network policy to a server or a vehicle.” [0011]), comprising: a processor; a memory in communication with the processor, the memory storing a plurality of instructions executable by the processor (Hu: “a system for interaction-aware decision making may include a processor, a memory, and a simulator implemented via the processor and memory.” [0008]) to cause the system to implement: an artificial neural network configured to determine an evaluation value for at least two motion actions for the automated vehicle based on an input state and based on a knowledge configuration (KC) (Hu: “a neural network construction for multi-agent reinforcement learning (MARL) may be provided.” [0084]; “The traffic simulator 1112 may utilize a reward function (R) which may be a function that evaluates a taken (e.g., simulated) action. Stated another way, the reward function may be utilized to measure success or failure.” [0170]; “The Q-masker 1114 may be implemented via a low-level controller and be part of a deep Q-learning system which learns policies which enable the autonomous vehicle to make decisions on a tactical level. The deep Q-learning system may learn a mapping between states and Q-values associated with each potential action.” [0163]), wherein the input state characterizes the automated vehicle and at least one other road user (Hu: “the CM3 network policy may receive an input of an observation associated with the first autonomous vehicle or the second autonomous vehicle (e.g., a vehicle state or an environment state) and output a suggested action.” [0059]), wherein the memory further comprises instructions to cause the system to: select one motion action from the at least two motion actions based on the evaluation value of the respective motion actions (Hu: “At each time step, the agent may receive an observation which may include the reward. The agent may select one action from a set of available actions, which results in a new state and a new reward for a subsequent time step. The goal of the agent is generally to collect the greatest amount of rewards possible.” [0060]); and train the artificial neural network by adapting the knowledge configuration of the artificial neural network based on the selected motion action (Hu: “During the training of the first agent based on the first policy gradient and training the first critic based on the first loss function within the single-agent environment according to the MDP, the simulator 108 may train the first agent by enabling the first agent to select an action from a set of one or more actions.” [0209]); and control longitudinal guidance and lateral guidance of the automated vehicle based on the artificial neural network (Hu: “The set of possible actions may include a no-operation action, an acceleration action, a deceleration action, a brake release action, a shift left one sub-lane action, or a shift right one sub-lane action.” [0006]; in light of paragraph [0027] of the specification, which states “The at least two motion actions are in particular motion actions regarding longitudinal and/or lateral motion of the automated vehicle, for example acceleration, deceleration, turn left, turn right, switch lane to the left, stay in lane, or switch lane to the right”, BRI would support that “longitudinal guidance and lateral guidance” would encompass vehicular actions including accelerating, decelerating and lane switching).

However, Hu fails to expressly disclose wherein the knowledge configuration characterizes at least an empowerment of the at least one other road user; and wherein a first motion action is determined to have a higher evaluation value than a second motion action when the first motion action provides the at least one other road user a higher number of possible future motion actions than the second motion action.

In the same field of endeavor, Guckelsberger teaches wherein the knowledge configuration characterizes at least an empowerment of the at least one other road user (Guckelsberger: “Empowerment, the quantity underlying the CEM principle, is defined over the relationship between an agent’s actuators and sensors, and as such is sensitive to the agent’s embodiment and Umwelt. In a deterministic environment, empowerment quantifies an agent’s options in terms of availability and visibility. In a stochastic setting, this generalises to the potential influence of an agent’s actions on its environment, and to the extent to which it can perceive this influence afterwards.” [Section: Formal Model - Empowerment and Empowerment Maximisation]); and wherein a first motion action is determined to have a higher evaluation value than a second motion action when the first motion action provides the at least one other road user a higher number of possible future motion actions than the second motion action (Guckelsberger: “For the supportive case and two agents, the active, first agent has to calculate the expected coupled empowerment of each of its actions at.” [Section: Formal Model - Coupled Empowerment Maximisation]; “Crucially, empowerment does not measure an agent’s actual, but their potential influence on the environment. The EM principle suggests that an agent should, in absence of any explicit goals, choose actions which are likely to lead to states with higher influence on the environment, i.e. more options.” [Section: Formal Model - Empowerment and Empowerment Maximisation]; “CEM suggests that the active agent chooses its actions in order to both maximise its own expected empowerment and to maximise or minimise the empowerment of the coupled agents. This is formalised by Eq. 1, the general action selection policy.” [Section: Formal Model - Coupled Empowerment Maximisation]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated wherein the knowledge configuration characterizes at least an empowerment of the at least one other road user, and wherein a first motion action is determined to have a higher evaluation value than a second motion action when the first motion action provides the at least one other road user a higher number of possible future motion actions than the second motion action, as taught by Guckelsberger, to the system of Hu, because both of these systems are directed towards training autonomous agent action policies in a cooperative multi-agent environment. In making this combination and selecting actions based on the empowerment provided to another agent, it would allow the system of Hu to enable the emergence of supportive behavior “without putting explicit constraints on the types of interactions” (Guckelsberger: [Section: Introduction]).
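
An editorial aside, not part of the Office Action: the empowerment-based action scoring the examiner maps onto Claim 11 is straightforward to sketch. Below is a minimal, illustrative Python sketch of deterministic empowerment and of evaluating an ego action by the options it leaves the other road user; the `successors` and `step` functions and all other names are hypothetical stand-ins, not code from Hu or Guckelsberger:

    # Editorial illustration only; not from the cited references.
    import math

    def empowerment(state, successors, horizon=1):
        # Deterministic empowerment: log2 of the number of distinct states
        # the agent can reach after `horizon` steps (its "options").
        frontier = {state}
        for _ in range(horizon):
            frontier = {nxt for s in frontier for nxt in successors(s)}
        return math.log2(len(frontier)) if frontier else 0.0

    def evaluate(action, other_state, step, successors):
        # Coupled term of CEM: score the ego action by the empowerment it
        # leaves the other road user once the world is advanced one step.
        return empowerment(step(other_state, action), successors)

    # Selecting the motion action then reduces to an argmax over candidates:
    # best = max(actions, key=lambda a: evaluate(a, other_state, step, successors))

A Q-network, as in Hu, would learn such an evaluation from experience rather than compute it exactly.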
Regarding Claim 20, it is a method claim (Hu: “According to one aspect, a method for interaction-aware decision making may include training a first agent based on a first policy gradient and training a first critic based on a first loss function to learn one or more goals in a single-agent environment, where the first agent is the only agent present, using a Markov decision process.” [0003]) that corresponds to the system of Claim 11. Therefore, it is rejected for the same reasons as Claim 11 above.

Regarding Claim 12, Hu and Guckelsberger teach the system of Claim 11, wherein the empowerment of the at least one other road user is at least characterized by a number of possible future motion actions of the at least one other road user (Guckelsberger: “Crucially, empowerment does not measure an agent’s actual, but their potential influence on the environment. The EM principle suggests that an agent should, in absence of any explicit goals, choose actions which are likely to lead to states with higher influence on the environment, i.e. more options.” [Section: Formal Model - Empowerment and Empowerment Maximisation]; “CEM suggests that the active agent chooses its actions in order to both maximise its own expected empowerment and to maximise or minimise the empowerment of the coupled agents. This is formalised by Eq. 1, the general action selection policy.” [Section: Formal Model - Coupled Empowerment Maximisation]).

Regarding Claim 13, Hu and Guckelsberger teach the system of Claim 11, wherein the knowledge configuration further characterizes a reward with respect to the automated vehicle reaching a goal (Hu: “The rewards provided by the reward function enables reinforcement learning to occur based on a given goal (e.g., reach an exit ramp).” [0170]).

Regarding Claim 14, Hu and Guckelsberger teach the system of Claim 11, wherein the knowledge configuration further characterizes a distance between the automated vehicle and the other road user (Hu: “the Q-masker 1114 may determine, based on the prior knowledge, the masked subset of actions to include an autonomous driving maneuver of accelerating when the autonomous vehicle is positioned a first threshold distance behind the other vehicle when both the autonomous vehicle and the other vehicle are positioned in the same lane and an autonomous driving maneuver of decelerating when the autonomous vehicle is positioned a second threshold distance ahead of the other vehicle when both the autonomous vehicle and the other vehicle are positioned in the same lane.” [0183]).

Regarding Claim 16, Hu and Guckelsberger teach the system of Claim 11, wherein a future state of an environment of the automated vehicle is more predictable for the first motion action than for the second motion action (Hu: “The simulator 108 may set the environment to a next state st+1 due to the joint action at:{at1 … atN}, according to a transition probability P(st+1|st, a): S x A1 x … x AN x S -> [0,1]. Each agent may receive a reward R: S x An x G -> R and the learning task is to find stochastic policies πn(an|on, gn): On x G x An -> [0,1], which condition only on local observations and goals, to maximize [Equation 1] over horizon T, where γ is a discount factor.” [0069]).

Regarding Claim 17, Hu and Guckelsberger teach the system of Claim 11, where a probability of occurrence of a future state of an environment of the automated vehicle is higher when the automated vehicle would perform the first motion action than a probability of occurrence of a future state of an environment of the automated vehicle when the automated vehicle would perform the second motion action (Hu: [0069], as quoted for Claim 16 above).

Regarding Claim 19, Hu and Guckelsberger teach the system of Claim 11, wherein the artificial intelligence unit is a reinforcement learning unit (Hu: “As used herein, ‘CM3’ may refer to the use of a method for cooperative multi-goal, multi-agent, multi-stage reinforcement learning or a system for cooperative multi-goal, multi-agent, multi-stage reinforcement learning” [0046]).
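
Another editorial aside: Claims 16 and 17 turn on how sharply the transition probability P(st+1|st, a) quoted from Hu [0069] concentrates on one next state. A minimal Python sketch with invented distributions:

    # Editorial illustration only; the distributions are invented.
    import math

    def entropy(dist):
        # Shannon entropy in bits: lower means a more predictable next state.
        return -sum(p * math.log2(p) for p in dist if p > 0)

    p_next_a1 = [0.90, 0.05, 0.05]   # first motion action: outcome nearly certain
    p_next_a2 = [0.40, 0.35, 0.25]   # second motion action: outcome diffuse

    assert entropy(p_next_a1) < entropy(p_next_a2)   # Claim 16: more predictable
    assert max(p_next_a1) > max(p_next_a2)           # Claim 17: higher probability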
Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Hu in view of Guckelsberger, as applied to Claim 11 above, in further view of Over et al. (“The probability of causal conditionals”, published February 2007), hereinafter Over. Over was cited in the previous Office Action.

Regarding Claim 18, Hu and Guckelsberger teach the system of Claim 11, wherein the artificial neural network is further configured to: predict a future state of an environment of the automated vehicle for each of the motion actions for the automated vehicle, with the artificial neural network determining two probabilities of occurrence for each of the future states of the environment of the automated vehicle (Hu: [0069], as quoted for Claim 16 above), wherein a first probability of occurrence is a conditional probability given the occurrence of the respective motion action (Hu: “The simulator 108 may set the environment to a next state st+1 due to the joint action at:{at1 … atN}, according to a transition probability P(st+1|st, a): S x A1 x … x AN x S -> [0,1].” [0069]), and a second probability is independent of the occurring of the respective motion action (Hu: “Each agent may receive a reward R: S x An x G -> R and the learning task is to find stochastic policies πn(an|on, gn): On x G x An -> [0,1], which condition only on local observations and goals, to maximize [Equation 1] over horizon T, where γ is a discount factor” [0069]), and the artificial intelligence unit determines an evaluation value for at least two motion actions for the automated vehicle (Hu: “The traffic simulator 1112 may utilize a reward function (R) which may be a function that evaluates a taken (e.g., simulated) action. Stated another way, the reward function may be utilized to measure success or failure.” [0170]).

However, Hu and Guckelsberger fail to expressly disclose wherein the artificial neural network determines an evaluation value such that a first motion action is determined a higher evaluation value than a second motion action when a difference of the two probabilities for the first motion action is higher than a difference of the two probabilities for the second motion action.

In the same field of endeavor, Over teaches wherein the artificial neural network determines an evaluation value such that a first motion action is determined a higher evaluation value than a second motion action when a difference of the two probabilities for the first motion action is higher than a difference of the two probabilities for the second motion action (Over: “Delta-p rule: P(q|p) should be a positive predictor and P(q|¬p) an equally large negative predictor as their difference measures the degree of correlation between p and q.” [Section 2.2.3: Correlation and regression analyses]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated determining an evaluation value such that a first motion action is determined a higher evaluation value than a second motion action when a difference of the two probabilities for the first motion action is higher than a difference of the two probabilities for the second motion action, as taught by Over, to the system of Hu and Guckelsberger, because both of these systems are directed towards measuring the conditional probability of a state given an action for the purposes of decision making. In making this combination and measuring the difference between the conditional and independent probabilities of a state (q) given an action (p), it would allow the system of Hu and Guckelsberger to measure “the extent to which p raises the probability of q, or the degree of covariation between p and q” (Over: [Section 1.3: Correlation, causation, and the delta-p rule]).
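
The delta-p rule the examiner draws from Over is a one-line computation. Note that Claim 18's second probability is independent of the action (an unconditioned P(q)), whereas Over's rule uses P(q|¬p); the sketch below follows Over, with invented numbers:

    # Editorial illustration only; probabilities are invented.
    def delta_p(p_q_given_p, p_q_given_not_p):
        # Over's delta-p: how much taking action p raises the probability of outcome q.
        return p_q_given_p - p_q_given_not_p

    first_action  = delta_p(0.80, 0.30)   # difference = 0.50
    second_action = delta_p(0.60, 0.45)   # difference = 0.15
    assert first_action > second_action   # the first action earns the higher evaluation value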
Response to Arguments

The Examiner acknowledges the Applicant’s amendments to Claims 11 and 16-20.

Applicant’s arguments, filed 11/06/2025, with respect to the objections to the specification have been fully considered and are persuasive. The objections have been withdrawn.

Applicant’s arguments, filed 11/06/2025, with respect to the rejection of Claims 11-20 under 35 U.S.C. § 101 have been fully considered and are persuasive. The rejection has been withdrawn.

Applicant’s arguments, filed 11/06/2025, with respect to the rejection of Claims 11-14 and 16-20 under 35 U.S.C. § 103 have been fully considered and are found moot in light of the new grounds of rejection (see rejection above).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure. Jaques et al. (“Intrinsic Social Motivation via Causal Influence in Multi-Agent RL”) discusses a multi-agent reinforcement learning system in which agents are rewarded for having causal influence over another agent’s actions.

Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MEGAN E HWANG whose telephone number is (703) 756-1377. The examiner can normally be reached Monday-Thursday 10:00-7:30 ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch, can be reached at (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.E.H./
Examiner, Art Unit 2143

/JENNIFER N WELCH/
Supervisory Patent Examiner, Art Unit 2143

Prosecution Timeline

Aug 12, 2022: Application Filed
Aug 14, 2025: Non-Final Rejection (§101, §103)
Nov 06, 2025: Response Filed
Feb 10, 2026: Final Rejection (§101, §103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12456093: Corporate Hierarchy Tagging
Granted Oct 28, 2025 (2y 5m to grant)

Patent 12437514: VIDEO DOMAIN ADAPTATION VIA CONTRASTIVE LEARNING FOR DECISION MAKING
Granted Oct 07, 2025 (2y 5m to grant)

Patent 12437517: VIDEO DOMAIN ADAPTATION VIA CONTRASTIVE LEARNING FOR DECISION MAKING
Granted Oct 07, 2025 (2y 5m to grant)

Patent 12437518: VIDEO DOMAIN ADAPTATION VIA CONTRASTIVE LEARNING FOR DECISION MAKING
Granted Oct 07, 2025 (2y 5m to grant)

Patent 12437519: VIDEO DOMAIN ADAPTATION VIA CONTRASTIVE LEARNING FOR DECISION MAKING
Granted Oct 07, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 47%
With Interview: 99% (+60.2%)
Median Time to Grant: 3y 0m
PTA Risk: Moderate

Based on 19 resolved cases by this examiner. Grant probability derived from career allow rate.
