Prosecution Insights
Last updated: April 19, 2026
Application No. 18/828,328

METHOD AND COMPUTER SYSTEM FOR MULTI-LEVEL CONTROL OF MOTION ACTUATORS IN AN AUTONOMOUS VEHICLE

Non-Final OA (§103, §112)
Filed: Sep 09, 2024
Examiner: SHUDY, ANGELINA M
Art Unit: 3668
Tech Center: 3600 — Transportation & Electronic Commerce
Assignee: Volvo Truck Corporation
OA Round: 1 (Non-Final)
Grant Probability: 77% (Favorable)
OA Rounds: 1-2
To Grant: 2y 8m
With Interview: 86%

Examiner Intelligence

Career Allow Rate: 77% (349 granted / 455 resolved), +24.7% vs TC average; grants above average
Interview Lift: +9.4% (moderate), measured across resolved cases with interview
Typical Timeline: 2y 8m average prosecution; 30 applications currently pending
Career History: 485 total applications across all art units
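As a quick sanity check, the headline figures above are mutually consistent: 349 grants out of 455 resolved cases is 76.7%, which the dashboard rounds to 77%, and adding the +9.4 point interview lift gives the 86% with-interview figure. A minimal sketch; the rounding convention is an assumption about how the tool displays values:

```python
# Sanity-checking the dashboard's headline figures against each other.
# The display rounding is an assumption, not documented tool behavior.
granted, resolved = 349, 455
allow_rate = 100 * granted / resolved          # 76.70... -> shown as 77%
interview_lift = 9.4                           # percentage points, from the card above
with_interview = allow_rate + interview_lift   # 86.1 -> shown as 86%

print(f"career allow rate: {allow_rate:.1f}%")     # 76.7%
print(f"with interview:    {with_interview:.1f}%")  # 86.1%
```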

Statute-Specific Performance

§101: 15.8% (-24.2% vs TC avg)
§103: 35.2% (-4.8% vs TC avg)
§102: 13.3% (-26.7% vs TC avg)
§112: 27.4% (-12.6% vs TC avg)
Tech Center averages shown for comparison are estimates based on career data from 455 resolved cases.
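Since each row reports both the examiner's rate and its delta against the Tech Center average, the baseline can be recovered as rate minus delta. A minimal sketch; the observation that all four rows imply a single baseline near 40% is an inference from the displayed numbers, not something the page states:

```python
# Inferring the Tech Center baseline from each row above:
# displayed rate = TC average + delta, so TC average = rate - delta.
rows = {"§101": (15.8, -24.2), "§103": (35.2, -4.8),
        "§102": (13.3, -26.7), "§112": (27.4, -12.6)}

for statute, (rate, delta) in rows.items():
    print(f"{statute}: implied TC average = {rate - delta:.1f}%")  # 40.0% each
```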

Office Action

§103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claim 8 is objected to because of the following informalities: "wherein the RL agent is trained to perform decision-making based on a state of the vehicle and/or of vehicles surrounding the vehicle which is not included in the vehicle's actual motion state sensed by the feedback controller" would be better understood as "wherein the RL agent is trained to perform decision-making based on a state of the vehicle and/or of vehicles surrounding the vehicle which is not included in the actual motion state of the vehicle sensed by the feedback controller". Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 7, 11, 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claims 7, 20 recite the limitation "the second feedback controller includes a lane-change assistant and the setpoint motion state of the lateral motion actuator is a setpoint lane; and the RL agent is trained to perform joint decision-making regarding the setpoint TTC and regarding the setpoint lane". There is insufficient antecedent basis for the limitation "the second feedback controller" because claims 7, 20 depend upon claims 6 and 19, respectively.

Claim 11 recites the limitation "A computer program product comprising program code for performing, when executed by processing circuitry of a computer system, the method of claim 1"; however, a computer program product that comprises the code appears unclear regarding whether the product may include transitory forms of signal transmission (often referred to as "signals per se"), such as a propagating electrical or electromagnetic signal or carrier wave. The claim limitation may be better understood as "A computer program product comprising program code, stored on a non-transitory computer-readable medium, for performing, when executed by processing circuitry of a computer system, the method of claim 1".

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1-4, 9, 11-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over WO 2021213616 ("Hoel") in view of Mukadam, Mustafa, et al., "Tactical decision making for lane changing with deep reinforcement learning" (2017) ("Mukadam").

As per claim(s) 1, 13, Hoel discloses a computer-implemented method of controlling at least one motion actuator in an autonomous or semi-autonomous vehicle, comprising:

providing processing circuitry with a control interface configured to sense an actual motion state of the vehicle (see at least [0034]: arrangement 200 includes processing circuitry 210, a memory 212 and a vehicle control interface 214…vehicle control interface 214 may receive signals from physical sensors (not shown) in the vehicle so as to detect current conditions of the driving environment or internal states prevailing in the vehicle 299) and determine a machine-level instruction to the motion actuator for approaching or maintaining a setpoint motion state (see at least [0034]: processing circuitry 210, a memory 212 and a vehicle control interface 214. The vehicle control interface 214 is configured to control the autonomous vehicle 299 by transmitting wired or wireless signals, directly or via intermediary components, to actuators (not shown) in the vehicle; claim 1: vehicle control (116), wherein the at least one tentative decision is executed in dependence of the estimated uncertainty; claim 13: different initial value and yielding a state-action value function Q(s, a) dependent on state and action…a vehicle control interface (214) configured to control the autonomous vehicle by executing the at least one tentative decision in dependence of the estimated uncertainty);

providing a reinforcement-learning (RL) agent trained to perform decision-making regarding the setpoint motion state (see at least [0008]: tentative decision to perform action d in state s can be represented as state-action pair (s, d)…uncertainty estimation, which is performed on the basis of a variability measure for the K state-action value functions evaluated for the state-action pair (ŝ, â); [0034]: processing circuitry 210 implements an RL agent 220; [0035]: RL agent 220 interacts with an environment including the autonomous vehicle in a plurality of training sessions, each training session having a different initial value and yielding a state-action value function dependent on state and action. The RL agent 220 then outputs at least one tentative decision relating to control of the autonomous vehicle; [0036]: uncertainty estimator 222 is configured to estimate an uncertainty on the basis of a variability measure for the plurality of state-action value functions evaluated for a state-action pair corresponding to each of the tentative decisions by the RL agent);

applying decisions by the RL agent as the setpoint motion state of the feedback controller (see at least claim 1: vehicle control (116), wherein the at least one tentative decision is executed in dependence of the estimated uncertainty); and

applying the machine-level instruction to the motion actuator (see at least [0034]: processing circuitry 210, a memory 212 and a vehicle control interface 214. The vehicle control interface 214 is configured to control the autonomous vehicle 299 by transmitting wired or wireless signals, directly or via intermediary components, to actuators (not shown) in the vehicle; claim 1: vehicle control (116), wherein the at least one tentative decision is executed in dependence of the estimated uncertainty; claim 13: a vehicle control interface (214) configured to control the autonomous vehicle by executing the at least one tentative decision in dependence of the estimated uncertainty);

wherein at least one of the steps of the method is performed using processing circuitry of a computer system (see at least [0034]: processing circuitry 210, a memory 212 and a vehicle control interface 214. The vehicle control interface 214 is configured to control the autonomous vehicle 299 by transmitting wired or wireless signals, directly or via intermediary components, to actuators (not shown) in the vehicle).

Hoel does not explicitly disclose a feedback controller separate from the RL agent. However, Mukadam teaches a feedback controller separate from the RL agent: providing a feedback controller configured to determine a machine-level instruction to the motion actuator for approaching or maintaining a setpoint motion state (see at least abstract: reinforcement learning (RL) can be used to create a tactical decision-making agent for autonomous driving; page 4: Given a state the lower-level module can restrict (or mask off) any set of actions that the agent does not need to explore or learn from their outcomes); and applying decisions by the RL agent as the setpoint motion state of the feedback controller (see at least page 4: Given a state the lower-level module can restrict (or mask off) any set of actions that the agent does not need to explore or learn from their outcomes; Figure 2(a): low-level module including at least a low-level controller, Action, deep Q network and the low-level module that interface together using Q-masking).

It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Mukadam, with a reasonable expectation of success, in order to leverage the strengths of deep reinforcement learning for high-level tactical decision making and rule-based methods for low-level control, and for improved efficiency and driving safety (see at least Mukadam page 6). The combination would yield predictable results.
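For orientation, the claim-1 architecture that this rejection maps onto the references separates tactical decision-making from low-level control: an RL agent chooses a setpoint motion state, and a distinct feedback controller converts it into a machine-level actuator instruction. Below is a minimal sketch of that two-level loop; the class names, the proportional control law, and every numeric value are illustrative assumptions, not the applicant's or the cited references' implementations.

```python
# Illustrative only: a two-level control loop in the shape of claim 1.
# RLAgent, FeedbackController, and the P-control law are hypothetical
# stand-ins; neither Hoel's nor Mukadam's implementation is quoted here.
from dataclasses import dataclass

@dataclass
class MotionState:
    speed: float  # m/s, the vehicle's sensed actual motion state

class RLAgent:
    """Trained policy; here a stub that picks a setpoint motion state."""
    def decide(self, state: MotionState) -> float:
        return 25.0  # tactical decision: setpoint speed in m/s

class FeedbackController:
    """Separate from the RL agent; tracks whatever setpoint it is given."""
    def __init__(self, gain: float = 0.5):
        self.gain = gain

    def machine_level_instruction(self, setpoint: float, actual: MotionState) -> float:
        # P-control: acceleration command applied to the motion actuator
        return self.gain * (setpoint - actual.speed)

agent, controller = RLAgent(), FeedbackController()
state = MotionState(speed=22.0)                   # sensed via the control interface
setpoint = agent.decide(state)                    # RL decision becomes the setpoint
command = controller.machine_level_instruction(setpoint, state)
print(f"actuator command: {command:+.2f} m/s^2")  # applied to the actuator
```

The point of the sketch is the separation the claims recite: the agent never emits actuator commands directly, only setpoints for the feedback controller to track.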
As per claim(s) 2, 15, Hoel discloses wherein the feedback setpoint motion state is represented as a continuous variable (see at least [0014]: estimated uncertainty into a binary variable, other embodiments may treat the estimated uncertainty as a continuous variable, which may guide the quantity of additional safety measures necessary to achieve a desired safety standard, e.g., a maximum speed or traffic density at which the tentative decision shall be considered safe to execute). Hoel does not explicitly disclose a feedback controller separate from the RL agent. However, Mukadam teaches a feedback controller separate from the RL agent, wherein the feedback setpoint motion state is represented as a continuous variable (see at least page 4: Given a state the lower-level module can restrict (or mask off) any set of actions that the agent does not need to explore or learn from their outcomes; page 5: a rule based time to collision (TTC) method [24] (we set the threshold as 10s) that checks for collisions given the state against all actions and masks off those actions that lead to collision; Figure 2(a): low-level module including at least a low-level controller, Action, deep Q network and the low-level module that interface together using Q-masking). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Mukadam, with a reasonable expectation of success, in order to leverage the strengths of deep reinforcement learning for high-level tactical decision making and rule-based methods for low-level control, and for improved efficiency and driving safety (see at least Mukadam page 6). The combination would yield predictable results.

As per claim(s) 3, 16, Hoel discloses wherein the RL agent is trained to perform tactical decision-making regarding the setpoint motion state (see at least claim 5: wherein the decision-making includes tactical decision-making). Hoel does not explicitly disclose a feedback controller separate from the RL agent. However, Mukadam teaches a feedback controller separate from the RL agent, wherein the RL agent is trained to perform tactical decision-making regarding the setpoint motion state (see at least abstract: reinforcement learning (RL) can be used to create a tactical decision-making agent for autonomous driving; page 4: Given a state the lower-level module can restrict (or mask off) any set of actions that the agent does not need to explore or learn from their outcomes; page 5: a rule based time to collision (TTC) method [24] (we set the threshold as 10s) that checks for collisions given the state against all actions and masks off those actions that lead to collision). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Mukadam, with a reasonable expectation of success, in order to leverage the strengths of deep reinforcement learning for high-level tactical decision making and rule-based methods for low-level control, and for improved efficiency and driving safety (see at least Mukadam page 6). The combination would yield predictable results.

As per claim(s) 4, 17, Hoel discloses wherein the controller is configured to control at least one longitudinal motion actuator (see at least [0041]: an adaptive cruise controller for the longitudinal motion and a lane-change model that makes tactical decisions to overtake slower vehicles). Hoel does not explicitly disclose a feedback controller separate from the RL agent. However, Mukadam teaches a feedback controller separate from the RL agent, wherein the feedback controller is configured to control at least one longitudinal motion (see at least page 4: Given a state the lower-level module can restrict (or mask off) any set of actions that the agent does not need to explore or learn from their outcomes; page 5: a rule based time to collision (TTC) method [24] (we set the threshold as 10s) that checks for collisions given the state against all actions and masks off those actions that lead to collision; Figure 2(a): low-level module including at least a low-level controller, Action, deep Q network and the low-level module that interface together using Q-masking). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Mukadam, with a reasonable expectation of success, in order to leverage the strengths of deep reinforcement learning for high-level tactical decision making and rule-based methods for low-level control, and for improved efficiency and driving safety (see at least Mukadam page 6). The combination would yield predictable results.

As per claim(s) 9, Hoel discloses wherein the RL agent is configured with one of the following: deep Q network (DQN); advantage actor critic (A2C); proximal policy optimization (PPO) (see at least claim 8: wherein the RL agent is a Q-learning agent, such as a deep Q network, DQN).

As per claim(s) 11, Hoel discloses a computer program product comprising program code for performing, when executed by processing circuitry of a computer system, the method of claim 1 (see at least [0019]: computer…processing circuitry and memory).

As per claim(s) 12, Hoel discloses a non-transitory computer-readable storage medium comprising instructions, which when executed by processing circuitry of a computer system, cause the processing circuitry to perform the method of claim 1 (see at least [0019]: computer…processing circuitry and memory).

As per claim(s) 14, Hoel discloses a vehicle comprising the computer system of claim 13 (see at least [0019]: an arrangement for controlling an autonomous vehicle; [0033]: arrangement 200 may be provided, at least partially, in the autonomous vehicle 299).

Claim(s) 6, 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hoel in view of Mukadam, and further in view of Das, Lokesh Chandra, and Myounggyu Won, "Saint-acc: Safety-aware intelligent adaptive cruise control for autonomous vehicles using deep reinforcement learning," International Conference on Machine Learning, PMLR, 2021 ("Das").

As per claim(s) 6, 19, Hoel discloses an adaptive cruise controller (ACC) and a longitudinal motion actuator (see at least [0041]: adaptive cruise controller for the longitudinal motion). Hoel does not explicitly disclose wherein the feedback controller includes an adaptive cruise controller (ACC) and the setpoint motion state of the longitudinal motion actuator is a setpoint time-to-collision (TTC).
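Before turning to the secondary references, here is a minimal sketch of what a setpoint-TTC loop for an ACC could look like. The control law, gain, and 4 s setpoint are hypothetical; Mukadam's 10 s figure quoted below is an action-masking threshold rather than a tracked setpoint, and neither reference's controller is reproduced here.

```python
# Illustrative only: an ACC-style feedback loop that regulates
# time-to-collision (TTC) toward a setpoint chosen by a higher level.
def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """TTC = gap / closing speed; infinite when the gap is not closing."""
    return gap_m / closing_speed_mps if closing_speed_mps > 0 else float("inf")

def acc_command(setpoint_ttc_s: float, gap_m: float,
                closing_speed_mps: float, gain: float = 0.8) -> float:
    """Acceleration command: negative (brake) when actual TTC < setpoint."""
    ttc = time_to_collision(gap_m, closing_speed_mps)
    if ttc == float("inf"):
        return 0.0  # not closing on a lead vehicle: hold current speed
    return gain * (ttc - setpoint_ttc_s) / setpoint_ttc_s  # normalized TTC error

# Closing at 5 m/s with a 15 m gap gives TTC = 3 s, below the 4 s setpoint,
# so the command is negative (brake): 0.8 * (3 - 4) / 4 = -0.2 m/s^2.
print(acc_command(setpoint_ttc_s=4.0, gap_m=15.0, closing_speed_mps=5.0))
```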
However, Mukadam teaches a feedback controller wherein the setpoint motion state of the longitudinal motion actuator is a setpoint time-to-collision (TTC) (see at least page 4: Given a state the lower-level module can restrict (or mask off) any set of actions that the agent does not need to explore or learn from their outcomes; page 5: a rule based time to collision (TTC) method [24] (we set the threshold as 10s) that checks for collisions given the state against all actions and masks off those actions that lead to collision; Figure 2(a): low-level module including at least a low-level controller, Action, deep Q network and the low-level module that interface together using Q-masking). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Mukadam, with a reasonable expectation of success, in order to leverage the strengths of deep reinforcement learning for high-level tactical decision making and rule-based methods for low-level control, and for improved efficiency and driving safety (see at least Mukadam page 6). The combination would yield predictable results.

Further, Das teaches wherein the feedback controller includes an adaptive cruise controller (ACC) (see at least abstract: adaptive cruise control (ACC) system; page 1 section 1: novel dual reinforcement learning (RL) agent approach…separate RL agent is designed to find and adapt the optimal TTC threshold (Gettman & Head, 2003) based on rich traffic information including both macroscopic and microscopic traffic data obtained from the surrounding environment). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Das, with a reasonable expectation of success, in order to improve traffic efficiency and driving safety (see at least Das page 1). The combination would yield predictable results.

Claim(s) 7, 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hoel in view of Mukadam and Das, and further in view of US 20210291826 ("Benosman").

As per claim(s) 7, 20, Hoel discloses a lane-change model (see at least [0041]). Hoel does not explicitly disclose that the second feedback controller includes a lane-change assistant and the setpoint motion state of the lateral motion actuator is a setpoint lane; and that the RL agent is trained to perform joint decision-making regarding the setpoint TTC and regarding the setpoint lane.

However, Mukadam teaches that the feedback controller includes a lane-change assistant and the setpoint motion state of the lateral motion actuator is a setpoint lane, and that the RL agent is trained to perform joint decision-making regarding the setpoint TTC and regarding the setpoint lane (see at least page 1: generate a steering command with adaptive control; page 3: high-level tactical decision making strategy such that the ego car makes efficient lane change maneuvers while relying on the low-level controller for collision free lane changing between adjacent lanes; page 4: Given a state the lower-level module can restrict (or mask off) any set of actions that the agent does not need to explore or learn from their outcomes; page 5: a rule based time to collision (TTC) method [24] (we set the threshold as 10s) that checks for collisions given the state against all actions and masks off those actions that lead to collision).

It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Mukadam, with a reasonable expectation of success, in order to leverage the strengths of deep reinforcement learning for high-level tactical decision making and rule-based methods for low-level control, and for improved efficiency and driving safety (see at least Mukadam page 6). The combination would yield predictable results. Further, the second feedback controller appears to be a duplication of parts and does not appear to produce a new and unexpected result, because the low-level controller of Mukadam provides for collision-free lane changing. In re Harza, 274 F.2d 669, 124 USPQ 378 (CCPA 1960).

However, Benosman teaches that the second feedback controller includes a lane keeping assistant and the motion state of the lateral motion actuator is a setpoint lane (see at least [0111]: controller 1402 outputs control commands to the controllers 1416 and 1418 to control the kinematic state of the vehicle. In some embodiments, the controllers 1414…a lane keeping controller 1420 that further process the control commands of the controller 1402…the controllers 1414 utilize the output of the controller 1402, i.e., control commands, to control at least one actuator of the vehicle, such as the steering wheel and/or the brakes of the vehicle, in order to control the motion of the vehicle; claim 1: update the closure model using reinforcement learning (RL) having a value function reducing a difference between a shape of the received state trajectory and a shape of state trajectory estimated using the model with the updated closure model; and determine a control command based on the model with the updated closure model; and an output interface configured to transmit the control command to an actuator of the system to control the operation of the system; claim 13: vehicle controlled to perform one or combination of a lane keeping, a cruise control, and an obstacle avoidance operation, wherein the state of the vehicle includes one or combination of a position, an orientation, and a longitudinal velocity, and a lateral velocity of the vehicle, wherein the control inputs include one or combination of a lateral acceleration, a longitudinal acceleration, a steering angle, an engine torque, and a brake torque, state constraints include one or combination of velocity constraints, lane keeping constraints, and obstacle avoidance constraints, and wherein the control input constraints include one or combination of steering angle constraints, and acceleration constraints). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Benosman, with a reasonable expectation of success, in order to provide optimal control. The combination would yield predictable results.

Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hoel in view of Mukadam and Das, and further in view of US 11577722 ("Packer").

As per claim(s) 8, Hoel does not explicitly disclose wherein the RL agent is trained to perform decision-making based on a state of the vehicle and/or of vehicles surrounding the vehicle which is not included in the vehicle's actual motion state sensed by the feedback controller.
However, Mukadam teaches wherein the RL agent is trained to perform decision-making based on a state of the vehicle and/or of vehicles surrounding the vehicle, and the vehicle's actual motion state sensed by the feedback controller (see at least page 3: inputs to the network is the state of the ego car, which consists of internal and external information. Scalar inputs, velocity v, lane l, and distance to goal d2g, are chosen to represent internal information all of which are scaled between 0 and 1; page 4: Given a state the lower-level module can restrict (or mask off) any set of actions that the agent does not need to explore or learn from their outcomes; Figure 2(a): low-level module including at least a low-level controller, Action, deep Q network and the low-level module that interface together using Q-masking). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Mukadam, with a reasonable expectation of success, in order to leverage the strengths of deep reinforcement learning for high-level tactical decision making and rule-based methods for low-level control, and for improved efficiency and driving safety (see at least Mukadam page 6). The combination would yield predictable results.

Further, Das teaches wherein the RL agent is trained to perform decision-making based on a state of the vehicle and/or of vehicles surrounding the vehicle that is rich information (see at least abstract: adaptive cruise control (ACC) system; page 1 section 1: novel dual reinforcement learning (RL) agent approach…separate RL agent is designed to find and adapt the optimal TTC threshold (Gettman & Head, 2003) based on rich traffic information including both macroscopic and microscopic traffic data obtained from the surrounding environment). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teaching of rich input information into an RL agent as taught by Das, with a reasonable expectation of success, in order to maximize traffic efficiency and driving safety. The combination would yield predictable results.

Finally, Packer teaches wherein the agent is trained to perform decision-making based on a state of the vehicle and/or of vehicles surrounding the vehicle which is not included in the vehicle's actual motion state sensed by the feedback controller (see at least column 35, lines 41-55: Different amounts of sensor data may be input into a model suitable for performing prediction techniques specified by the model. In some examples, different models may receive different amounts and types of sensor data). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Packer, with a reasonable expectation of success, in order to input an amount of sensor data that is suitable for the model and to help improve how a vehicle navigates in an environment. The combination would yield predictable results.

Claim(s) 5, 10, 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hoel in view of Mukadam, and further in view of Benosman.
As per claim(s) 5, 18, Hoel does not explicitly disclose a feedback controller separate from the RL agent; providing a second feedback controller configured to control at least one lateral motion actuator; or wherein the RL agent is trained to perform joint decision-making regarding a setpoint motion state of the longitudinal motion actuator and regarding a setpoint motion state of the lateral motion actuator.

However, Mukadam teaches a feedback controller separate from the RL agent: providing a feedback controller configured to control at least one lateral motion, wherein the RL agent is trained to perform joint decision-making regarding a setpoint motion state of the longitudinal motion and regarding a setpoint motion state of the lateral motion (see at least page 1: generate a steering command with adaptive control; page 3: high-level tactical decision making strategy such that the ego car makes efficient lane change maneuvers while relying on the low-level controller for collision free lane changing between adjacent lanes; page 4: Given a state the lower-level module can restrict (or mask off) any set of actions that the agent does not need to explore or learn from their outcomes; page 5: a rule based time to collision (TTC) method [24] (we set the threshold as 10s) that checks for collisions given the state against all actions and masks off those actions that lead to collision). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Mukadam, with a reasonable expectation of success, in order to leverage the strengths of deep reinforcement learning for high-level tactical decision making and rule-based methods for low-level control, and for improved efficiency and driving safety (see at least Mukadam page 6). The combination would yield predictable results. Further, the second feedback controller appears to be a duplication of parts and does not appear to produce a new and unexpected result, because the low-level controller of Mukadam provides for collision-free lane changing. In re Harza, 274 F.2d 669, 124 USPQ 378 (CCPA 1960).

However, Benosman teaches providing a second feedback controller configured to control at least one lateral motion actuator, wherein the RL agent is trained to perform joint decision-making regarding a motion state of the longitudinal motion actuator and regarding a motion state of the lateral motion actuator (see at least [0111]: controller 1402 outputs control commands to the controllers 1416 and 1418 to control the kinematic state of the vehicle. In some embodiments, the controllers 1414…a lane keeping controller 1420 that further process the control commands of the controller 1402…the controllers 1414 utilize the output of the controller 1402, i.e., control commands, to control at least one actuator of the vehicle, such as the steering wheel and/or the brakes of the vehicle, in order to control the motion of the vehicle; claim 1: reinforcement learning (RL) having a value function reducing a difference between a shape of the received state trajectory and a shape of state trajectory estimated using the model with the updated closure model; and determine a control command based on the model with the updated closure model; and an output interface configured to transmit the control command to an actuator of the system to control the operation of the system; claim 13: vehicle controlled to perform one or combination of a lane keeping, a cruise control, and an obstacle avoidance operation, wherein the state of the vehicle includes one or combination of a position, an orientation, and a longitudinal velocity, and a lateral velocity of the vehicle, wherein the control inputs include one or combination of a lateral acceleration, a longitudinal acceleration, a steering angle, an engine torque, and a brake torque, state constraints include one or combination of velocity constraints, lane keeping constraints, and obstacle avoidance constraints, and wherein the control input constraints include one or combination of steering angle constraints, and acceleration constraints). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Benosman, with a reasonable expectation of success, in order to provide optimal control. The combination would yield predictable results.

As per claim(s) 10, Hoel does not explicitly disclose wherein the RL agent has been trained to perform decision-making in such a manner as to minimize a total cost of operation (TCOP). However, Benosman teaches wherein the RL agent has been trained to perform decision-making in such a manner as to minimize a total cost of operation (TCOP) (see at least [0010]: reinforcement learning (RL) is an area of machine learning concerned with how to take actions in an environment so as to maximize some notion of cumulative reward (or equivalently, minimize a cumulative loss/cost)). It would have been obvious to one of ordinary skill in the art before the effective filing date to provide the invention as disclosed by Hoel by incorporating the teachings of Benosman, with a reasonable expectation of success, in order to provide optimal control. The combination would yield predictable results.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANGELINA M SHUDY, whose telephone number is (571) 272-6757. The examiner can normally be reached M-F, 10am-6pm.

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Fadey Jabr, can be reached at 571-272-1516. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Angelina M Shudy/
Primary Examiner, Art Unit 3668

Prosecution Timeline

Sep 09, 2024
Application Filed
Feb 21, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12600359
TARGET OBJECT SELECTION FOR A LONGITUDINAL GUIDANCE SYSTEM AND ELECTRONIC VEHICLE GUIDANCE SYSTEM OF A MOTOR VEHICLE
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12591243
PATH DETERMINATION FOR AUTOMATIC MOWERS
Granted Mar 31, 2026 (2y 5m to grant)

Patent 12583456
PROBABILISTIC DRIVING BEHAVIOR MODELING SYSTEM FOR A VEHICLE
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12583446
Systems and Methods to Determine a Lane Change Strategy at a Merge Region
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12570280
VEHICLE COMPRISING VEHICLE CONTROL APPARATUS
Granted Mar 10, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 77%
With Interview: 86% (+9.4%)
Median Time to Grant: 2y 8m
PTA Risk: Low
Based on 455 resolved cases by this examiner. Grant probability is derived from the career allow rate.

Free tier: 3 strategy analyses per month