Prosecution Insights
Last updated: April 19, 2026
Application No. 18/258,375

SYSTEM AND METHOD FOR REINFORCEMENT LEARNING OF STEERING GEOMETRY

Non-Final OA: §101, §103

Filed: Jun 20, 2023
Examiner: KIM, SEHWAN
Art Unit: 2129
Tech Center: 2100 — Computer Architecture & Software
Assignee: Volvo Truck Corporation
OA Round: 1 (Non-Final)

Grant Probability: 60% (Moderate)
OA Rounds: 1-2
To Grant: 4y 1m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 60% (grants 60% of resolved cases; 86 granted / 144 resolved; +4.7% vs TC avg)
Interview Lift: +65.6% (allowance lift for resolved cases with an interview vs. without; rated a strong lift)
Typical Timeline: 4y 1m average prosecution; 35 applications currently pending
Career History: 179 total applications across all art units

Statute-Specific Performance

§101: 20.8% (-19.2% vs TC avg)
§103: 46.2% (+6.2% vs TC avg)
§102: 6.3% (-33.7% vs TC avg)
§112: 23.3% (-16.7% vs TC avg)

Note: in the source chart, a black line marks the Tech Center average estimate. Based on career data from 144 resolved cases.
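A consistency check on these figures: every delta matches a single Tech Center benchmark of about 40.0% (for example, §103: 46.2% - 40.0% = +6.2%; §102: 6.3% - 40.0% = -33.7%; §101: 20.8% - 40.0% = -19.2%; §112: 23.3% - 40.0% = -16.7%), so the black line in the source chart appears to sit at roughly 40% for each statute.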

Office Action

Grounds: §101, §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Examiner's Note

The Examiner encourages Applicant to schedule an interview to discuss issues related to, for example, the rejections noted below under 35 U.S.C. § 101 and § 103, to move the application toward allowance. Providing supporting paragraph(s) for each limitation of amended/new claim(s) in the Remarks is strongly requested so that the Examiner can interpret the claims clearly and definitely.

Priority

Acknowledgment is made of Applicant's claim for priority to the PCT application filed on 12/21/2020.

Claim Objections

Claims 7 and 15 are objected to because of the following informality: in claim 7, it appears that "to regarding" (line 2) needs to read "to" or something else; claim 15 is objected to for the same reason. Appropriate correction is required. Claims 7 and 15 each recite limitations that raise issues of indefiniteness as set forth above, and their dependent claims are objected to at least based on their direct and/or indirect dependency from the claims listed above. Appropriate explanation and/or amendment is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 5 and 13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 5

Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.

Step 2A, Prong 1: The limitations of "wherein … by: performing a sensitivity analysis which identifies correlations between known values of vehicle data associated with the vehicle information, known values of steering geometry component, known driving cycles, and known vehicle applications; …; and …", as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind. Nothing in the claim element precludes the step from practically being performed in the mind; in the context of this claim, the limitations encompass a user performing the analysis mentally with a physical aid (e.g., pencil and paper). If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A, Prong 2: This judicial exception is not integrated into a practical application. The claim recites additional elements ("the machine learning model is generated", "forming, via a computing device, a neural network using the correlations", "converting, via the computing device, the neural network to computer executable code, resulting in the machine learning model"). These elements are recited at such a high level, without any details as to how a model is generated, that they amount to only the idea of a solution or outcome; because they fail to recite how a solution to the problem is accomplished, they represent no more than mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)). Similarly, the additional element "via a computing device" (using a device and/or a model to process data) is recited at a high level of generality, i.e., as a generic computer performing the generic computer function of processing data, such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The additional elements regarding training are recited at such a high level, without any details as to how a model is generated, that they amount to only the idea of a solution or outcome and therefore represent no more than mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)). As discussed above with respect to integration into a practical application, using a generic computer component to perform each step amounts to no more than mere instructions to apply the exception with a generic computer component, which cannot provide an inventive concept. The claim is not patent eligible. MPEP 2106.05(f).

Regarding claim 13

The claim is rejected for the reasons set forth in the rejection of claim 5 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

This application currently names joint inventors.
In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claim(s) 1-2, 6, 9-10, 14, 17-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al. (Hierarchical Reinforcement Learning for Autonomous Decision Making and Motion Planning of Intelligent Vehicles) in view of Kendall et al. (Learning to Drive in a Day) Regarding claim 1 (Note: Hereinafter, if a limitation has bold brackets (i.e. [·]) around claim languages, the bracketed claim languages indicate that they have not been taught yet by the current prior art reference but they will be taught by another prior art reference afterwards.) Lu teaches A method comprising: receiving, at a processor [aboard] a vehicle, vehicle information associated with ongoing movement of the vehicle; (Lu [fig(s) 1] “Test process in simulated or real environment” [algorithm 1] “Obtain the states of the ego vehicle and the ruled-based surrounding vehicles” [sec(s) III] “From the task decomposition perspective, we propose a hierarchical reinforcement learning approach for solving decision-making and motion-planning problems, as depicted in Fig.1. At the higher layer, we utilize the USP-KLSPI to make decisions. The process for obtaining decision-making policy includes four parts: MDP modeling of the decision-making tasks, uneven sampling, sample pooling strategy, and the KLSPI algorithm. At the lower layer, the real-time obstacle avoidance is solved by the planning policy being trained via a DHP in a batch-mode way. Finally, the expected longitudinal speed vexpect and the steering angle δ are transferred to the low-level controller. In the both layers, the data samples used for training are collected using a high-fidelity 14-degree-of-freedom (DOF) dynamics, which is referred to the previous work in [8], [40]. The dynamics of the built vehicle consists of 3 modules, which are the steering system, the body suspension model, and the motion. Also, the effectiveness of the model to the real vehicle dynamics is verified. The main steps of HRLDP is shown in Algorithm 1. 
The algorithm includes the higher decision-making layer and the lower motion-planning layer.” [sec(s) III.B] “The batch-mode DHP planner can be trained by data collected from real cars or high-fidelity software Carsim.” [sec(s) V] “In this article, a hierarchical reinforcement learning approach is proposed for autonomous decision making and motion planning in complex dynamic traffic scenarios. The motion data-samples of the ego vehicle and the rule-based surrounding vehicles are collected by a high-fidelity 14-DOF dynamics for the learning process in the decision-making problems.” [sec(s) IV.A] “The simulation is performed in the Matlab environment by a Desktop with Intel i7-8700K CPU @ 3.7GHz and 16GB RAM, and Windows 10 operating system.” [sec(s) IV.C] “In summary, our algorithm is feasible in dealing with decision-making and motion planning problems in real environments. As to the applications in real environments, we need to obtain the state information, i.e. st as in equation (16) or (23).”;) executing, via the processor, a reinforcement learning model, wherein: (Lu [fig(s) 1] [algorithm 1] “Obtain the states of the ego vehicle and the ruled-based surrounding vehicles” [sec(s) III] “From the task decomposition perspective, we propose a hierarchical reinforcement learning approach for solving decision-making and motion-planning problems, as depicted in Fig.1. At the higher layer, we utilize the USP-KLSPI to make decisions. The process for obtaining decision-making policy includes four parts: MDP modeling of the decision-making tasks, uneven sampling, sample pooling strategy, and the KLSPI algorithm. At the lower layer, the real-time obstacle avoidance is solved by the planning policy being trained via a DHP in a batch-mode way. Finally, the expected longitudinal speed vexpect and the steering angle δ are transferred to the low-level controller. In the both layers, the data samples used for training are collected using a high-fidelity 14-degree-of-freedom (DOF) dynamics, which is referred to the previous work in [8], [40]. The dynamics of the built vehicle consists of 3 modules, which are the steering system, the body suspension model, and the motion. Also, the effectiveness of the model to the real vehicle dynamics is verified. The main steps of HRLDP is shown in Algorithm 1. The algorithm includes the higher decision-making layer and the lower motion-planning layer.” [sec(s) IV.A] “The simulation is performed in the Matlab environment by a Desktop with Intel i7-8700K CPU @ 3.7GHz and 16GB RAM, and Windows 10 operating system.”;) inputs to the reinforcement learning model comprise: the vehicle information; and (Lu [fig(s) 1] [algorithm 1] “Obtain the states of the ego vehicle and the ruled-based surrounding vehicles” [sec(s) III] “From the task decomposition perspective, we propose a hierarchical reinforcement learning approach for solving decision-making and motion-planning problems, as depicted in Fig.1. At the higher layer, we utilize the USP-KLSPI to make decisions. The process for obtaining decision-making policy includes four parts: MDP modeling of the decision-making tasks, uneven sampling, sample pooling strategy, and the KLSPI algorithm. At the lower layer, the real-time obstacle avoidance is solved by the planning policy being trained via a DHP in a batch-mode way. Finally, the expected longitudinal speed vexpect and the steering angle δ are transferred to the low-level controller. 
In the both layers, the data samples used for training are collected using a high-fidelity 14-degree-of-freedom (DOF) dynamics, which is referred to the previous work in [8], [40]. The dynamics of the built vehicle consists of 3 modules, which are the steering system, the body suspension model, and the motion. Also, the effectiveness of the model to the real vehicle dynamics is verified. The main steps of HRLDP is shown in Algorithm 1. The algorithm includes the higher decision-making layer and the lower motion-planning layer.” [sec(s) IV.A] “The simulation is performed in the Matlab environment by a Desktop with Intel i7-8700K CPU @ 3.7GHz and 16GB RAM, and Windows 10 operating system.”; e.g., “14-degree-of-freedom (DOF) dynamics” read(s) on “vehicle information”.) at least one feedback item, the at least one feedback item indicating if a previous output of the reinforcement learning model was correct; (Lu [fig(s) 1] [algorithm 1] “Obtain the states of the ego vehicle and the ruled-based surrounding vehicles” [sec(s) III] “From the task decomposition perspective, we propose a hierarchical reinforcement learning approach for solving decision-making and motion-planning problems, as depicted in Fig.1. … The algorithm includes the higher decision-making layer and the lower motion-planning layer.” [sec(s) III.A] “Both the surrounding vehicles and the ego vehicle are simulated with 14-DOF vehicle dynamics [40]. After the ego vehicle takes a randomized action at, one sampling process terminates when the ego vehicle crosses the 1-th lane or waits at the interaction for 1s, thereby obtaining the next state st+1. The simulation runs for such preset time mainly because we have computed the approximate time to complete a sampling process. The reward function is designed as equation (21) [reproduced in the record only as an image, not shown here], where ξ is the adjustability coefficients, i.e. the penalty for the magnitude of the velocity. Variable Δt is the completion time of each task, and YEgo is the ordinate value. The process of obtaining a sample [st, at, st+1, rt]: in initial state st, randomized action at was taken, a reward rt was received, and the resulting state was st+1. Then we train the KLSPI algorithm based on the samples to obtain a decision-making policy. One thing to emphasize is that the autonomous vehicle needs to make continuous decisions in the area of A1 and A2 which are shown in Fig. 3.”; e.g., “reward” read(s) on “feedback item indicating if a previous output of the reinforcement learning model was correct”.) outputs of the reinforcement learning model comprise: a current driving cycle of the vehicle; and (Lu [fig(s) 1] “Decisions”, “at” [algorithm 1] “Ensure: The decision considering the longitudinal speed” [sec(s) Abs] “The lower layer addresses the motion-planning problem in the lateral direction using a dual heuristic programming (DHP) algorithm learned in a batch-mode manner, while the velocity profile in the longitudinal direction is inherited from the higher layer.” [sec(s) III] “From the task decomposition perspective, we propose a hierarchical reinforcement learning approach for solving decision-making and motion-planning problems, as depicted in Fig.1. At the higher layer, we utilize the USP-KLSPI to make decisions.
The process for obtaining decision-making policy includes four parts: MDP modeling of the decision-making tasks, uneven sampling, sample pooling strategy, and the KLSPI algorithm.” [sec(s) III.A.2)] “The optional actions of the autonomous vehicle in this scenario can be expressed as: at = {Slow, Keep, Acc} (17) The actions are slowing down, keeping an original speed and acceleration, respectively.” [sec(s) III.A.3)] “The action at is defined as at ∈ {LCT (0) Acc, LCT (0) Slow, LCT (1)} (23) where LCT(i),{i = 0, 1} means a lane-changing maneuver to the i-th lane, LCT (0) represents driving on the original ramp, LCT (1) means a lane-changing maneuver to the 1-th lane. Acc, Slow mean to accelerate and slow down, respectively. As the equation (23) shows, the autonomous vehicle has a total of three optional actions in this scenario. They are: acceleration on the ramp, slowing down on the ramp, and changing to the 1-th lane.”;) a current application of the vehicle; (Lu [fig(s) 1] “Decisions”, “at” [algorithm 1] “Obtain the states of the ego vehicle and the ruled-based surrounding vehicles” [sec(s) III] “From the task decomposition perspective, we propose a hierarchical reinforcement learning approach for solving decision-making and motion-planning problems, as depicted in Fig.1. At the higher layer, we utilize the USP-KLSPI to make decisions. The process for obtaining decision-making policy includes four parts: MDP modeling of the decision-making tasks, uneven sampling, sample pooling strategy, and the KLSPI algorithm.” [sec(s) III.A.2)] “The optional actions of the autonomous vehicle in this scenario can be expressed as: at = {Slow, Keep, Acc} (17) The actions are slowing down, keeping an original speed and acceleration, respectively.” [sec(s) III.A.3)] “The action at is defined as at ∈ {LCT (0) Acc, LCT (0) Slow, LCT (1)} (23) where LCT(i),{i = 0, 1} means a lane-changing maneuver to the i-th lane, LCT (0) represents driving on the original ramp, LCT (1) means a lane-changing maneuver to the 1-th lane. Acc, Slow mean to accelerate and slow down, respectively. As the equation (23) shows, the autonomous vehicle has a total of three optional actions in this scenario. They are: acceleration on the ramp, slowing down on the ramp, and changing to the 1-th lane.”;) executing, via the processor, a machine learning model, wherein: (Lu [fig(s) 1] [algorithm 1] “Obtain the states of the ego vehicle and the ruled-based surrounding vehicles” [sec(s) III] “From the task decomposition perspective, we propose a hierarchical reinforcement learning approach for solving decision-making and motion-planning problems, as depicted in Fig.1. At the higher layer, we utilize the USP-KLSPI to make decisions. The process for obtaining decision-making policy includes four parts: MDP modeling of the decision-making tasks, uneven sampling, sample pooling strategy, and the KLSPI algorithm. At the lower layer, the real-time obstacle avoidance is solved by the planning policy being trained via a DHP in a batch-mode way. Finally, the expected longitudinal speed vexpect and the steering angle δ are transferred to the low-level controller. In the both layers, the data samples used for training are collected using a high-fidelity 14-degree-of-freedom (DOF) dynamics, which is referred to the previous work in [8], [40]. The dynamics of the built vehicle consists of 3 modules, which are the steering system, the body suspension model, and the motion. Also, the effectiveness of the model to the real vehicle dynamics is verified. 
The main steps of HRLDP is shown in Algorithm 1. The algorithm includes the higher decision-making layer and the lower motion-planning layer.” [sec(s) IV.A] “The simulation is performed in the Matlab environment by a Desktop with Intel i7-8700K CPU @ 3.7GHz and 16GB RAM, and Windows 10 operating system.”;) inputs to the machine learning model comprise: the outputs of the reinforcement learning model; and (Lu [fig(s) 1] [algorithm 1] “Obtain the states of the ego vehicle and the ruled-based surrounding vehicles” [sec(s) III] “From the task decomposition perspective, we propose a hierarchical reinforcement learning approach for solving decision-making and motion-planning problems, as depicted in Fig.1. At the higher layer, we utilize the USP-KLSPI to make decisions. The process for obtaining decision-making policy includes four parts: MDP modeling of the decision-making tasks, uneven sampling, sample pooling strategy, and the KLSPI algorithm. At the lower layer, the real-time obstacle avoidance is solved by the planning policy being trained via a DHP in a batch-mode way. Finally, the expected longitudinal speed vexpect and the steering angle δ are transferred to the low-level controller. In the both layers, the data samples used for training are collected using a high-fidelity 14-degree-of-freedom (DOF) dynamics, which is referred to the previous work in [8], [40]. The dynamics of the built vehicle consists of 3 modules, which are the steering system, the body suspension model, and the motion. Also, the effectiveness of the model to the real vehicle dynamics is verified. The main steps of HRLDP is shown in Algorithm 1. The algorithm includes the higher decision-making layer and the lower motion-planning layer.”;) the vehicle information; and (Lu [fig(s) 1] “Test process in simulated or real environment” [algorithm 1] [sec(s) III] “From the task decomposition perspective, we propose a hierarchical reinforcement learning approach for solving decision-making and motion-planning problems, as depicted in Fig.1. At the higher layer, we utilize the USP-KLSPI to make decisions. The process for obtaining decision-making policy includes four parts: MDP modeling of the decision-making tasks, uneven sampling, sample pooling strategy, and the KLSPI algorithm. At the lower layer, the real-time obstacle avoidance is solved by the planning policy being trained via a DHP in a batch-mode way. Finally, the expected longitudinal speed vexpect and the steering angle δ are transferred to the low-level controller. In the both layers, the data samples used for training are collected using a high-fidelity 14-degree-of-freedom (DOF) dynamics, which is referred to the previous work in [8], [40].”;) output of the machine learning model comprises a wheel alignment signal. (Lu [fig(s) 1] “δ is the front steering angle” [algorithm 1] “7: Generate the front steering angle for lateral trajectory planning” [sec(s) III] “From the task decomposition perspective, we propose a hierarchical reinforcement learning approach for solving decision-making and motion-planning problems, as depicted in Fig.1. At the higher layer, we utilize the USP-KLSPI to make decisions. The process for obtaining decision-making policy includes four parts: MDP modeling of the decision-making tasks, uneven sampling, sample pooling strategy, and the KLSPI algorithm. At the lower layer, the real-time obstacle avoidance is solved by the planning policy being trained via a DHP in a batch-mode way. 
Finally, the expected longitudinal speed vexpect and the steering angle δ are transferred to the low-level controller. In the both layers, the data samples used for training are collected using a high-fidelity 14-degree-of-freedom (DOF) dynamics, which is referred to the previous work in [8], [40].”;) However, Lu does not appear to explicitly teach: receiving, at a processor [aboard] a vehicle, vehicle information associated with ongoing movement of the vehicle; (Note: Hereinafter, if a limitation has one or more bold underlines, the one or more underlined claim languages indicate that they are taught by the current prior art reference, while the one or more non-underlined claim languages indicate that they have been taught already by one or more previous art references.) Kendall teaches receiving, at a processor aboard a vehicle, vehicle information associated with ongoing movement of the vehicle; (Kendall [sec(s) IV] “We conduct our experiments using a modified Renault Twizy vehicle, which is a two seater electric vehicle, shown in Figure 1. The vehicle weighs 500kg, has a top speed of 80 km/h and has a range of 100km on a single battery charge. We use a single monocular forward-facing video camera mounted in the centre of the roof at the front of the vehicle. We use retrofitted electric motors to actuate the brake and steering, and electronically emulate the throttle position to regulate torque to the wheels. All computation is done on-board using a single NVIDIA Drive PX2 computer. The vehicle’s drive-by-wire automation automatically disengages if the safety driver intervenes, either by using vehicle controls (brake, throttle, or steering), toggling the automation mode, or pressing the emergency stop. An episode would terminate when either speed exceeded 10km/h, or drive-by-wire automation disengaged, indicating the safety driver has intervened. The safety driver would then reset the car to the centre of the road and continue with the next episode” [sec(s) I] “4) learn to drive a real-world autonomous vehicle in a few episodes with a continuous deep reinforcement learning algorithm, using only on-board computation.” [sec(s) V] “This work presents the first application of deep reinforcement learning to a full sized autonomous vehicle. The experiments demonstrate we are able to learn to lane follow with under thirty minutes of training – all done on on-board computers.” [sec(s) III.A] “In this paper, we show that for simple driving tasks it is sufficient to use a monocular camera image, together with the observed vehicle speed and steering angle”;) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Lu with the processor aboard a vehicle of Kendall. One of ordinary skill in the art would have been motivated to combine in order to improve learned autonomous driving behaviour by providing a corrective mechanism. (Kendall [sec(s) I] “We argue that the generality of reinforcement learning makes it a useful framework to apply to autonomous driving. Most importantly, it provides a corrective mechanism to improve learned autonomous driving behaviour.
To this end, in this paper we: 1) pose autonomous driving as an MDP, explain how to design the various elements of this problem to make it simpler to solve, whilst keeping it general and extensible, 2) show that a canonical RL algorithm (deep deterministic policy gradients [8]) can rapidly learn a simple autonomous driving task in a simulation environment, 3) discuss the system set-up required to make learning to drive efficient and safe on a real-world vehicle, 4) learn to drive a real-world autonomous vehicle in a few episodes with a continuous deep reinforcement learning algorithm, using only on-board computation.”) Regarding claim 2 The combination of Lu, Kendall teaches claim 1. Lu further teaches transmitting, from the processor to at least one actuator of the vehicle, the wheel alignment signal; and (Lu [fig(s) 1] “Test process in simulated or real environment”, “Low-level control”, “vexpect is longitudinal speed and δ is the front steering angle” [algorithm 1] [sec(s) III] “Finally, the expected longitudinal speed vexpect and the steering angle δ are transferred to the low-level controller. In the both layers, the data samples used for training are collected using a high-fidelity 14-degree-of-freedom (DOF) dynamics, which is referred to the previous work in [8], [40]. The dynamics of the built vehicle consists of 3 modules, which are the steering system, the body suspension model, and the motion. Also, the effectiveness of the model to the real vehicle dynamics is verified. The main steps of HRLDP is shown in Algorithm 1. The algorithm includes the higher decision-making layer and the lower motion-planning layer.” [sec(s) III.B] “The learning-based motion planning in this article utilizes a kernel-based DHP [41] to train for a planner. The next state is obtained by applying the control action to the 14-DOF vehicle dynamics in the simulated environment.” [sec(s) V] “In this article, a hierarchical reinforcement learning approach is proposed for autonomous decision making and motion planning in complex dynamic traffic scenarios. The motion data-samples of the ego vehicle and the rule-based surrounding vehicles are collected by a high-fidelity 14-DOF dynamics for the learning process in the decision-making problems.” [sec(s) IV.A] “The simulation is performed in the Matlab environment by a Desktop with Intel i7-8700K CPU @ 3.7GHz and 16GB RAM, and Windows 10 operating system.” [sec(s) IV.C] “In summary, our algorithm is feasible in dealing with decision-making and motion planning problems in real environments. As to the applications in real environments, we need to obtain the state information, i.e. st as in equation (16) or (23).”;) modifying, via the at least one actuator based on the wheel alignment signal, at least one component of the vehicle, resulting in a modified steering geometry of the vehicle. (Lu [fig(s) 1] “Test process in simulated or real environment”, “Low-level control”, “vexpect is longitudinal speed and δ is the front steering angle” [algorithm 1] [sec(s) III] “Finally, the expected longitudinal speed vexpect and the steering angle δ are transferred to the low-level controller. In the both layers, the data samples used for training are collected using a high-fidelity 14-degree-of-freedom (DOF) dynamics, which is referred to the previous work in [8], [40]. The dynamics of the built vehicle consists of 3 modules, which are the steering system, the body suspension model, and the motion. Also, the effectiveness of the model to the real vehicle dynamics is verified. 
The main steps of HRLDP is shown in Algorithm 1. The algorithm includes the higher decision-making layer and the lower motion-planning layer.” [sec(s) III.B] “The learning-based motion planning in this article utilizes a kernel-based DHP [41] to train for a planner. The next state is obtained by applying the control action to the 14-DOF vehicle dynamics in the simulated environment.” [sec(s) V] “In this article, a hierarchical reinforcement learning approach is proposed for autonomous decision making and motion planning in complex dynamic traffic scenarios. The motion data-samples of the ego vehicle and the rule-based surrounding vehicles are collected by a high-fidelity 14-DOF dynamics for the learning process in the decision-making problems.” [sec(s) IV.A] “The simulation is performed in the Matlab environment by a Desktop with Intel i7-8700K CPU @ 3.7GHz and 16GB RAM, and Windows 10 operating system.” [sec(s) IV.C] “In summary, our algorithm is feasible in dealing with decision-making and motion planning problems in real environments. As to the applications in real environments, we need to obtain the state information, i.e. st as in equation (16) or (23).”;) Regarding claim 6 The combination of Lu, Kendall teaches claim 1. Kendall further teaches wherein the at least one feedback item comprises an indication of accuracy from a driver of the vehicle regarding previous outputs of the reinforcement learning model. (Kendall [fig(s) 1] “We design a deep reinforcement learning algorithm for autonomous driving. This figure illustrates the actor-critic algorithm which we use to learn a policy and value function for driving. Our agent maximises the reward of distance travelled before intervention by a safety driver.” [sec(s) III.C] “Deployment of a reinforcement learning algorithm on a full-sized robotic vehicle running in a real world environment requires adjustment of common training procedures, to account for both driver intervention and external variables affecting the training. … Each episode is executed until the system detects that automation is lost (i.e. the driver intervened). In a real world environment, the system can not reset automatically between episodes, unlike agents in simulation or in a constrained environment. We require a human driver to reset the vehicle to a valid starting state. Upon episode termination, while the safety driver performs this reset, the model is being optimised, minimising the time spent between episodes” [sec(s) V] “In this work, we present a general reward function which asks the agent to maximise the distance travelled without intervention from a safety driver. While this reward function is general, it has a number of limitations. It does not consider conditioning on a given navigation goal. Furthermore, it is incredibly sparse. As our agent improves, interventions will become significantly less frequent, resulting in weaker training signal.” [sec(s) IV] “An episode would terminate when either speed exceeded 10km/h, or drive-by-wire automation disengaged, indicating the safety driver has intervened. The safety driver would then reset the car to the centre of the road and continue with the next episode.”;) The combination of Lu, Kendall is combinable with Kendall for the same rationale as set forth above with respect to claim 1. Regarding claim 9 The claim is a system claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim. 
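Read together, the rejections above map two stages of the claimed system: claim 5 recites how the machine learning model is generated offline (a sensitivity analysis identifying correlations among known values, a neural network formed from those correlations, and conversion of the network to executable code), while claim 1 recites the runtime loop in which a reinforcement learning model turns vehicle information plus a feedback item into a driving cycle and vehicle application, which a machine learning model then turns into a wheel alignment signal. The sketch below illustrates only that data flow, under synthetic data; every name, threshold, and stub policy here is an assumption of this summary, not the application's disclosure or any cited reference's implementation.

```python
# Minimal sketch of the two stages discussed above. All data and names
# are hypothetical; this is illustration, not the claimed method.
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: generating the ML model (claim 5's pipeline, illustrated) ---
# Synthetic "known values": [speed, axle_load, driving_cycle, application]
X = rng.normal(size=(500, 4))
y = 0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.4 * X[:, 2] + rng.normal(scale=0.1, size=500)

# "Sensitivity analysis": keep only inputs whose correlation with the
# target alignment value clears a (hypothetical) threshold.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
keep = np.abs(corr) > 0.2

# "Form a neural network using the correlations": one hidden layer over
# the retained inputs, trained by plain gradient descent on MSE.
Xk = X[:, keep]
W1 = rng.normal(scale=0.1, size=(Xk.shape[1], 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 1));           b2 = np.zeros(1)
lr = 0.1
for _ in range(2000):
    h = np.tanh(Xk @ W1 + b1)
    g = 2.0 * ((h @ W2 + b2).ravel() - y) / len(y)   # dMSE/dprediction
    gh = (g[:, None] @ W2.T) * (1.0 - h ** 2)        # backprop through tanh
    W2 -= lr * h.T @ g[:, None]; b2 -= lr * g.sum()
    W1 -= lr * Xk.T @ gh;        b1 -= lr * gh.sum(axis=0)

# "Convert the network to computer executable code": freeze the trained
# weights into a standalone callable, the deployable ML model.
def wheel_alignment_model(features):
    h = np.tanh(features[keep] @ W1 + b1)
    return float(h @ W2 + b2)

# --- Stage 2: the runtime flow of claim 1, with a stub RL policy ---
def rl_model(vehicle_info, feedback):
    """Stub RL policy. Inputs: vehicle information plus a feedback item
    (a reward indicating whether the previous output was correct, cf.
    Lu's [s_t, a_t, s_t+1, r_t] samples; the stub ignores it, where a
    real policy would update on it). Outputs: driving cycle, application."""
    speed = vehicle_info[0]
    driving_cycle = 1.0 if abs(speed) > 1.0 else 0.0  # 1 = transient, 0 = modal
    application = 1.0 if speed > 0.5 else 0.0         # 1 = highway, 0 = urban
    return driving_cycle, application

vehicle_info = rng.normal(size=2)  # synthetic [speed, axle_load]
cycle, app = rl_model(vehicle_info, feedback=1.0)
signal = wheel_alignment_model(np.array([vehicle_info[0], vehicle_info[1], cycle, app]))
print(f"wheel alignment signal: {signal:+.4f}")
```

The point of the sketch is the shape of the pipeline rather than its numbers: the sensitivity filter decides which inputs reach the network, the RL model's outputs feed the ML model alongside the raw vehicle information, and the "conversion to executable code" step is represented here simply by freezing trained weights into a standalone function.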
Regarding claim 10 The claim is a system claim corresponding to the method claim 2, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim. Regarding claim 14 The claim is a system claim corresponding to the method claim 6, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim. Regarding claim 17 The claim is a computer program product claim corresponding to the method claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim. Regarding claim 18 The claim is a computer-readable storage medium claim corresponding to the method claim 2, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim. Claim(s) 3, 11, 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al. (Hierarchical Reinforcement Learning for Autonomous Decision Making and Motion Planning of Intelligent Vehicles) in view of Kendall et al. (Learning to Drive in a Day) in view of Bhardwaj et al. (The Effects of Driver Coupling and Automation Impedance on Emergency Steering Interventions) Regarding claim 3 The combination of Lu, Kendall teaches claim 1. Lu further teaches [displaying a notification to manually] modify a steering geometry of the vehicle based on the wheel alignment signal. (Lu [fig(s) 1] “Test process in simulated or real environment”, “Low-level control”, “vexpect is longitudinal speed and δ is the front steering angle” [algorithm 1] [sec(s) III] “Finally, the expected longitudinal speed vexpect and the steering angle δ are transferred to the low-level controller. In the both layers, the data samples used for training are collected using a high-fidelity 14-degree-of-freedom (DOF) dynamics, which is referred to the previous work in [8], [40]. The dynamics of the built vehicle consists of 3 modules, which are the steering system, the body suspension model, and the motion. Also, the effectiveness of the model to the real vehicle dynamics is verified. The main steps of HRLDP is shown in Algorithm 1. The algorithm includes the higher decision-making layer and the lower motion-planning layer.” [sec(s) III.B] “The learning-based motion planning in this article utilizes a kernel-based DHP [41] to train for a planner. The next state is obtained by applying the control action to the 14-DOF vehicle dynamics in the simulated environment.” [sec(s) V] “In this article, a hierarchical reinforcement learning approach is proposed for autonomous decision making and motion planning in complex dynamic traffic scenarios. The motion data-samples of the ego vehicle and the rule-based surrounding vehicles are collected by a high-fidelity 14-DOF dynamics for the learning process in the decision-making problems.” [sec(s) IV.A] “The simulation is performed in the Matlab environment by a Desktop with Intel i7-8700K CPU @ 3.7GHz and 16GB RAM, and Windows 10 operating system.” [sec(s) IV.C] “In summary, our algorithm is feasible in dealing with decision-making and motion planning problems in real environments. As to the applications in real environments, we need to obtain the state information, i.e. 
st as in equation (16) or (23).”;) However, the combination of Lu, Kendall does not appear to explicitly teach: [displaying a notification to manually] modify a steering geometry of the vehicle based on the wheel alignment signal. Bhardwaj teaches displaying a notification to manually modify a steering geometry of the vehicle based on the wheel alignment signal. (Bhardwaj [fig(s) 1] [sec(s) II.E] “The driving task was to keep the vehicle centered in the right lane of the two-way road and avoid any obstacles that appeared in the lane. To help the driver with lane centering, a lane departure warning appeared on the virtual dashboard (Fig. 1d) when the deviation of the vehicle from the center of the right lane exceeded 0.6 m (the lane was 4 m wide). Obstacles in the form of pedestrians, deer, or other vehicles unexpectedly entered the road from the right side of the driving lane (Fig. 1b) and stopped at the center of the lane. Time available to avoid the obstacles was about one second. As soon as the obstacle stopped, the automation system performed an emergency steering intervention towards the left to help the driver avoid the obstacle. During the steering intervention, the lane departure warning disappeared and an ‘AUTOMATION IS ON’ notification appeared on the virtual dashboard to indicate that the automation system was active. After avoiding the obstacle, the automation system returned the vehicle back to the center of the right lane at which point a take-over-request (TOR) notification ‘TAKE OVER CONTROL’ appeared on the virtual dashboard. Four seconds after the first appearance of the TOR, monotone auditory alerts generating one “beep” every two seconds were sent from a speaker to remind the driver to take over. The notifications and the auditory alert turned off as soon as the driver pressed the red button, took back control, and resumed manual driving.”;) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Lu, Kendall with the notification of Bhardwaj. One of ordinary skill in the art would have been motivated to combine in order to improve the driver’s ability to successfully intervene during an emergency by combining the advantages of low and high impedance automation. (Bhardwaj [sec(s) IV] “In summary, the results of this study highlight a trade-off in automation design for emergency situations: high impedance automation can significantly reduce unwarranted driver input on the steering wheel during emergency situations but may cause driver discomfort and may be too strong to override during automation faults. This result is consistent with the hypotheses and findings presented in the past [5], [10], [17], [20]–[22], [24]. Contrary to expectations, decoupling the driver during emergency interventions did not significantly increase the time required for the driver to resume control or the number of collisions during automation dropouts.
To combine the advantages of low and high impedance automation, an adaptive impedance system could be designed that would assume a high level of authority during emergency situations in which the automation has high confidence, and a low level of authority during situations in which the automation has low confidence to give override power to the human.” [sec(s) Abs] “Results showed that a high impedance automation system results in significantly fewer collisions during intended steering interventions but significantly higher collisions during automation faults when compared to a low impedance automation system. Moreover, decoupling the driver did not seem to significantly influence the time required to hand back control to the driver. When coupled, drivers were able to cover for a faulty automation system and avoid obstacles to a certain degree, though differences by condition were significant for only one type of automation fault”) Regarding claim 11 The claim is a system claim corresponding to the method claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim. Regarding claim 19 The claim is a computer-readable storage medium claim corresponding to the method claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim. Claim(s) 4, 12, 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al. (Hierarchical Reinforcement Learning for Autonomous Decision Making and Motion Planning of Intelligent Vehicles) in view of Kendall et al. (Learning to Drive in a Day) in view of Kasper et al. (US 20190084533 A1) in view of Hoare et al. (WO 2017012978 A1) Regarding claim 4 The combination of Lu, Kendall teaches claim 1. Lu further teaches wherein the vehicle information comprises: a velocity of the vehicle; [wheel speeds of the vehicle; a steering angle of the vehicle; a throttle of the vehicle; a brake pedal status of the vehicle; axle load data of the vehicle; GPS (Global Positioning System) data of the vehicle; and suspension articulation data of the vehicle]. (Lu [algorithm 2] “4: Randomize each state st and collect Nnum samples [st, at,st+1,rt] by using an exploration policy;” [sec(s) III.A.2)] “st = [VEgo Vri dri Vfi dfi] i = 1 or i = 2 (16) … Denote VEgo as the current longitudinal speed of the ego vehicle, where VEgo is set within the range [0, 13m/s].”;) However, the combination of Lu, Kendall does not appear to explicitly teach: [wheel speeds of the vehicle; a steering angle of the vehicle; a throttle of the vehicle; a brake pedal status of the vehicle; axle load data of the vehicle; GPS (Global Positioning System) data of the vehicle; and suspension articulation data of the vehicle]. Kasper teaches wheel speeds of the vehicle; a steering angle of the vehicle; a throttle of the vehicle; a brake pedal status of the vehicle; axle load data of the vehicle; [GPS (Global Positioning System) data of the vehicle; and suspension articulation data of the vehicle]. (Kasper [par(s) 47] “FIG. 1 illustrates an air brake system 10 of a towing vehicle, or tractor, by way of an example application. 
The system 10 includes an electronic towing vehicle controller 22 with inputs for electrically connecting to, either directly or through a vehicle communication bus such as for example a serial communication bus, at least four modulators 40, at least four wheel speed sensors 44, at least two traction relay valves 41, a trailer pressure control device 34, a steering angle sensor 46, a lateral acceleration sensor 27, a yaw rate sensor 26, and a load sensor 24. The pneumatic portion of the tractor air brake system 10 includes at least four brake actuators 42, at least two reservoirs 48, and an operator actuated brake pedal 50. Each of the at least four wheel speed sensors 44 communicates the individual wheel speeds to the towing vehicle controller 22 for use in antilock braking system (ABS), automatic slip regulation (ASR), and electronic stability control (ESC) algorithms.” [par(s) 55] “For example, the devices 214 may be one or more sensors, such as but not limited to, one or more wheel speed sensors 44, a lateral acceleration sensor 27, a steering angle sensor 46, a brake pressure sensor 34, a vehicle load sensor 24, a yaw rate sensor 26, a set of one or more wheel slip sensor(s) 222, a vehicle deceleration sensor 223, and a brake pedal position sensor 224.” [par(s) 56] “the processor 230 may generate and send the control signal to an engine electronic control unit or an actuating device to reduce the engine throttle 234 and slowing the vehicle down.” [par(s) 73] “In particular, the time from brake apply to when the vehicle decelerates is monitored during every stop of the vehicle relative to wheel slippage of the one or more wheels of the tractor and/or of the trailer. This information together with information regarding the vehicle and axle loads during every stop and is entered or otherwise used as a data point defining the vehicle response delay. Similarly, knowing the axle loads and the ABS Activation of a wheel end for a given pressure such as determined by or from a brake pressure sensor 35 for example is used to create another data point.”;) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Lu, Kendall with the vehicle information of Kasper. One of ordinary skill in the art would have been motivated to combine in order to provide an improved brake control of towed vehicles of a combination vehicle. (Kasper [par(s) 19-20] “The embodiments herein provide for new and improved systems and methods for providing brake control of one or more towed vehicles of a combination vehicle. The embodiments herein provide a braking controller and method in a towing vehicle towing one or more towed vehicles as a combination vehicle providing brake control of the one or more towed vehicles based on a level of braking force applied to the towing vehicle. A non-enhanced braking mode applies a first level of braking force to the towed vehicles in a predetermined reduced proportion relative to the level of braking force applied to the towing vehicle, and an enhanced braking mode applies a second level of braking force to the towed vehicles greater than the first level of braking force.”) However, the combination of Lu, Kendall, Kasper does not appear to explicitly teach: [GPS (Global Positioning System) data of the vehicle; and suspension articulation data of the vehicle]. Hoare teaches GPS (Global Positioning System) data of the vehicle; and suspension articulation data of the vehicle.
(Hoare [pp. 12-13] “Figure 2 shows the VCS 14 in more detail. The VCS 14 includes a data processor 40 for determining whether the vehicle 10 is undergoing either a parking event or a terrain identification event. The VCS 14 also includes a data memory or memory device 42 having instructions stored therein, the data processor 40 being arranged to execute said instructions in order to make the above determination. The data memory 42 may be an electronic, non-transitory, computer-readable storage medium. The data memory 42 also includes predetermined vehicle output data with certain data values for the subsystems 32 being associated with a vehicle parking event and/or a terrain identification event. This predetermined data is used by the processor 40 in order that the above determination may be made. The data processor 40 has an input 44 that is arranged to receive data from the other subsystems 32. It is based on this received data that the processor 40 makes the determination mentioned above. The input 44 from the subsystems 32 includes manual input from the driver via the HMI 34. The other subsystems and sensors 32 can include, for example, a vehicle speed subsystem or sensor, a sensor determining when a reverse gear is selected, a vehicle steering input subsystem or sensor, a brake pedal position sensor, a suspension articulation sensor, an acceleration sensor, a wheel slip sensor, a pitch rate sensor, a yaw rate sensor, an automatic park subsystem, a vehicle location subsystem or sensor such as a Global Positioning System (GPS), one or more further acoustic sensors, an optical sensor and/or a vehicle-mounted radar sensor. The data processor 40 has an output 46 that is arranged to send a control signal to the acoustic sensor 12.”;) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Lu, Kendall, Kasper with the GPS and suspension of Hoare. One of ordinary skill in the art would have been motivated to combine in order to advantageously send a control signal to adjust the setup of the acoustic sensor to provide useful sensor output data to either the proximity detection or terrain identification subsystem. (Hoare [p. 3] “The system is advantageous in that it can use measurements from various systems on the vehicle to ensure that the at least one acoustic sensor is being used in the most suitable manner. In particular, the system can determine automatically which of the proximity detection or terrain identification mode is more suitable for a particular set of driving conditions, in particular based on the received vehicle output data, and advantageously sends a control signal to adjust the setup of the acoustic sensor so that it may provide useful sensor output data to either the proximity detection or terrain identification subsystem, as appropriate.”) Regarding claim 12 The claim is a system claim corresponding to the method claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim. Regarding claim 20 The claim is a computer-readable storage medium claim corresponding to the method claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim. Claim(s) 5, 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al.
(Hierarchical Reinforcement Learning for Autonomous Decision Making and Motion Planning of Intelligent Vehicles) in view of Kendall et al. (Learning to Drive in a Day) in view of Belkhode et al. (Analysis and Interpretation of Steering Geometry of Automobile Using Artificial Neural Network Simulation) Regarding claim 5 The combination of Lu, Kendall teaches claim 1. However, the combination of Lu, Kendall does not appear to explicitly teach: wherein the machine learning model is generated by: performing a sensitivity analysis which identifies correlations between known values of vehicle data associated with the vehicle information, known values of steering geometry component, known driving cycles, and known vehicle applications; forming, via a computing device, a neural network using the correlations; and converting, via the computing device, the neural network to computer executable code, resulting in the machine learning model. Belkhode teaches wherein the machine learning model is generated by: performing a sensitivity analysis which identifies correlations between known values of vehicle data associated with the vehicle information, known values of steering geometry component, known driving cycles, and known vehicle applications; (Belkhode [sec(s) Abs] “Vehicle dynamics is the one of the most important factors in the analysis and predicting the steering behavior of automobile. The paper details the evaluation of the Artificial Neural Network (ANN) structures to estimate the steering geometry parameters of four wheel vehicle. One of the aspects of vehicle performance is performance of steering geometry. … Steering geometry is evaluated through the independent and dependent variables of front suspension. Dependent variables such as steering geometry parameters kingpin inclination angle, caster angle, camber angle, toe angle, scrub radius, toe in and toe out are determined with the help of independent variables. These dependent variables are validated through ANN simulation.” [sec(s) 2] “Once these angles are measured and position of linkage of front suspension is decided, position of kingpin axis can be located. The included angles at the joints of front suspension mechanism are first decided by potentiometers. These measured angles are supplied to interfacing program which calculates the steering performance parameters such as Kingpin angle, Camber angle, Caster angle, Toe angle, Toe in, Toe out, Scrub radius. The experimental setup is formulated on which trial are recoded with varying speed and breakers height. The steering geometry parameters such as link lengths, clearance at the joints, joints angles, breakers height, velocity and wheel diameter are recorded with the help of measuring instruments. Joints angles are measured by the potentiometer and position are joint A and B is located. Position of joint A and B further decided the position of kingpin inclination. Kingpin inclination is used for finding the steering geometry such Kingpin angle, Camber angle, Caster angle, Toe angle, Toe in, Toe out, Scrub radius.”; e.g., “trial are recoded with varying speed and breakers height” read(s) on “driving cycles”.) forming, via a computing device, a neural network using the correlations; and (Belkhode [sec(s) Abs] “Vehicle dynamics is the one of the most important factors in the analysis and predicting the steering behavior of automobile. The paper details the evaluation of the Artificial Neural Network (ANN) structures to estimate the steering geometry parameters of four wheel vehicle. 
One of the aspects of vehicle performance is performance of steering geometry. … Steering geometry is evaluated through the independent and dependent variables of front suspension. Dependent variables such as steering geometry parameters kingpin inclination angle, caster angle, camber angle, toe angle, scrub radius, toe in and toe out are determined with the help of independent variables. These dependent variables are validated through ANN simulation. … The objectives of this study were to evaluate the accuracy of ANN for estimation of steering parameters. Artificial Neural Network technique is recently used in the entire field to evaluate the experimental or field data. Network is trained with known inputs and outputs.”; Note that Lu teaches “computing device.”) converting, via the computing device, the neural network to computer executable code, resulting in the machine learning model. (Belkhode [sec(s) Abs] “The objectives of this study were to evaluate the accuracy of ANN for estimation of steering parameters. Artificial Neural Network technique is recently used in the entire field to evaluate the experimental or field data. Network is trained with known inputs and outputs. Once network is trained output is predicated based on the new inputs. Paper details the validation of the experimental data with the help of Artificial Neural Network” [sec(s) 2] “Once these angles are measured and position of linkage of front suspension is decided, position of kingpin axis can be located. The included angles at the joints of front suspension mechanism are first decided by potentiometers. These measured angles are supplied to interfacing program which calculates the steering performance parameters such as Kingpin angle, Camber angle, Caster angle, Toe angle, Toe in, Toe out, Scrub radius. The experimental setup is formulated on which trial are recoded with varying speed and breakers height. The steering geometry parameters such as link lengths, clearance at the joints, joints angles, breakers height, velocity and wheel diameter are recorded with the help of measuring instruments. Joints angles are measured by the potentiometer and position are joint A and B is located. Position of joint A and B further decided the position of kingpin inclination. Kingpin inclination is used for finding the steering geometry such Kingpin angle, Camber angle, Caster angle, Toe angle, Toe in, Toe out, Scrub radius.” [sec(s) 4] “The detailed ANN program used for evaluation the steering geometry is provided in the Appendix. … ANN program shown in Appendix is run on the MATLAB software. The ANN Outputs consists of all the steering parameters are shown in the Table 1.”; Note that Lu teaches “computing device.”) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Lu, Kendall with the machine learning model of Belkhode. One of ordinary skill in the art would have been motivated to combine in order to accurately determine the optimal values by performing the trained ANN model. (Belkhode [sec(s) 5] “The embodiments herein provide for new and improved systems and methods for providing brake control of one or more towed vehicles of a combination vehicle. The embodiments herein provide a braking controller and method in a towing vehicle towing one or more towed vehicles as a combination vehicle providing brake control of the one or more towed vehicles based on a level of braking force applied to the towing vehicle.
Regarding claim 13

The claim is a system claim corresponding to the method claim 5, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.

Claim(s) 7, 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al. (Hierarchical Reinforcement Learning for Autonomous Decision Making and Motion Planning of Intelligent Vehicles) in view of Kendall et al. (Learning to Drive in a Day) in view of Zhang et al. (Increasing GPS Localization Accuracy With Reinforcement Learning)

Regarding claim 7

The combination of Lu, Kendall teaches claim 1. However, the combination of Lu, Kendall does not appear to explicitly teach: wherein the at least one feedback item comprises a comparison of actual GPS data to regarding previous outputs of the reinforcement learning model.

Zhang teaches wherein the at least one feedback item comprises a comparison of actual GPS data to regarding previous outputs of the reinforcement learning model. (Zhang [fig(s) 1] [fig(s) 5] [sec(s) I] “In this work, we propose a novel approach to improve vehicle localization using a GPS device. Specifically, we develop a reinforcement learning model to find an optimal policy that corrects GPS observations. The proposed work uses a learning process to find the optimal strategy to make corrections on GPS observations. To accelerate the training process and achieve better performance, a state-of-the-art parallel training architecture, namely, asynchronous advantage actor-critic (A3C) protocol, is implemented for learning the optimal correction policy. Due to lack of rigid assumptions on model parameters, the proposed framework is general and applicable to different GPS device and locations under nonstationary environments. Furthermore, because the model is capable of updating the optimal strategy as it collects more data, it can evolve over time, thereby enabling the vehicle to become an expert in localizing itself as it drives around, even in new environments.” [sec(s) Abs] “The proposed reinforcement learning model learns an optimal strategy to make “corrections” on raw GPS observations. The model uses an efficient confidence-based reward mechanism, which is independent of geolocation, thereby enabling the model to be generalized. We incorporate a map matching-based regularization term to reduce the variance of the reward return.” [sec(s) IV] “Before applying the RL and EKF methods to the trajectory data, we convert the long/lat coordinates to UTM (i.e, Cartesian) coordinates. To evaluate the performance of RL and EKF, we consider the prediction error for each GPS point as well as the accumulated error for the entire trajectory. Assume the ground truth for each point i (unknown to the GPS device/vehicle) to be (gxi, gyi), and its prediction result to be (lxi, lyi). The error for each prediction can be calculated as”;)
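An editorial note: the Zhang quotation above breaks off at “can be calculated as”. The per-point error is presumably the Euclidean distance between the predicted and ground-truth UTM coordinates; the following is a reconstruction under that assumption, not a quotation from Zhang:

$$e_i = \sqrt{(lx_i - gx_i)^2 + (ly_i - gy_i)^2}, \qquad E_{\mathrm{traj}} = \sum_i e_i$$

with the accumulated trajectory error taken as the sum of the per-point errors over the trajectory.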
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Lu, Kendall with the feedback of Zhang. One of ordinary skill in the art would have been motivated to combine in order to improve vehicle localization using a GPS device by developing a reinforcement learning model to find an optimal policy that corrects GPS observations. (Zhang [sec(s) I] “In this work, we propose a novel approach to improve vehicle localization using a GPS device. Specifically, we develop a reinforcement learning model to find an optimal policy that corrects GPS observations.”)

Regarding claim 15

The claim is a system claim corresponding to the method claim 7, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.

Claim(s) 8, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lu et al. (Hierarchical Reinforcement Learning for Autonomous Decision Making and Motion Planning of Intelligent Vehicles) in view of Kendall et al. (Learning to Drive in a Day) in view of Paul et al. (Study and influence of standardized driving cycles on the sizing of Li-ion Battery / Supercapacitor Hybrid Energy Storage)

Regarding claim 8

The combination of Lu, Kendall teaches claim 1. However, the combination of Lu, Kendall does not appear to explicitly teach: wherein the current driving cycle of the vehicle comprises one of: a transient driving cycle; and a modal driving cycle.

Paul teaches wherein the current driving cycle of the vehicle comprises one of: a transient driving cycle; and a modal driving cycle. (Paul [sec(s) II] “A driving cycle is a set of points representing the speed of a vehicle over time. There are many different driving cycles but they can be divided into two groups: transient driving and modal driving. Basically, modal driving is a series of linear acceleration, linear braking phases and constant speed phases which is not representative of the real behavior of a driver whereas transient driving includes a lot of speed variation, typical of real driving conditions. Many countries or organizations create their own driving cycle representative of their own roads and environment, in order to assess the performance of combustion vehicle [9]. Driving cycles from United States (FTP-75) and Europe (ARTEMIS, NEDC) will be presented. Also, the worldwide WLTP driving cycle will be introduced.”;)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Lu, Kendall with the driving cycle of Paul. One of ordinary skill in the art would have been motivated to combine in order to improve battery lifetime, since supercapacitors assist the battery in supplying power. (Paul [sec(s) V] “Even though, the weight of the HESS is quite close as the weight of a single source battery system, supercapacitors assist the battery in power and therefore improves battery lifetime. Moreover sizing and performance are directly dependent on other variables above mentioned such as quality of components or energy management.”)

Regarding claim 16

The claim is a system claim corresponding to the method claim 8, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
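An editorial aside: Paul's transient/modal distinction lends itself to a simple heuristic check. The Python sketch below labels a speed-vs-time trace as modal when it is piecewise linear almost everywhere and transient otherwise; the tolerance, the 30% cutoff, and the traces are assumptions for illustration, not taken from Paul.

```python
# Illustrative sketch only: classifies a speed trace as "modal" or "transient".
# The tolerance and the 30% cutoff are assumed values, not taken from Paul.
import numpy as np

def classify_driving_cycle(speed_kmh: np.ndarray, dt_s: float = 1.0, tol: float = 0.05) -> str:
    """Modal cycles (e.g. NEDC) are piecewise linear, so the discrete second
    derivative of speed is near zero almost everywhere; transient cycles
    (e.g. FTP-75, ARTEMIS) vary continuously, so it is frequently nonzero."""
    second_diff = np.diff(speed_kmh, n=2) / dt_s**2
    frac_nonlinear = np.mean(np.abs(second_diff) > tol)
    return "transient" if frac_nonlinear > 0.3 else "modal"

# Synthetic examples: a trapezoidal (modal-like) trace and a jittery (transient-like) one.
modal = np.concatenate([np.linspace(0, 50, 40), np.full(40, 50.0), np.linspace(50, 0, 40)])
rng = np.random.default_rng(1)
transient = np.clip(modal + 0.5 * rng.normal(size=modal.size), 0, None)

print(classify_driving_cycle(modal))      # modal
print(classify_driving_cycle(transient))  # transient
```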
Prior Art

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Draayer et al. (US 20200071144 A1) teaches throttle input, wheel angle, wheel speed, brake pedal, etc. Qiao et al. (Hierarchical Reinforcement Learning Method for Autonomous Vehicle Behavior Planning) teaches Hierarchical RL Option and Action Q-Network.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Fri 9:00 AM - 5:00 PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SEHWAN KIM/
Examiner, Art Unit 2129
1/26/2026

Prosecution Timeline

Jun 20, 2023: Application Filed
Jan 26, 2026: Non-Final Rejection (§101, §103)
Mar 24, 2026: Interview Requested
Apr 09, 2026: Examiner Interview Summary
Apr 09, 2026: Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602595: SYSTEM AND METHOD OF USING A KNOWLEDGE REPRESENTATION FOR FEATURES IN A MACHINE LEARNING CLASSIFIER (granted Apr 14, 2026; 2y 5m to grant)
Patent 12602580: Dataset Dependent Low Rank Decomposition Of Neural Networks (granted Apr 14, 2026; 2y 5m to grant)
Patent 12602581: Systems and Methods for Out-of-Distribution Detection (granted Apr 14, 2026; 2y 5m to grant)
Patent 12602606: APPARATUSES, COMPUTER-IMPLEMENTED METHODS, AND COMPUTER PROGRAM PRODUCTS FOR IMPROVED GLOBAL QUBIT POSITIONING IN A QUANTUM COMPUTING ENVIRONMENT (granted Apr 14, 2026; 2y 5m to grant)
Patent 12541722: MACHINE LEARNING TECHNIQUES FOR VALIDATING AND MUTATING OUTPUTS FROM PREDICTIVE SYSTEMS (granted Feb 03, 2026; 2y 5m to grant)
Based on this examiner's 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 60% (99% with interview, a +65.6% lift)
Median Time to Grant: 4y 1m
PTA Risk: Low
Based on 144 resolved cases by this examiner; grant probability derived from career allow rate.
