Last updated: May 29, 2026
Application No. 17/453,055
MODEL-BASED REINFORCEMENT LEARNING FOR BEHAVIOR PREDICTION

Non-Final OA §102 §103 §112
Filed: Nov 01, 2021
Priority: Nov 01, 2020 (provisional 63/108,432)
Examiner: MULLINAX, CLINT LEE
Art Unit: 2123
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Nvidia Corporation
OA Round: 3 (Non-Final)
Interview: Optional
This examiner has a 48% grant rate with a +38.7% interview lift. Because an interview has already been conducted in this application's prosecution history, a written response with narrowed claims based on precedent claim evolution patterns is recommended.
Based on 126 resolved cases, 2023–2026
Examiner Intelligence
MULLINAX, CLINT LEE
Career allowance rate: 48% of resolved cases (60 granted / 126 resolved), -7.4% vs. TC average
Interview lift: +38.7% (strong) for resolved cases with an interview
Typical timeline: 4y 7m average prosecution; 12 currently pending
Career history: 151 total applications across all art units

Statute-Specific Performance (vs. Tech Center average estimate; based on career data from 126 resolved cases)
§101: 6.3% (-33.7% vs. TC avg)
§103: 85.8% (+45.8% vs. TC avg)
§102: 4.8% (-35.2% vs. TC avg)
§112: 1.9% (-38.1% vs. TC avg)
Office Action

§102 §103 §112
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 09/23/2025 has been entered.

Status of Claims
This action is responsive to Applicant's submission filed on 09/23/2025.
Claims 1-20 are pending.
Claims 1-9, 11-12, and 14-19 have been amended.

Response to Arguments
Applicant’s arguments with respect to the rejections of claims 1-20 under 35 U.S.C. 101 have been fully considered and are persuasive. Therefore, the rejections under 35 U.S.C. 101 set forth in the previous Office action have been withdrawn.

Applicant’s arguments with respect to the rejection of claim 1 under 35 U.S.C. 103 have been considered but are not persuasive. Applicant argues that no reference teaches the amended limitations because Chen’s “blue and red markings (‘alleged at least one first position’ and ‘alleged at least one second position’)” are not the same as the amended claim limitations, and further that “the alleged DNN in Chen is not simulating the alleged second position from the alleged at least one first position, rather they are discrete inputs to the alleged DNN”. The examiner respectfully disagrees.
Chen has been found to teach the amended limitations given the breadth of the claim language. Chen's Figure 3 shows the output of the latent encoding step, which takes in the top row and outputs a lower-dimensional estimate of the data. As seen in Section V and Figures 3-4, this output includes a prediction of at least one second position of the one or more actors in a second state of the environment, because the routing data (blue line) represents future positions of the car. This is still considered generating predictions because the variational auto-encoder (MLM) takes in the original routing data of the current vehicle input state (first position) and outputs a new routing set [Page 3]. Furthermore, Chen teaches an encoder “which encodes the original high dimensional observation o to a low dimensional latent state x” [Page 3, Sections III.B-C], indicating that the input is the entire observation (routing, detected objects, ego states/position) and the output is generated lower-dimensional routing information. Because the position is an input to this step, Chen reads on the limitation “using the at least one first position of one of the actors”. The generated routing consists of future positions that the actor should follow, which lie in a second (future) state at a subsequent time step; the routing covers directly subsequent time steps because its positions form a directly connected line. Routing information is therefore considered predicted future positions (second positions) of an actor across future time steps, and the time at which the actor occupies such a future position, utilizing the RL agent policy, is also a future state of the environment, since the environment has changed once the actor's position has changed.
Further, Chen teaches the “Soft Actor Critic (SAC)”, which is taught to be the last stage of the RL Agent [right side of Page 5]. In Section IV.C, Section V, and Fig. 4, Chen teaches the soft value function that forms part of the SAC. The system uses it to maximize the expected rewards of the RL Agent with the parameters from the Latent Encoding step (including the routing with the simulated second position). It outputs the expected rewards (scores) of following a predicted policy, based on the current input state of the vehicle and its desired goal point, executed in a simulated environment.
See the prior art rejections below for the full mapping of the claim limitations necessitated by Applicant's amendments.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3 and 18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 3 and 18 recite the limitation “the ego-machine” with insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 102
10.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
11.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

12.	Claims 1-4, 6, 7, 9, and 16-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chen, “Model-free Deep Reinforcement Learning for Urban Autonomous Driving” (hereinafter “Chen”).

Regarding Claim 1, Chen teaches: A method comprising:
determining, using at least partially simulated data, at least one first position of one or more actors in a first state of an environment; (Chen's Fig. 2 shows simulated data that includes a determined first position of an ego vehicle (Historical Ego States), based on a first state of an environment generated from sensor data. Chen teaches: “To alleviate this problem, we proposed to convert the output of perception module (object detection and localization), as well as the routing information, to a bird-view representation that applied as the input to our policy.” [Page 2]. Chen thus takes in first sensor data of an environment to construct the bird-view image, which contains the predicted ego position.)
    [Image: Chen, Fig. 2]

applying the at least one first position to a machine learning model (MLM) to cause the MLM to simulate, from the at least one first position, at least one second position of the one or more actors in a second state of the environment that the MLM forward simulates from at least the first state of the environment; (Chen's Figure 3 shows the output of the latent encoding step, which takes in the top row and outputs a lower-dimensional estimate of the data. As seen in Section V and Figures 3-4, this output includes a prediction of at least one second position of the one or more actors in a second state of the environment, because the routing data (blue line) represents future positions of the car. This is still considered generating predictions because the variational auto-encoder (MLM) takes in the original routing data of the current vehicle input state (first position) and outputs a new routing set [Page 3]. Furthermore, Chen teaches an encoder “which encodes the original high dimensional observation o to a low dimensional latent state x” [Page 3, Sections III.B-C], indicating that the input is the entire observation (routing, detected objects, ego states/position) and the output is generated lower-dimensional routing information. Because the position is an input to this step, Chen reads on the limitation “using the at least one first position of one of the actors”. The generated routing consists of future positions that the actor should follow, which lie in a second (future) state at a subsequent time step; the routing covers directly subsequent time steps because its positions form a directly connected line. Routing information is therefore considered predicted future positions (second positions) of an actor across future time steps, and the time at which the actor occupies such a future position, utilizing the RL agent policy, is also a future state of the environment, since the environment has changed once the actor's position has changed.)

    [Image: Chen, Fig. 3]

applying the at least one simulated second position to at least one machine learning model (MLM) to generate, based at least partially on the at least one simulated second position, a predicted action corresponding to one or more actions for an ego-vehicle to take with respect to the first state of the environment and the one or more actors; (Chen's Figure 1 feeds the low-dimensional second data, which corresponds to the updated predictions based on the state of the environment, into an RL Agent. The RL Agent is understood to be an MLM, as Chen teaches on Page 5, “Now we introduce the remaining layers of the networks used in the three reinforcement learning algorithms as follows:”, which describes a machine learning model trained by reinforcement learning. Figure 1 shows the RL Agent (MLM) generating “control commands”, which are action commands that move the vehicle from a current state toward a goal state. Furthermore, Chen teaches a policy that takes current “encoded states as input” (first position) and “generates the control command such as acceleration and steering angle” [Page 2]. This clarifies that the predicted control commands are actions for an ego vehicle in the first state of the environment: the red historical ego state shows the vehicle in its current state of the environment, and the control commands are based on the blue path for the future steps (simulated second position) of the environment.)

    [Image: Chen, Fig. 1]

Latent Encoding represents the DNN step and the RL Agent is the MLM step.
assigning to the predicted action, one or more outputs corresponding to one or more scores from a value function that is evaluated using the at least one simulated second position; (Chen teaches the “Soft Actor Critic (SAC)”, which is taught to be the last stage of the RL Agent [right side of Page 5]. In Section IV.C, Section V, and Fig. 4, Chen teaches the soft value function that forms part of the SAC. The system uses it to maximize the expected rewards of the RL Agent with the parameters from the Latent Encoding step (including the routing with the simulated second position). It outputs the expected rewards (scores) of following a predicted policy, based on the current input state of the vehicle and its desired goal point, executed in a simulated environment.)
and updating one or more parameters of the at least one MLM based at least on the one or more outputs. (Chen, Pg. 2768, Equation 12 and the paragraph following it describe how the value function MLM is updated by supervised learning based on the output of the policy network.)
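The soft value function relied on above can be illustrated with a minimal sketch. It is not Chen's implementation. It assumes a small discrete action set (SAC is usually applied to continuous actions such as acceleration and steering), a tabular value estimate, and a fixed temperature alpha. The function names, action labels, and numbers are hypothetical. The score assigned to the policy's actions is the soft value V(s) = Σ_a π(a|s)(Q(s, a) - α log π(a|s)), and one squared-error gradient step moves the value estimate toward that score.

```python
import math

def soft_value(q_values, policy_probs, alpha):
    """Soft state value for a discrete action set:
    V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s)).
    Actions with zero probability contribute nothing (avoids log(0))."""
    return sum(p * (q - alpha * math.log(p))
               for q, p in zip(q_values, policy_probs) if p > 0.0)

def value_update(v_estimate, q_values, policy_probs, alpha, lr):
    """One gradient step on 0.5 * (V_est - V_target)^2 for a tabular value
    estimate, where the target is the soft value computed from Q and pi."""
    target = soft_value(q_values, policy_probs, alpha)
    grad = v_estimate - target  # derivative of the squared-error loss
    return v_estimate - lr * grad

# Hypothetical two-action example (e.g., "keep lane" and "brake").
q = [1.0, 1.0]
pi = [0.5, 0.5]
v_target = soft_value(q, pi, alpha=1.0)
v_new = value_update(0.0, q, pi, alpha=1.0, lr=0.5)
print(round(v_target, 4), round(v_new, 4))
```

With a uniform policy, the entropy term raises the target above the plain expected Q value, and the update moves the estimate halfway toward it because lr = 0.5.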
	
Regarding Claim 2, Chen teaches all of the limitations of Claim 1, and Chen further teaches: The method of claim 1, wherein the at least one MLM decodes at least a portion of a latent space of the MLM to generate the predicted action corresponding to the one or more actions. (Chen, Pg. 3, “VAE is composed of two parts, an encoding network q_φ(x|o) which encodes the original high dimensional observation o to a low dimensional latent state x, and a decoding network p(o|x) which decodes x to o.” This quotation demonstrates a decoder that is part of the MLM, through which data is fed to the RL agent. The decoder decodes the low-dimensional state x, which includes the low-dimensional bird-eye images used to generate predicted actions in the RL policy step of Chen.)
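The encoder/decoder split quoted from Chen can be sketched as follows. This is a minimal, untrained illustration under stated assumptions, not Chen's network. The encoder and decoder are single linear maps with random weights, the dimensions are hypothetical, and the flattened observation stands in for a bird-view image. The encoder q(x|o) produces a mean and a log-variance and samples a low-dimensional latent x via the reparameterization trick. The decoder p(o|x) maps x back to observation space.

```python
import math
import random

def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def encode(obs, w_mu, w_logvar, rng):
    """Encoder q(x|o): map a high-dimensional observation o to a
    low-dimensional latent x using the reparameterization trick."""
    mu = matvec(w_mu, obs)
    logvar = matvec(w_logvar, obs)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def decode(latent, w_dec):
    """Decoder p(o|x): map the latent x back to observation space."""
    return matvec(w_dec, latent)

rng = random.Random(0)
obs_dim, latent_dim = 16, 2  # hypothetical sizes; real bird-view images are far larger
# Untrained random weights, for illustration only.
w_mu = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)] for _ in range(latent_dim)]
w_logvar = [[0.0] * obs_dim for _ in range(latent_dim)]  # log-variance 0 -> unit std
w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(latent_dim)] for _ in range(obs_dim)]

obs = [rng.random() for _ in range(obs_dim)]  # stand-in for a flattened bird-view image
x = encode(obs, w_mu, w_logvar, rng)
recon = decode(x, w_dec)
print(len(x), len(recon))
```

In a trained VAE, the weights would be fit so that the decoded output reconstructs the observation. Only the latent x would typically be passed to a downstream policy.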

Regarding Claim 3, Chen teaches all of the limitations of claim 1, and further teaches 
wherein the determining the at least one first position of the one or more actors includes: forward simulating, by the MLM, one or more first positions of the one or more actors in the first state of the environment from one or more second positions of the one or more actors in one or more prior states of the environment and adjusting the one or more first positions in the first state to the at least one first position in the first state based at least on controlling of an actor to move towards the ego-machine (Chen, Pg. 2, “Reinforcement learning is then applied to learn a policy to generate the correct control command”. Chen's Figures 1 and 3-4 and Section V show the RL agent receiving data from the Latent Encoding step, which includes the VAE (paired encoder and decoder). They also show simulating vehicle current-state inputs, a goal point, and intermediate position states in the environment in an RL policy computation. Therefore, the VAE decoder, which decodes the latent space of the Latent Encoding (MLM), feeds into the RL Agent of Figure 1, which applies reinforcement learning. The RL Agent is taught to be trained by reinforcement learning on Page 5.)

Regarding Claim 4, Chen teaches all the limitations of Claim 1, and Chen further teaches: The method of claim 1, wherein the predicted action is further generated based at least on one or more goals of the one or more actions for the ego-vehicle. (Chen, Pg. 3, “VAE is composed of two parts, an encoding network q_φ(x|o) which encodes the original high dimensional observation o to a low dimensional latent state x, and a decoding network p(o|x) which decodes x to o.” This quotation demonstrates an encoder that is part of the MLM, through which data is fed to the RL agent. That data includes routing information (goals), which is included in the bird-eye images used to generate predicted actions in the RL policy step of Chen.)

Regarding Claim 6, Chen teaches all the limitations of Claim 1, and Chen further teaches: The method of claim 1, wherein the predicted action is generated using an actor network of the at least one MLM and at least one of the one or more parameters are of a critic network corresponding to the at least one MLM. (Chen, Pg. 5, “Soft Actor Critic (SAC): The soft actor critic has two Q networks, one value network, and a policy network.” The policy network, which represents the MLM detailed in Claim 1 because it is a network that chooses actions to take, is made up of the tied actor and critic system. This MLM is also used to generate policies, which are the predicted actions for the vehicle to take.)

Regarding Claim 7, Chen teaches all the limitations of Claim 1, and Chen further teaches: The method of claim 1, wherein the MLM is further to simulate at least one third position of the ego-vehicle in the second state from a position of the ego-vehicle in the first state, (Chen's Figs. 3-4 and Section V show generating the low-dimensional routing information (blue line), which involves predicted future goal positions of the actor in future states of the environment from the input current state. When the vehicle is simulated in the second state and follows the routing to a third position based on the routing information, this is understood to be a predicted third state that follows the second position in the second state.)
and the value function is further evaluated using the at least one third position of the ego-vehicle. (Chen teaches the SAC system, which involves the target value function that takes in routing information from the encoding step, as seen in Figs. 1 and 4 and Section V of Chen. The routing information is understood to contain multiple future positions of the actor, as it is the path that the actor will take.)


Regarding Claim 9, Chen teaches all the limitations of Claim 1, and Chen further teaches: The method of claim 1, wherein the one or more actors include at least two actors. (Chen, Pg. 5, “term penalizing collision with other surrounding vehicles”, which represents the system taking account of other actors in the simulation, in particular other vehicles.)

Regarding Claim 16, Chen teaches: A system comprising: one or more processing units to perform one or more control actions corresponding to a machine based at least on one or more actions predicted for the machine using at least one machine learning model (MLM), the at least one MLM trained based at least on: (While Chen does not expressly mention a processor, it is well understood in the art that running a neural network system requires a processor. Furthermore, Chen notes on Pg. 6 that a limitation of the research was computational resources, which indicates a computational system that uses a processor. Page 3 and Figs. 1 and 3-4 teach a VAE encoding the current input states of a vehicle and employing reinforcement learning agent policies for processing intermediate states, goal states, and rewards.)
receiving sensor data obtained using one or more sensors of an ego-vehicle within an environment (Chen, Pg. 2, Input Representation section, in which the perception module is taught to receive sensor data from a vehicle; Chen directly mentions “a front view image”); based at least on the sensor data, determining at least one first position of one or more actors in a first state of an environment; (Chen teaches the perception module on Pages 2-3: “we proposed to convert the output of perception module (object detection and localization), as well as the routing information, to a bird-view representation that applied as the input to our policy”. Chen's Figure 2 shows “Historical Ego States”, which contain the first position of the vehicle in the first state of the environment.)
applying the at least one first position to a machine learning model (MLM) to cause the MLM to simulate, from the at least one first position, at least one second position of the one or more actors in a second state of the environment that the MLM forward simulates from at least the first state of the environment; (Chen's Figure 3 shows the output of the latent encoding step, which takes in the top row (first data) and outputs a lower-dimensional estimate of the data. As seen in Figure 3, this output includes a prediction of at least one second position of the one or more actors in a second state of the environment, because the routing data (blue line) represents future positions of the car. This is still considered generating predictions because the variational auto-encoder (MLM) takes in the original routing data of the current vehicle input state (first position) and outputs a new routing set [Page 3]. Furthermore, Chen teaches an encoder “which encodes the original high dimensional observation o to a low dimensional latent state x” [Page 3, Sections III.B-C], indicating that the input is the entire observation (routing, detected objects, ego states/position) and the output is generated lower-dimensional routing information.
Because the position is an input to this step, Chen reads on the limitation “using the at least one first position of one of the actors”. The generated routing consists of future positions that the actor should follow, which lie in a second (future) state at a subsequent time step; the routing covers directly subsequent time steps because its positions form a directly connected line. Routing information is therefore considered predicted future positions (second positions) of an actor across future time steps, and the time at which the actor occupies such a future position, utilizing the RL agent policy, is also a future state of the environment, since the environment has changed once the actor's position has changed.)

    [Image: Chen, Fig. 3]

applying the at least one simulated second position to at least one machine learning model (MLM) to generate, based at least on the at least one simulated second position, a prediction action for an ego-vehicle to take with respect to the first state of the environment and the one or more actors; (Chen's Figure 1 feeds the low-dimensional second data, which corresponds to the updated predictions based on the state of the environment, into an RL Agent. The RL Agent is understood to be an MLM, as Chen teaches on Page 5, “Now we introduce the remaining layers of the networks used in the three reinforcement learning algorithms as follows:”, which describes a machine learning model trained by reinforcement learning. Figure 1 shows the RL Agent (MLM) generating “control commands”, which are action commands that move the vehicle from a current state toward a goal state. Furthermore, Chen teaches a policy that takes current “encoded states as input” (first position) and “generates the control command such as acceleration and steering angle” [Page 2]. This clarifies that the predicted control commands are actions for an ego vehicle in the first state of the environment: the red historical ego state shows the vehicle in its current state of the environment, and the control commands are based on the blue path for the future steps (simulated second position) of the environment.)
assigning, to the predicted action, one or more outputs corresponding to one or more scores from a value function that is evaluated using the at least one simulated second position (Chen teaches the “Soft Actor Critic (SAC)”, which is taught to be the last stage of the RL Agent [right side of Page 5]. In Section IV.C and Fig. 4, Chen teaches the soft value function that forms part of the SAC. The system uses it to maximize the expected rewards of the RL Agent with the parameters from the Latent Encoding step (including the routing with the simulated second position). It outputs the expected rewards (scores) of following a predicted policy, based on the current input state of the vehicle and its desired goal point); and
updating one or more parameters of the at least one MLM based at least on the one or more outputs. (Chen, Pg. 2768, Equation 12 and the paragraph following it describe how the value function MLM is updated by supervised learning based on the output of the policy network.)

Regarding Claim 17, Chen teaches all the limitations of Claim 16, and Chen further teaches: The system of claim 16, wherein the value function includes a state value function (Chen, Pg. 4, Section IV.C, Equation 12, which takes in a state at a certain time interval), and one or more states of the value functions correspond to one or more times (Chen, Pg. 2768, Section IV.C, Equation 12, in which the state is based on a certain time), and one or more positions of the at least one second position in a latent space of the MLM. (Chen, Pg. 2, Fig. 1; this step is performed in the RL Agent, which receives as input the second position from the routing information of the Latent Encoding MLM.)

Regarding Claim 18, Chen teaches all the limitations of Claim 16, and Chen further teaches: The system of claim 16, wherein the at least one MLM generates the prediction action using one or more encoded goals input to the at least one MLM, the one or more encoded goals representing a destination for the ego-machine. (Chen, Pg. 2, Figs. 1 and 4, where the control command action is an output generated by inputting the goal routing information (bird-view image) that is encoded by the VAE in the Latent Encoding step.)

Regarding Claim 19, Chen teaches all the limitations of Claim 16, and Chen further teaches: The system of claim 16, wherein the at least one MLM decodes at least a portion of a latent space of the MLM to generate the predicted action. (Chen, Pg. 2, Figs. 1 and 4, shows the MLM which contains the VAE as mentioned in claim analysis for Claim 2 for decoding the latent space of the MLM and is used directly for generating the predicted policy decisions of actions and associated rewards).

Regarding Claim 20, Chen teaches all the limitations of Claim 16, and Chen further teaches: The system of claim 16, wherein the system is comprised in at least one of: 
a control system implemented using an autonomous or semi-autonomous machine (Chen, Pg. 2, Fig. 1 directly discloses a system for an Autonomous Vehicle which sends out control commands);
 a perception system implemented using an autonomous or semi-autonomous machine (Chen, Pg. 2, Fig. 1 directly discloses a Perception & Routing System for an Autonomous Vehicle); 
a system using the one or more processing units for performing simulation operations (Chen, Pg. 2, Fig. 1 shows it being used with the CARLA simulator.); 
a system using the one or more processing units for performing deep learning operations; (Chen teaches the DNN of Figure 1 which uses the processing device as described in Claim 16.)
a system implemented using an edge device;
 a system implemented using a robot; (Chen teaches the system being used to control a robot (a vehicle with sensors), see Figure 1 and section 3A.)
a system including the one or more processing units implementing one or more virtual machines (VMs);
 a system implemented at least partially in a data center; 
or a system implemented at least partially using cloud computing resources.

Claim Rejections - 35 USC § 103
13.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
14.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

15.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 11, 12, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Chen, “Model-free Deep Reinforcement Learning for Urban Autonomous Driving”, in view of Zyner, “Naturalistic Driver Intention and Path Prediction Using Recurrent Neural Networks”.

	Regarding Claim 11, Chen teaches: At least one processor comprising one or more circuits to (While Chen does not expressly mention a processor, it is well understood in the art that running a neural network system requires a processor. Furthermore, Chen notes on Pg. 6 that a limitation of the research was computational resources, which indicates a computational system that uses a processor.) use a machine learning model (MLM) as a world model to forward simulate, from a first state of an environment, at least one second state of the environment (Chen, Section V, Figs. 3-4, and Pg. 2, Fig. 1, where the Latent Encoding step is an MLM that takes in a model of the world (bird-view image) from a driving environment and simulates the vehicle moving from a current state to a goal state, with intermediate states in between)
apply the forward simulated at least one second state to at least one MLM to generate a prediction of one or more actions for an ego-machine to take with respect to the first state and one or more actors in the environment (Chen, Pg. 2, “Reinforcement learning is then applied to learn a policy to generate the correct control command” (action), which is directed to the Intelligent Driving Agent MLM. Further, Section V and Figs. 3-4 teach simulating the RL agent policy for driving the car to the goal point state while avoiding other cars);
convert, one or more positions of the one or more actors in the forward simulated at least one second state to a metric world space; and apply reinforcement learning to the at least one MLM based at least on an evaluation, using a physics engine simulator, of the prediction based at least on the one or more positions of the one or more actors in the metric world space. (Chen, Pg. 2, “Reinforcement learning is then applied to learn a policy to generate the correct control command” (action), which is directed to the Intelligent Driving Agent MLM. Further, Section V and Figs. 3-4 teach simulating the RL agent policy for driving the car from the current state to the goal point state while avoiding other vehicles in the simulated environment.)

Chen at least implies:
forward simulate, from a first state of an environment, at least one second state of the environment 
However, Zyner teaches this limitation. Zyner's Figure 3 is provided below. In it, a DNN (the RNN coupled with the other two algorithms) takes in a first state of an environment (T0, with initial vehicle state x0) and outputs a predicted second position (y1) in the second environmental state (T+1).

[Image: Zyner, Figure 3]


Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the low-dimensional encoding step of Chen with the state prediction step of Zyner for the improvement of allowing the model to better represent the intentions of a vehicle. This improvement is taught in Zyner, Section 1: “In an intersection, there may be multiple paths to take, but the average of two solutions may not be a valid solution. Therefore proposing multiple paths a vehicle may take and ranking them allows for a better representation of a vehicle’s predicted intention.”

Regarding Claim 12, Chen as modified by Zyner teaches all the limitations of Claim 11, and Chen further teaches: The at least one processor of Claim 11, wherein the at least one second state comprises a plurality of forward simulated states, the MLM receives input comprising a spatial representation of the first state of the environment, and the MLM encodes the spatial representation of the first state into a plurality of latent space spatial representations that each correspond to a respective state of the plurality of forward simulated states at a corresponding time slice. (Chen, Pg. 2, “Reinforcement learning is then applied to learn a policy to generate the correct control command”. Figures 1 and 4 and section 5 of Chen show the RL agent being fed data by the Latent Encoding step, which includes the VAE (paired encoder and decoder), and simulating vehicle current state inputs, the goal point, and intermediate position states in the environment in an RL policy computation. Therefore, the latent space representation produced by the VAE in the Latent Encoding step (DNN) is sent into the RL Agent of Figure 1, which applies reinforcement learning. Chen teaches on page 5 that the RL Agent is trained by reinforcement learning.)

Regarding Claim 15, Chen as modified by Zyner teaches all the limitations of Claim 11, and Chen further teaches: The at least one processor of claim 11, wherein the reinforcement learning is applied using a value function (Chen, Pg. 3, “The most representative model-free reinforcement learning is Q learning [31], which is based on an estimation of the Q value Q(s, a), defined as the expected future total rewards when taking action a at state s and then follow the policy”), and the evaluation includes generating one or more scores of the value function based at least on the one or more positions of the one or more actors in the metric world space. (Chen, Fig. 4, section 5, and Pg. 3; see the passage following equation (3), in which the optimal policy is the output. This is what the RL agent generates as its predictions, with corresponding rewards for moving the vehicle toward the goal state position.)
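For reference, the Q value quoted from Chen corresponds to the standard textbook action-value definition sketched below, together with the greedy optimal policy derived from it. This is a sketch only: the discount factor γ, the per-step reward r_t, and the policy symbol π are assumed notation not stated in the quoted passage, and Chen's own equations (including equation (3)) may be written differently.

$$Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \,\middle|\, s_{0} = s,\ a_{0} = a\right], \qquad 0 \le \gamma < 1, \qquad \pi^{*}(s) = \arg\max_{a} Q^{*}(s,a)$$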

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Chen’s Model-free Deep Reinforcement Learning for Urban Autonomous Driving in view of Kebria’s Deep Imitation Learning for Autonomous Vehicles Based on Convolutional Neural Networks.

Regarding Claim 5, Chen teaches all of the limitations of claim 1; however, Chen does not teach: The method of claim 1, wherein the MLM comprises a deep neural network (DNN) trained using imitation learning of real-world input states and resultant real-world output states.
Nevertheless, Kebria teaches wherein the MLM comprises a deep neural network (DNN) trained using imitation learning of real-world input states and resultant real-world output states. This is shown on Page 85: “After data collection [from cameras on the car (real-world input states)], next step is to utilize the data for training the CNN models. CNN parameters are being updated by gradient descent algorithms. The models learn the control policy by imitation, similar to the original idea of ‘pixel to action’”, in order “to predict the best steering angle (resultant real-world output states)”. A CNN is a specialized type of DNN; therefore, Kebria’s CNN model satisfies this limitation of the claim.

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the reinforcement learning model of Chen to use imitation learning, for the improvement of more efficient computation and a reduced amount of expert knowledge required for training. This is taught by Kebria: “imitation learning is easier and more efficient in terms of computations and the amount of expert knowledge required for training process” (Kebria, Page 82).

Regarding Claim 13, Chen teaches all of the limitations of claim 11; however, Chen does not teach: The at least one processor of claim 11, wherein the MLM comprises a deep neural network (DNN) trained using imitation learning of real-world input states and resultant real-world output states.
Nevertheless, Kebria teaches wherein the MLM comprises a deep neural network (DNN) trained using imitation learning of real-world input states and resultant real-world output states. This is shown on Page 85: “After data collection [from cameras on the car (real-world input states)], next step is to utilize the data for training the CNN models. CNN parameters are being updated by gradient descent algorithms. The models learn the control policy by imitation, similar to the original idea of ‘pixel to action’”, in order “to predict the best steering angle (resultant real-world output states)”. A CNN is a specialized type of DNN; therefore, Kebria’s CNN model satisfies this limitation of the claim.

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to modify the reinforcement learning model of Chen to use imitation learning, for the improvement of more efficient computation and a reduced amount of expert knowledge required for training. This is taught by Kebria: “imitation learning is easier and more efficient in terms of computations and the amount of expert knowledge required for training process” (Kebria, Page 82).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Chen’s Model-free Deep Reinforcement Learning for Urban Autonomous Driving in view of Gupta (U.S. Patent No. 10,196,086 B2).

Regarding Claim 8, Chen teaches all of the limitations of claim 1, and Chen further teaches most of the claim: The method of claim 1, wherein the at least one second position of the one or more actors corresponds to a trajectory of an actor (Chen Fig. 3 shows the routing information (blue line), which shows the future positions of the actor, including the second position, and the future trajectory/path of the actor.)

Chen does not distinctly disclose:
and the method includes extending the trajectory using a mechanical motion algorithm of a physics engine to generate an extended trajectory, wherein the one or more outputs correspond to the extended trajectory.

However, Gupta teaches this limitation: “The vehicle trajectory calculation module 114 (mechanical motion algorithm of a physics engine) is suitably configured to compute a “final” steering command appropriate to maneuver the vehicle 100 toward, and through, the theoretical path (i.e., the desired trajectory) in use by the vehicle 100. The vehicle trajectory calculation module 114 uses computed parameters, vehicle sensor data, lateral offset error data, and steer angle data, to compute.” (Gupta, Pg. 7, Column 5). The method of using variables to compute a new desired trajectory is understood to be a mechanical motion algorithm. Therefore, the system creates an updated trajectory using an algorithm whose output is the new trajectory.

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the routing system of Chen to also allow further trajectory prediction, as taught by Gupta, to allow the vehicle to operate without access to routing information. Furthermore, Gupta states: “[...] feedforward rear steer angle and the feedback signal; and operates a steering mechanism of the vehicle using the final steer angle command, to autonomously maneuver the vehicle according to the final steer angle command.” (Gupta, Pg. 5, Column 1), which supports the idea that adding this obvious change will allow the vehicle to operate even without input routing information. Therefore, one of ordinary skill in the art would be motivated to make such a combination based on the motivation found in Gupta.

Claims 10 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Chen’s Model-free Deep Reinforcement Learning for Urban Autonomous Driving in view of Annell’s Probabilistic Collision Estimation System for Autonomous Vehicles.

	Regarding Claim 10, Chen teaches all of the limitations of claim 1, and Chen further teaches: The method of claim 1, comprising: determining, using the second state of the environment, (Chen teaches generating a second state of the environment; see the Latent Encoding step of Fig. 1)

	Chen does not distinctly disclose:
a likelihood of a collision between the ego-vehicle and another object in the environment; and computing the one or more outputs based at least on the likelihood of the collision.

	Annell teaches this limitation, as seen in Fig. 2 on Page 474, where the ego vehicle (in green) and another object (an observed vehicle, in orange) in the environment are detected and a probability distribution of the collision chance is computed, which represents the likelihood of a collision between the two entities. Furthermore, Annell teaches on Page 473: “The two outputs are a Probability Field that could be used by a trajectory planner and a Collision Probability which could be used in a higher-level part of the system or as a controller”, which teaches computing the output and using it for the purposes of a trajectory planner.
	Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the routing information step of Chen to also take as input information based on a trajectory calculation, as taught by Annell, for the improvement of predicting collisions more accurately and giving the vehicle more situational awareness. This improvement is taught by Annell: “The system functionality and the utility gained from this kind of predictions are very interesting to give the autonomous vehicles a higher level of situational awareness.” (Page 6, Paragraph 2)

	Regarding Claim 14, Chen teaches all of the limitations of claim 11, but does not teach: The at least one processor of claim 11, wherein the at least one MLM is trained to generate a predicted trajectory for a vehicle.

	However, Annell teaches the limitation:
	Annell’s Figure 1 describes the proposed system, which is understood to be a machine learning model. Annell’s Figure 2 shows a trajectory for the orange vehicle, which is a prediction of the trajectory of a vehicle that is not the ego-machine.

	Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the trained MLM of Chen to also train for the purposes of trajectory calculation for other vehicles, as taught by Annell, for the improvement of predicting collisions more accurately and giving the vehicle more situational awareness. This improvement is taught by Annell: “The system functionality and the utility gained from this kind of predictions are very interesting to give the autonomous vehicles a higher level of situational awareness.” (Page 6, Paragraph 2)

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Hofmann et al. (US Pub. 2022/0114474) teaches utilizing machine learning and reinforcement learning for autonomous vehicle driving.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX, whose telephone number is 571-272-3241. The examiner can normally be reached Mon - Fri, 8:00-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/C.M./Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123
Prosecution Timeline

Apr 14, 2025: Examiner Interview Summary
Apr 14, 2025: Applicant Interview (Telephonic)
Apr 15, 2025: Response Filed
Jun 24, 2025: Final Rejection mailed (§102, §103, §112)
Sep 16, 2025: Interview Requested
Sep 23, 2025: Request for Continued Examination
Sep 30, 2025: Response after Non-Final Action
Apr 08, 2026: Non-Final Rejection mailed (§102, §103, §112) (current)
Precedent Cases

Applications granted by this same examiner with similar technology

16/005,750 (Patent 12619424): ROBOTIC SCRIPT GENERATION BASED ON PROCESS VARIATION DETECTION; 7y 10m to grant, granted May 05, 2026
18/380,620 (Patent 12613706): HARDWARE ACCELERATED MACHINE LEARNING; 2y 6m to grant, granted Apr 28, 2026
17/089,974 (Patent 12608639): SYSTEM AND METHOD FOR PREDICTIVE VOLUMETRIC AND STRUCTURAL EVALUATION OF STORAGE TANKS; 5y 5m to grant, granted Apr 21, 2026
18/375,973 (Patent 12561620): Machine Learning-Based URL Categorization System With Noise Elimination; 2y 4m to grant, granted Feb 24, 2026
16/726,709 (Patent 12554962): CONFIGURABLE PROCESSOR ELEMENT ARRAYS FOR IMPLEMENTING CONVOLUTIONAL NEURAL NETWORKS; 6y 1m to grant, granted Feb 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 48%
With Interview (+38.7%): 86%
Median Time to Grant: 4y 7m (~0m remaining)
PTA Risk: High
Based on 126 resolved cases by this examiner. Grant probability derived from career allowance rate.