Last updated: May 29, 2026
Application No. 18/531,510
AUTONOMOUS VEHICLE TRAJECTORY PLANNING USING NEURAL NETWORK TRAINED BASED ON KNOWLEDGE DISTILLATION

Final Rejection §103
Filed: Dec 06, 2023
Priority: Jul 20, 2023 (provisional 63/514,764)
Examiner: BRADY III, PATRICK MICHAEL
Art Unit: 3666
Tech Center: 3600 (Transportation & Electronic Commerce)
Assignee: Honda Motor Co. Ltd.
OA Round: 2 (Final)
This examiner grants 55% of cases after interview (+41.0% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 121 resolved cases, 2023–2026.
Examiner Intelligence

BRADY III, PATRICK MICHAEL
Grants 55% of resolved cases
Career Allowance Rate: 67 granted / 121 resolved (+3.4% vs TC avg)
Interview Lift: +41.0% (strong), comparing resolved cases with and without an interview
Typical timeline: 3y 0m average prosecution; 27 currently pending
Career history: 158 total applications across all art units
Statute-Specific Performance
§101: 0.3% (-39.7% vs TC avg)
§103: 97.0% (+57.0% vs TC avg)
§102: 0.5% (-39.5% vs TC avg)
§112: 1.1% (-38.9% vs TC avg)
Based on career data from 121 resolved cases; TC avg is a Tech Center average estimate.
Office Action

§103
DETAILED ACTION
This final action is in reply to the response filed 22 January 2026, which was in reply to the non-final action dated 22 October 2025.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
Claims 1-20 are pending.  Claims 1, 12 and 20 have been amended.
With regard to the drawing objections (pgs. 2-3, Action),  the examiner reviewed the replacement sheet, filed 22 January 2026, and found it acceptable.  Accordingly, the drawing objection is withdrawn.
With regard to the 35 U.S.C. 103 rejections (pgs. 3-27), the applicant’s amendments necessitated additional searching and consideration of new grounds of rejection.  Accordingly, the new grounds of rejection under 35 U.S.C. 103 are:  claims 1-20 in view of Seff, Refaat and Ng, as discussed below.

Drawings
The drawings, filed 23 December 2023, and the replacement to Fig. 2, filed 22 January 2026, are accepted by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Publication Number 2024/0300542 to Seff et al. (hereafter Seff) in view of U.S. Patent Publication Number 2021/0133582 to Refaat et al. (hereafter Refaat) and U.S. Patent Publication Number 2024/0059285 to Ng et al. (hereafter Ng).
As per claim 1, Seff discloses [a]n electronic device (see at least Seff, abstract), comprising:
circuitry (see at least Seff, [0112] disclosing that embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them; [0113] disclosing that the apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them) configured to: 
determine a set of updated values of a set of variables of an objective function for trajectory planning of an ego autonomous vehicle (AV) (see at least Seff, [0043] disclosing that the training engine 142 trains the trajectory prediction machine learning models for the trajectory prediction system 114 to update model parameters 128 by optimizing an objective function based on ground truth trajectories for each agent, e.g., an objective function that measures likelihoods of the ground truth trajectories according to the trajectory prediction machine learning models, as described in more detail below with reference to FIG. 2), ... (1) ...  ;
 apply a first prediction network on an updated value of the set of updated values, a state of the ego AV over a past time interval, and a state of each AV of a set of AVs over the past time interval (see at least Seff, [0043] disclosing that the training engine 142 trains the trajectory prediction machine learning models for the trajectory prediction system 114 to update model parameters 128 by optimizing an objective function based on ground truth trajectories for each agent, e.g., an objective function that measures likelihoods of the ground truth trajectories according to the trajectory prediction machine learning models, as described in more detail below with reference to FIG. 2; [0051] disclosing that the trajectory prediction neural network 202 can process scene context data 106 to generate predicted future trajectories for each of one or more target agents of the plurality of agents in the environment. A predicted future trajectory for a given target agent specifies predicted states for the target agent (e.g., locations, headings, velocities, accelerations, etc., for the target agent) at one or more future time points. In particular, the trajectory prediction neural network 202 can predict joint future trajectories for the target agents. A predicted joint future trajectory for the target agents specifies predicted states for each of the target agents at one or more future time points; [0052] disclosing that the trajectory prediction neural network 202 can generate sequences of discrete motion tokens 204 that represent the predicted future trajectories. Each sequence of discrete motion tokens 204 can include, for each of the target agents and at each of a plurality of time points, a respective discrete motion token defining a predicted agent state for the target agent at the time points for the motion token; [0061]),
wherein the updated value is indicative of a current state of the ego AV (see at least Seff, Abstract; [0043] disclosing that the training engine 142 trains the trajectory prediction machine learning models for the trajectory prediction system 114 to update model parameters 128 by optimizing an objective function based on ground truth trajectories for each agent, e.g., an objective function that measures likelihoods of the ground truth trajectories according to the trajectory prediction machine learning models, as described in more detail below with reference to FIG. 2; [0051]; [0052]); ... (2) ... ; 
determine, based on the application, an output that includes a state of the ego AV over a future time interval, and a state of each AV of the set of AVs over the future time interval (see at least Seff, [0074] disclosing that the system can control an autonomous vehicle based on the respective predicted future trajectories of the plurality of agents (step 308). For example, the system can process the predicted future trajectories using a planning system to make fully-autonomous or partly-autonomous driving decisions. As a further example, the system can generate a fully-autonomous plan to navigate the autonomous vehicle that makes any of a variety of changes to the future trajectory of the autonomous vehicle. The changes may, for example, include turning the vehicle, accelerating the vehicle, decelerating the vehicle, etc. <interpreted as the state of each AV> The system can plan the changes to the future trajectory of the autonomous vehicle to accomplish a variety of tasks, such as avoiding a collision with other vehicles, navigating to a destination);
determine a set of optimal values for the set of variables based on the updated value and the determined output satisfying a safety constraint associated with the objective function (see at least Seff, [0037] disclosing that when the planning system 116 receives the trajectory prediction output 108, the planning system 116 can use the trajectory prediction output 108 to make fully-autonomous or partly-autonomous driving decisions. For example, the planning system 116 can generate a fully-autonomous plan to navigate the vehicle 102 to avoid a collision with another agent by changing the future trajectory of the vehicle 102 to avoid the predicted future trajectory of the agent); and
control a trajectory of the ego AV based on the set of optimal values of the set of variables (see at least Seff, [0074]).  However, Seff does not explicitly teach the following limitation, which is taught by Refaat:
(1) wherein the determination is based on a set of initial values of the set of variables and a set of gradients of the objective function (see at least Refaat, [0005] disclosing that determining an update to the current values of the parameters of the first and second sub neural networks includes: determining, based on computing a gradient of the first loss with respect to the second sub neural network parameters, an update to the current values of the parameters of the second sub neural network; and backpropagating the computed gradient of the first loss through the second sub neural network into the first sub neural network to determine the update to the parameter values of the first sub neural network; [0081]).  However, neither Seff nor Refaat explicitly teaches the following limitation, which is taught by Ng:
(2) wherein the output is determined in a single inference at a current time instance and during an elapse of the future time interval, based on the application of the first prediction network on the updated value (see at least Ng, [0058] disclosing that the post-processor 330 may use the confidence field(s) 326 and the vector field(s) 328 <interpreted as output determined in a single instance> to determine the future path(s) 104 for the vehicle and/or the future path(s) for the object(s). For example, the confidence field 326 corresponding to a last future time slice (e.g., T_n) of the outputs 310 may be analyzed by the post-processor 330 to determine locations of objects, and the corresponding vectors from the vector field 328 at the same time slice may be leveraged to determine predicted locations of the objects in a confidence field 326 from a preceding time slice (e.g., T_(n−1)). The confidence field 326 from the preceding time slice may then be used to determine the locations of the objects at that time slice (e.g., T_(n−1)), and then the vector field 328 from that time slice may be used to determine predicted locations of the objects in a confidence field 326 from a preceding time slice (e.g., T_(n−2)), and so on, until a current time is reached; [0066]; [0092]; [0104] disclosing that with reference to FIGS. 3A-3B, in order to train the neural network 116, a training engine 334 may be employed. The training engine 334 may rely on ground truth data and one or more loss functions to update weights and parameters of the neural network(s) 116; [0125]; [0126]; [0156] disclosing that the DLA may be used to run any type of network to enhance control and driving safety, including for example, a neural network that outputs a measure of confidence for each object detection. Such a confidence value may be interpreted as a probability, or as providing a relative “weight” of each detection compared to other detections. 
This confidence value enables the system to make further decisions regarding which detections should be considered as true positive detections rather than false positive detections <interpreted as output>) ... .
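For illustration only, the backward tracing quoted from Ng [0058] (locate an object in the last time slice T_n, follow that slice's vector field to the object's location in the preceding slice, and repeat until the current time) can be sketched as below. The sketch assumes, hypothetically, that each vector field is a dictionary mapping an integer grid cell to a (dx, dy) offset toward the same object in the preceding slice, and that the T_n location has already been read from the confidence field; the names `trace_back` and `vector_fields` are invented here and do not appear in Ng.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

def trace_back(last_location: Cell,
               vector_fields: List[Dict[Cell, Cell]]) -> List[Cell]:
    """Trace one object's location from the last slice T_n back to T_0.

    vector_fields[t] (hypothetical encoding) maps a grid cell at slice t to
    the (dx, dy) offset of the same object's location at slice t - 1.
    vector_fields[0] is never used. Returns locations ordered T_0 .. T_n.
    """
    path = [last_location]
    loc = last_location
    # Walk from the last future slice toward the current time.
    for t in range(len(vector_fields) - 1, 0, -1):
        dx, dy = vector_fields[t][loc]
        loc = (loc[0] + dx, loc[1] + dy)
        path.append(loc)
    path.reverse()  # earliest (current) time first
    return path
```

For example, with three slices, `trace_back((2, 1), [{}, {(1, 0): (-1, 0)}, {(2, 1): (-1, -1)}])` returns `[(0, 0), (1, 0), (2, 1)]`, ordered from the current time to T_n.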
Seff, Refaat and Ng are analogous art to claim 1 because they are in the same field of autonomous vehicle trajectory planning.  Seff relates to predicting the future trajectory of an agent in an environment (see at least Seff, [0002]).  Refaat relates to methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network having a plurality of sub neural networks to assign respective confidence scores to one or more candidate future trajectories for an agent (see Refaat, Abstract). Ng relates to an ego vehicle applying a machine learning model to one or more agent tensors and one or more map tensors associated with a target vehicle to obtain a predicted drive intention of the target vehicle at a roadway intersection (see at least Ng, Abstract).
Therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the device, as disclosed in Seff, to provide the benefit of having (1) the determination be based on a set of initial values of the set of variables and a set of gradients of the objective function, as disclosed in Refaat, with a reasonable expectation of success. Doing so would provide the benefit of improving prediction accuracy and training efficacy (see at least Refaat, [0017]).  It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the device, as disclosed in Seff, as modified by Refaat, to provide the benefit of (2) having the output be determined in a single inference at a current time instance and during an elapse of the future time interval, based on the application of the first prediction network on the updated value, as disclosed in Ng, with a reasonable expectation of success.  Doing so would further provide the benefit of improving safety by improving the automated driving systems to handle the entire task of driving without the need for user intervention (see at least Ng, [0003]).
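The sequence recited in claim 1, namely a gradient-based update of the objective-function variables, one forward pass of a prediction network that yields the whole future interval, and a check of the predicted states against a safety constraint, can be sketched under heavily simplified assumptions. In this hypothetical sketch the only variable is a scalar ego speed, the objective is J(v) = (v − v_target)², a constant-velocity `rollout` stands in for the first prediction network, and the safety constraint is a minimum gap `d_safe` to a single lead vehicle. None of these choices, nor the names `plan`, `rollout`, and `is_safe`, is taken from the application, Seff, Refaat, or Ng.

```python
from typing import List, Tuple

def rollout(v_ego: float, x_lead: float, v_lead: float,
            horizon: int, dt: float) -> List[Tuple[float, float]]:
    """Hypothetical stand-in for the prediction network: a single call returns
    (ego position, lead position) for every future step (constant velocity)."""
    return [(v_ego * k * dt, x_lead + v_lead * k * dt)
            for k in range(1, horizon + 1)]

def is_safe(states: List[Tuple[float, float]], d_safe: float) -> bool:
    """Assumed safety constraint: the gap to the lead never drops below d_safe."""
    return all(lead - ego >= d_safe for ego, lead in states)

def plan(v_init: float, v_target: float, x_lead: float, v_lead: float,
         lr: float = 0.25, iters: int = 50, horizon: int = 10,
         dt: float = 0.5, d_safe: float = 5.0) -> float:
    v = v_init                                  # initial value of the variable
    for _ in range(iters):
        grad = 2.0 * (v - v_target)             # dJ/dv for J(v) = (v - v_target)^2
        v_new = v - lr * grad                   # gradient-based updated value
        states = rollout(v_new, x_lead, v_lead, horizon, dt)
        if not is_safe(states, d_safe):
            break                               # reject the unsafe update and stop
        v = v_new                               # accept the safe update
    return v
```

With the defaults, `plan(0.0, 20.0, 30.0, 10.0)` returns 15.0: the next gradient step (17.5 m/s) would close the gap below 5 m within the 5 s horizon, so it is rejected.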
As per claim 2, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 1, as shown above.  Seff further discloses the following limitation:
wherein the set of variables includes at least one of a steering trajectory, an acceleration trajectory, or a state of the ego AV (see at least Seff, [0051] disclosing that the trajectory prediction neural network 202 can process scene context data 106 to generate predicted future trajectories for each of one or more target agents of the plurality of agents in the environment. A predicted future trajectory for a given target agent specifies predicted states for the target agent (e.g., locations, headings, velocities, accelerations, etc., for the target agent) <interpreted as the state of the ego AV> at one or more future time points).
As per claim 3, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 1, as shown above.  Seff further discloses the following limitation: 
wherein a state of the ego AV includes at least one of location coordinates of the ego AV, a heading angle of the ego AV, or a speed of the ego AV (see at least Seff, [0051] disclosing that the trajectory prediction neural network 202 can process scene context data 106 to generate predicted future trajectories for each of one or more target agents of the plurality of agents in the environment. A predicted future trajectory for a given target agent specifies predicted states for the target agent (e.g., locations, headings, velocities, accelerations, etc., for the target agent) <interpreted as the state of the ego AV, that includes velocities (i.e. speed and heading)> at one or more future time points).
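Under the interpretation above, in which velocity is read as carrying both speed and heading, a planar velocity vector (vx, vy) decomposes into speed = √(vx² + vy²) and heading = atan2(vy, vx). A minimal sketch follows; the `AgentState` container, its field names, and the helper `state_from_velocity` are hypothetical and are not taken from Seff or the application.

```python
import math
from dataclasses import dataclass

@dataclass
class AgentState:
    x: float        # location coordinate (m), hypothetical field
    y: float        # location coordinate (m), hypothetical field
    heading: float  # heading angle (rad), measured from the +x axis
    speed: float    # speed (m/s)

def state_from_velocity(x: float, y: float, vx: float, vy: float) -> AgentState:
    """Build a state from a location and a planar velocity vector."""
    return AgentState(x=x, y=y,
                      heading=math.atan2(vy, vx),  # direction of travel
                      speed=math.hypot(vx, vy))    # magnitude of velocity
```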
As per claim 4, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 1, as shown above.  Refaat further discloses the following limitation: 
wherein the circuitry is further configured to: compute the set of gradients of the objective function (see at least Refaat, [0061] disclosing that the training engine 240 computes respective gradients for the first and second loss functions and generates updated parameter values 238 by using an appropriate machine learning training technique (e.g., stochastic gradient descent). Specifically, the training engine 240 generates updated parameter values 238 for all three neural networks 252-256. The training engine 240 can then update the collection of neural network parameters 230 using the updated parameter values 238),
wherein a first gradient of the set of gradients is determined with respect to a first variable of the set of variables (see at least Refaat, [0089] disclosing that the system computes a gradient of the first loss (602) with respect to the parameters of the second sub neural network; [0090] disclosing that the system computes a gradient of the second loss (604) with respect to the parameters of the trajectory generation neural network; [0091] disclosing that the system backpropagates the computed gradient of the first loss through the second sub neural network into the first sub neural network (606) to determine the update to the parameter values of the first and second sub neural networks); and 
a first updated value of the set of updated values of the first variable is determined based on an initial value of the set of initial values of the first variable and the first gradient of the set of gradients determined with respect to the first variable (see at least Refaat, [0092] disclosing that the system backpropagates the computed gradient of the second loss through the trajectory generation neural network into the first sub neural network (608) to determine the update to the parameter values of the first sub neural network and the trajectory generation neural network).
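Claim 4 recites a gradient computed with respect to each variable, and an updated value of each variable derived from that variable's initial value and its own gradient. A minimal sketch of such a per-variable step is shown below, assuming (for illustration only) central finite-difference gradient estimates and a plain gradient-descent update u_i ← u_i − η·∂J/∂u_i. Refaat, as quoted above, applies computed gradients to neural-network parameters; the sketch applies the same update rule to generic variables, and `finite_diff_grad` and `gradient_step` are hypothetical names.

```python
from typing import Callable, List

Objective = Callable[[List[float]], float]

def finite_diff_grad(f: Objective, u: List[float], h: float = 1e-5) -> List[float]:
    """Central-difference estimate of dJ/du_i, one entry per variable u_i."""
    grads = []
    for i in range(len(u)):
        up, dn = list(u), list(u)
        up[i] += h
        dn[i] -= h
        grads.append((f(up) - f(dn)) / (2 * h))
    return grads

def gradient_step(f: Objective, u_init: List[float], lr: float = 0.1) -> List[float]:
    """Each updated value depends on its own initial value and its own gradient."""
    g = finite_diff_grad(f, u_init)
    return [u_i - lr * g_i for u_i, g_i in zip(u_init, g)]
```

For J(u) = (u_0 − 1)² + 2(u_1 + 3)², starting from u = [0, 0] with η = 0.1, the gradients are approximately [−2, 12] and the updated values approximately [0.2, −1.2].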
As per claim 5, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 1, as shown above.  Refaat further discloses the following limitations:
wherein the first prediction network is trained based on knowledge distillation using a set of predictions of a pre-trained second prediction network (see at least Refaat, [0054] disclosing that while the sub neural network A 252 and the sub neural network B 254 are both included in the training scoring neural network 250, the trajectory generation neural network 256 is separate from the network 250 and therefore is not included in the scoring neural network 212 to be deployed on-board the vehicle 102. In other words, once deployed onboard the vehicle 102, the scoring neural network 212 is not configured to generate a trajectory generation output that defines a predicted future trajectory for an agent; [0056] disclosing that the sub neural network B 254 is configured to process the intermediate representation 234 to generate a training confidence score 264 for each candidate future trajectory. In general, each training confidence score 264 represents a predicted likelihood that the agent will follow the corresponding candidate future trajectory), and 
the set of predictions includes a state of the ego AV over a predefined time interval and a state of each AV of the set of AVs interacting with the ego AV over the predefined time interval (see at least Refaat, [0036] disclosing that each candidate future trajectory defines a possible path in the environment along which the agent will travel within a certain period of time in the future, e.g., within the next 5 seconds after the current time point <interpreted as a predefined time interval>; [0075] disclosing that the system generates one or more candidate future trajectories (404) for an agent that is present in the environment. The system can do so by using the candidate trajectory generation subsystem, by deriving the candidate future trajectories from the environment data, or both. A candidate future trajectory defines a possible path along which the agent can travel within a certain period of time after a particular time point. Typically, when generating new training examples, the particular time point corresponds to a time point in the past; [0076] disclosing that the system generates a new training example (406). Specifically, the new training example 324 includes a training input that includes: (i) data characterizing a scene in an environment in a vicinity of the vehicle 102 that includes the agent and (ii) data representing one or more candidate future trajectories of the agent, and a ground truth output that at least defines a ground truth future trajectory along which the agent travels. In general, the system derives the ground truth future trajectory from environment data. That is, the ground truth future trajectory is defined by the actual trajectory followed by the agent after the particular time point in the past).
As per claim 6, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 5, as shown above.  Refaat further discloses the following limitations:
wherein each prediction of the set of predictions is generated for each future time step of a set of future time steps over the predefined time interval (see at least Refaat, [0074] disclosing that, in some implementations, the system can repeatedly perform the following steps 404 and 406 for each of the agents that are present in the environment; [0075]; [0076] disclosing that the system generates a new training example (406). Specifically, the new training example 324 includes a training input that includes: (i) data characterizing a scene in an environment in a vicinity of the vehicle 102 that includes the agent and (ii) data representing one or more candidate future trajectories of the agent, and a ground truth output that at least defines a ground truth future trajectory along which the agent travels. In general, the system derives the ground truth future trajectory from environment data. That is, the ground truth future trajectory is defined by the actual trajectory followed by the agent after the particular time point in the past), and 
the generation of each prediction for each future time step is based on: a state of the ego AV at each past time step of a set of past time steps relative to the corresponding future time step, and a state of each AV of the set of AVs at each past time step of the set of past time steps relative to the corresponding future time step (see at least Refaat, [0074]; [0075]; [0076]).
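The windowing recited in claim 6, where each prediction for a future time step is generated from the states at the past time steps relative to that step, together with Refaat's derivation of ground truth from the trajectory actually followed after a past time point ([0075]-[0076]), can be illustrated by splitting a logged sequence into (past, future) windows. This is a sketch under assumptions: each log entry is taken to be one joint snapshot of the ego AV and the other AVs, and `make_windows`, `history`, and `horizon` are hypothetical names.

```python
from typing import List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # one joint snapshot: ego state plus states of the other AVs

def make_windows(log: Sequence[S], history: int,
                 horizon: int) -> List[Tuple[List[S], List[S]]]:
    """Split a logged sequence into (past window, future window) pairs.

    For an anchor index t, the past window is log[t - history + 1 : t + 1]
    (states up to and including t) and the future window is
    log[t + 1 : t + 1 + horizon] (the states actually observed afterwards).
    """
    pairs = []
    for t in range(history - 1, len(log) - horizon):
        past = list(log[t - history + 1 : t + 1])
        future = list(log[t + 1 : t + 1 + horizon])
        pairs.append((past, future))
    return pairs
```

With `horizon=1`, each pair corresponds to a single future time step and the states immediately preceding it; for example, `make_windows(list(range(6)), 2, 2)` yields `[([0, 1], [2, 3]), ([1, 2], [3, 4]), ([2, 3], [4, 5])]`.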
As per claim 7, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 6, as shown above.  Seff and Refaat further disclose the following limitations:
retrieve a prediction of the set of predictions generated for a future time step of the set of future time steps (see at least Seff, [0051] disclosing that the trajectory prediction neural network 202 can process scene context data 106 to generate predicted future trajectories for each of one or more target agents of the plurality of agents in the environment. A predicted future trajectory for a given target agent specifies predicted states for the target agent (e.g., locations, headings, velocities, accelerations, etc., for the target agent) at one or more future time points; [0052]; [0053]); 
apply the first prediction network on a set of inputs based on the retrieval (see at least Seff, [0054] disclosing that the trajectory prediction neural network 202 can generate the sequences of discrete motion tokens 204 following the ordering of motion tokens for the sequences. In particular, the trajectory prediction neural network 202 can generate earlier discrete motion tokens (with respect to the ordering of motion tokens for the sequences 204) before generating later discrete motion tokens), 
wherein the set of inputs include: a state of the ego AV at each past time step of a set of past time steps relative to the future time step, a state of each AV of the set of AVs at each past time step of the set of past time steps relative to the future time step, and a state of the ego AV at a current time step relative to the future time step (see at least Refaat, [0074]; [0075]; [0076]); 
generate, based on the application of the first prediction network on the set of inputs, a first prediction indicative of a state of the ego AV for the future time step and a state of each AV of the set of AVs for the future time step (see at least Seff, [0043]; [0058] disclosing that each sequence of discrete motion tokens 204 can auto-regressively encode a corresponding joint future trajectory for the target agents. The trajectory prediction system 114 can determine the joint future trajectory represented by one of the sequence of discrete motion tokens 204 by auto-regressively decoding the motion tokens of the sequence. In particular, the trajectory prediction system can decode the motion tokens following the ordering of motion tokens for the sequence (e.g., by decoding motion tokens that are earlier in the sequence, with respect to the ordering for the sequence, before later motion tokens in the sequence). When the trajectory prediction system 114 decodes a given motion token, the system can determine the state of the agent for the motion token at the time point for the motion token based on previously decoded agent states or motion tokens); and 
determine an outcome of a loss function based on a difference between the retrieved prediction and the generated first prediction (see at least Refaat, [0058] disclosing that the training system 220 also includes a training engine 240 which computes a value of a first loss function that evaluates a measure of difference between the training confidence scores 264 and the ground truth confidence scores that are derived from the ground truth future trajectory. In some implementations, for each candidate future trajectory, the ground truth confidence score is equal to one if the candidate future trajectory matches the ground truth future trajectory and is equal to zero if the candidate future trajectory does not match the ground truth future trajectory; [0060]; [0061]),  
wherein the first prediction network is trained based on the outcome (see at least Refaat, [0061] disclosing that the training engine 240 computes respective gradients for the first and second loss functions and generates updated parameter values 238 by using an appropriate machine learning training technique (e.g., stochastic gradient descent). Specifically, the training engine 240 generates updated parameter values 238 for all three neural networks 252-256. The training engine 240 can then update the collection of neural network parameters 230 using the updated parameter values 238).
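Claim 7 describes a distillation loop: retrieve the pre-trained second (teacher) network's stored prediction for a future time step, apply the first (student) network to the corresponding inputs, compute a loss from the difference, and train the student on that outcome. The sketch below is a deliberately small stand-in, assuming a linear student over a flattened input vector and a mean-squared-error loss; neither assumption comes from the claims or from Refaat, and `predict` and `distill_step` are hypothetical names.

```python
from typing import List, Sequence, Tuple

def predict(w: List[float], b: float, x: Sequence[float]) -> float:
    """Hypothetical linear student: y = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distill_step(w: List[float], b: float,
                 inputs: Sequence[Sequence[float]],
                 teacher_preds: Sequence[float],
                 lr: float = 0.01) -> Tuple[List[float], float, float]:
    """One gradient step on the mean squared difference between the teacher's
    stored predictions and the student's predictions.

    Returns (updated weights, updated bias, loss before the update)."""
    n = len(inputs)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    loss = 0.0
    for x, y_teacher in zip(inputs, teacher_preds):
        err = predict(w, b, x) - y_teacher      # student minus teacher
        loss += err * err / n
        for j, xj in enumerate(x):
            grad_w[j] += 2.0 * err * xj / n     # d(loss)/d(w_j)
        grad_b += 2.0 * err / n                 # d(loss)/d(b)
    new_w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    new_b = b - lr * grad_b
    return new_w, new_b, loss
```

Repeated calls drive the student toward the teacher's predictions; for instance, fitting teacher outputs generated by 2·x_0 − x_1 + 0.5 on the four corners of the unit square recovers w ≈ [2, −1] and b ≈ 0.5 after a few thousand steps with lr = 0.1.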
As per claim 8, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 1, as shown above.  Seff further discloses the following limitation:
wherein the first prediction network includes an encoder model and a decoder model, and each of the encoder model and the decoder model includes a set of recurrent neural network models (see at least Seff, [0080] disclosing that the trajectory prediction neural network 202 can include a scene encoder neural network 402 (shown in Fig. 4). The scene encoder neural network 402 can process scene context data 102 to produce corresponding scene encodings 404. The scene encodings 404 are numerical representations of the corresponding scene context data 102 that characterize contents of the scene content data 102; [0081] disclosing that the scene encoder neural network 402 can have any neural architecture appropriate for generating one or more encodings of the scene context data 102. As an example, if the scene context data 102 includes image data, the scene encoder neural network 402 can include component networks appropriate for processing and generating encodings of image data, e.g., convolutional neural networks (CNNs), visual Transformers, etc. As another example, if the scene context data 102 includes time-series data (e.g., data collected across multiple time points), the scene encoder neural network 402 can include component networks appropriate for processing and generating encodings of time-series data, e.g., recurrent neural networks (RNNs), Transformers, etc.; [0082]; [0087] disclosing that the trajectory prediction neural network 202 can include a trajectory decoder neural network 406. The trajectory decoder neural network 406 can process the scene encodings 404 to produce the sequence of discrete motion tokens 204. 
In some implementations, the trajectory decoder neural network 406 can produce the sequence 204 by selecting discrete motion tokens from the token vocabulary 408; [0091] disclosing that the trajectory decoder neural network 406 can have any neural architecture appropriate for processing the scene encodings 404 and generating the sequences of discrete motion tokens 204. As an example, trajectory decoder neural network 406 can be a recurrent model, e.g., an RNN, an LSTM, etc. As another example, the trajectory decoder neural network 406 can have a Transformer architecture).
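Seff [0091] states that the trajectory decoder may be a recurrent model, and Seff [0058] (quoted under claim 7) describes autoregressive decoding in which a decoded location delta is added to the previously decoded location. The sketch below combines those two ideas in an untrained recurrent encoder-decoder. It is an assumption-laden illustration rather than Seff's architecture: it uses vanilla (Elman) RNN cells without biases, random weights, a hidden size of 8, and continuous (x, y) deltas in place of Seff's discrete motion tokens; `TinyRNNCell`, `EncoderDecoder`, and the helper functions are hypothetical names.

```python
import math
import random
from typing import List, Sequence

Vec = List[float]
Mat = List[List[float]]

def matvec(m: Mat, v: Sequence[float]) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rand_mat(rng: random.Random, rows: int, cols: int, scale: float) -> Mat:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class TinyRNNCell:
    """Vanilla (Elman) RNN cell without bias: h' = tanh(W_x x + W_h h)."""

    def __init__(self, rng: random.Random, in_dim: int, hid_dim: int, scale: float):
        self.w_x = rand_mat(rng, hid_dim, in_dim, scale)
        self.w_h = rand_mat(rng, hid_dim, hid_dim, scale)

    def __call__(self, x: Sequence[float], h: Sequence[float]) -> Vec:
        a, b = matvec(self.w_x, x), matvec(self.w_h, h)
        return [math.tanh(ai + bi) for ai, bi in zip(a, b)]

class EncoderDecoder:
    """Hypothetical untrained encoder-decoder; each of the two parts is an RNN."""

    def __init__(self, state_dim: int = 2, hid_dim: int = 8,
                 scale: float = 0.5, seed: int = 0):
        rng = random.Random(seed)
        self.hid_dim = hid_dim
        self.encoder = TinyRNNCell(rng, state_dim, hid_dim, scale)
        self.decoder = TinyRNNCell(rng, state_dim, hid_dim, scale)
        self.w_out = rand_mat(rng, state_dim, hid_dim, scale)  # hidden -> delta

    def forward(self, past: Sequence[Vec], horizon: int) -> List[Vec]:
        h = [0.0] * self.hid_dim
        for s in past:                   # encode the observed history
            h = self.encoder(s, h)
        pos = list(past[-1])             # last observed location
        future = []
        for _ in range(horizon):         # autoregressive decoding
            h = self.decoder(pos, h)     # feed back the previously decoded location
            delta = matvec(self.w_out, h)
            pos = [p + d for p, d in zip(pos, delta)]  # location = previous + delta
            future.append(pos)
        return future
```

Because each decoded position is fed back as the next decoder input, setting `scale=0.0` (all weights zero) yields zero deltas, so every future position equals the last observed one.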
As per claim 9, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 8, as shown above.  Refaat further discloses the following limitations:
wherein the circuitry is further configured to: apply the encoder model on the updated value of the set of updated values, the state of the ego AV over the past time interval, and the state of each AV of the set of AVs over the past time interval (see at least Refaat, [0036] disclosing that the candidate trajectory generation subsystem 120 implements software that is configured to receive the environment data 112, data derived from the environment data 112, or both and repeatedly (i.e., at each of multiple time points) generate candidate trajectory data 122 that includes one or more candidate future trajectories for each of some or all of the multiple agents in the vicinity of the vehicle 102. Each candidate future trajectory defines a possible path in the environment along which the agent will travel within a certain period of time in the future, e.g., within the next 5 seconds after the current time point; [0061] disclosing that the training engine 240 computes respective gradients for the first and second loss functions and generates updated parameter values 238 by using an appropriate machine learning training technique (e.g., stochastic gradient descent). Specifically, the training engine 240 generates updated parameter values 238 for all three neural networks 252-256. The training engine 240 can then update the collection of neural network parameters 230 using the updated parameter values 238),
wherein the state of the ego AV over the past time interval corresponds to a state of the ego AV at each past time step of a set of past time steps over the past time interval, and the state of each AV of the set of AVs over the past time interval corresponds to a state of the corresponding AV at each past time step of the set of past time steps over the past time interval (see at least Refaat, [0036]; [0051]).
As per claim 10, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 8, as shown above.  Refaat further discloses the following limitations:
wherein the determined output corresponds to an output of the decoder model, the state of the ego AV over the future time interval includes a state of the ego AV at each future time step of a set of future time steps over the future time interval (see at least Refaat, [0074]; [0075]; [0076]), and 
the state of each AV of the set of AVs over the future time interval corresponds to a state of the corresponding AV at each future time step of the set of future time steps over the future time interval (see at least Refaat, [0074]; [0075]; [0076]).
As per claim 11, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 10, as shown above.  Seff further discloses the following limitations:
wherein the circuitry is further configured to: apply the decoder model on a state of the ego AV at a first future time step of the set of future time steps and a state of each AV of the set of AVs at the first future time step of the set of future time steps (see at least Seff, [0058] disclosing that  each sequence of discrete motion tokens 204 can auto-regressively encode a corresponding joint future trajectory for the target agents <interpreted as AV of the set of AVs>. The trajectory prediction system 114 can determine the joint future trajectory represented by one of the sequence of discrete motion tokens 204 by auto-regressively decoding the motion tokens of the sequence. In particular, the trajectory prediction system can decode the motion tokens following the ordering of motion tokens for the sequence (e.g., by decoding motion tokens that are earlier in the sequence, with respect to the ordering for the sequence, before later motion tokens in the sequence). When the trajectory prediction system 114 decodes a given motion token, the system can determine the state of the agent for the motion token at the time point for the motion token based on previously decoded agent states or motion tokens. For example, when the motion tokens correspond to deltas in target agent locations, the system can decode a particular motion token for a target agent and a time point to first determine a location delta for the target agent at the time point and then determine the location of the target agent at the time point by combining the location delta for the motion token with a location for the target agent determined by decoding previous motion tokens for the target agent); and 
generate, as an output of the decoder model, a state of the ego AV at a second future time step of the set of future time steps and a state of each AV of the set of AVs at the second future time step of the set of future time steps (see at least Seff, [0058];  [0065] disclosing that the system obtains scene context data characterizing a scene in an environment at a current time point (step 302). For example, the scene context data can characterize an area of the environment within a vicinity around an autonomous vehicle. As a further example, the scene context data can include data generated from data captured by one or more sensors of the autonomous vehicle. The context data may characterize any of a variety of observations regarding the environment, e.g., LIDAR data, RADAR data, images from camera sensors, etc.; [0066] disclosing that the system then generates a sequence of discrete motion tokens that defines a joint future trajectory for the plurality of agents (step 304). The system generates the sequence of discrete motion tokens using a trajectory prediction neural network as conditioned on the scene context data. An example process of generating the sequence of discrete motion tokens is described in more detail below with reference to FIG. 5 ; [0068] disclosing that when the system generates multiple sequences of discrete motion tokens, the system can aggregate the joint future trajectories defined by the generated sequences of discrete motion tokens (step 306). As part of aggregating the joint future trajectories, the system can generate (i) multiple predicted trajectory modes and (ii) a respective probability for each of the predicted trajectory modes).
As per claim 12, similar to claim 1, Seff discloses [a] method, comprising: in an electronic device (see at least Seff, abstract):
determining a set of updated values of a set of variables of an objective function for trajectory planning of an ego autonomous vehicle (AV) (see at least Seff, [0043]), ... (1) ... ;
 applying a first prediction network on an updated value of the set of updated values, a state of the ego AV over a past time interval, and a state of each AV of a set of AVs over the past time interval (see at least Seff, [0043]; [0051]; [0052]; [0061]),
wherein the updated value is indicative of a current state of the ego AV (see at least Seff, abstract; [0043]; [0051]; [0052]); ... (2) ... ; 
determining, based on the application, an output that includes a state of the ego AV over a future time interval, and a state of each AV of the set of AVs over the future time interval (see at least Seff, [0074]);
determining a set of optimal values for the set of variables based on the updated value and the determined output satisfying a safety constraint associated with the objective function (see at least Seff, [0037]); and
control a trajectory of the ego AV based on the set of optimal values of the set of variables (see at least Seff, [0074]).  However, Seff does not explicitly teach the following limitation, which is taught in Refaat:
(1) wherein the determination is based on a set of initial values of the set of variables and a set of gradients of the objective function (see at least Refaat, [0005]; [0081]).  However, neither Seff nor Refaat explicitly teaches the following limitation, which is taught in Ng:
(2) wherein the output is determined in a single inference at a current time instance and during an elapse of the future time interval, based on the application of the first prediction network on the updated value (see at least Ng, [0058]; [0066]; [0092]; [0104]; [0125]; [0126]; [0156]) ... .
Seff, Refaat and Ng are analogous art to claim 12 because they are in the same field of autonomous vehicle trajectory planning.  Seff relates to predicting the future trajectory of an agent in an environment (see at least Seff, [0002]).  Refaat relates to methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network having a plurality of sub neural networks to assign respective confidence scores to one or more candidate future trajectories for an agent (see Refaat, Abstract). Ng relates to an ego vehicle applying a machine learning model to one or more agent tensors and one or more map tensors associated with a target vehicle to obtain a predicted drive intention of the target vehicle at a roadway intersection (see at least Ng, Abstract).
Therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the device, as disclosed in Seff, to provide the benefit of having (1) the determination be based on a set of initial values of the set of variables and a set of gradients of the objective function, as disclosed in Refaat, with a reasonable expectation of success. Doing so would provide the benefit of improving prediction accuracy and training efficacy (see at least Refaat, [0017]).  It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the device, as disclosed in Seff, as modified by Refaat, to provide the benefit of (2) having the output be determined in a single inference at a current time instance and during an elapse of the future time interval, based on the application of the first prediction network on the updated value, as disclosed in Ng, with a reasonable expectation of success.  Doing so would further provide the benefit of improving safety by improving the automated driving systems to handle the entire task of driving without the need for user intervention (see at least Ng, [0003]).
As per claim 13, similar to claim 2, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 12, as shown above.  Seff further discloses the following limitation:
wherein the set of variables includes at least one of a steering trajectory, an acceleration trajectory, or a state of the ego AV (see at least Seff, [0051]).
As per claim 14, similar to claim 5, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 12, as shown above.  Refaat further discloses the following limitations:
wherein the first prediction network is trained based on knowledge distillation using a set of predictions of a pre-trained second prediction network (see at least Refaat, [0054]; [0056]), and 
the set of predictions includes a state of the ego AV over a predefined time interval and a state of each AV of the set of AVs interacting with the ego AV over the predefined time interval (see at least Refaat, [0036]; [0075]; [0076]).
As per claim 15, similar to claim 6, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 14, as shown above.  Refaat further discloses the following limitations:
wherein each prediction of the set of predictions is generated for each future time step of a set of future time steps over the predefined time interval (see at least Refaat, [0074]; [0075]; [0076]), and 
the generation of each prediction for each future time step is based on: a state of the ego AV at each past time step of a set of past time steps relative to the corresponding future time step, and a state of each AV of the set of AVs at each past time step of the set of past time steps relative to the corresponding future time step (see at least Refaat, [0074]; [0075]; [0076]).
As per claim 16, similar to claim 8, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 12, as shown above.  Seff further discloses the following limitation:
wherein the first prediction network includes an encoder model and a decoder model, and each of the encoder model and the decoder model includes a set of recurrent neural network models (see at least Seff, [0080]; [0081] ; [0082]; [0087]; [0091]).
As per claim 17, similar to claim 9, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 16, as shown above.  Refaat further discloses the following limitations:
applying the encoder model on the updated value of the set of updated values, the state of the ego AV over the past time interval, and the state of each AV of the set of AVs over the past time interval (see at least Refaat, [0036]; [0061]),
wherein the state of the ego AV over the past time interval corresponds to a state of the ego AV at each past time step of a set of past time steps over the past time interval, and the state of each AV of the set of AVs over the past time interval corresponds to a state of the corresponding AV at each past time step of the set of past time steps over the past time interval (see at least Refaat, [0036]; [0051]).
As per claim 18, similar to claim 10, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 16, as shown above.  Refaat further discloses the following limitations:
wherein the determined output corresponds to an output of the decoder model, the state of the ego AV over the future time interval includes a state of the ego AV at each future time step of a set of future time steps over the future time interval (see at least Refaat, [0074]; [0075]; [0076]), and 
the state of each AV of the set of AVs over the future time interval corresponds to a state of the corresponding AV at each future time step of the set of future time steps over the future time interval (see at least Refaat, [0074]; [0075]; [0076]).
As per claim 19, similar to claim 11, the combination of Seff, Refaat and Ng discloses all of the limitations of claim 18, as shown above.  Seff further discloses the following limitations:
wherein the circuitry is further configured to: apply the decoder model on a state of the ego AV at a first future time step of the set of future time steps and a state of each AV of the set of AVs at the first future time step of the set of future time steps (see at least Seff, [0058]); and 
generate, as an output of the decoder model, a state of the ego AV at a second future time step of the set of future time steps and a state of each AV of the set of AVs at the second future time step of the set of future time steps (see at least Seff, [0058];  [0065]; [0066]; [0068]).
As per claim 20, similar to claims 1 and 12, Seff discloses [a] non-transitory computer-readable medium having stored thereon, computer-executable instructions that when executed by an electronic device, causes the electronic device to execute operations (see at least Seff, [0112] disclosing that embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them), the operations comprising:
determining a set of updated values of a set of variables of an objective function for trajectory planning of an ego autonomous vehicle (AV) (see at least Seff, [0043]), ... (1) ... ;
 applying a first prediction network on an updated value of the set of updated values, a state of the ego AV over a past time interval, and a state of each AV of a set of AVs over the past time interval (see at least Seff, [0043]; [0051]; [0052]; [0061]),
wherein the updated value is indicative of a current state of the ego AV (see at least Seff, abstract; [0043]; [0051]; [0052]); ... (2) ... ;
determining, based on the application, an output that includes a state of the ego AV over a future time interval, and a state of each AV of the set of AVs over the future time interval (see at least Seff, [0074]);
determining a set of optimal values for the set of variables based on the updated value and the determined output satisfying a safety constraint associated with the objective function (see at least Seff, [0037]); and
control a trajectory of the ego AV based on the set of optimal values of the set of variables (see at least Seff, [0074]).  However, Seff does not explicitly teach the following limitation, which is taught in Refaat:
(1) wherein the determination is based on a set of initial values of the set of variables and a set of gradients of the objective function (see at least Refaat, [0005]; [0081]) ... .  However, neither Seff nor Refaat explicitly teaches the following limitation, which is taught in Ng:
(2) wherein the output is determined in a single inference at a current time instance and during an elapse of the future time interval, based on the application of the first prediction network on the updated value (see at least Ng, [0058]; [0066]; [0092]; [0104]; [0125]; [0126]; [0156]) ... .
Seff, Refaat and Ng are analogous art to claim 20 because they are in the same field of autonomous vehicle trajectory planning.  Seff relates to predicting the future trajectory of an agent in an environment (see at least Seff, [0002]).  Refaat relates to methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network having a plurality of sub neural networks to assign respective confidence scores to one or more candidate future trajectories for an agent (see Refaat, Abstract). Ng relates to an ego vehicle applying a machine learning model to one or more agent tensors and one or more map tensors associated with a target vehicle to obtain a predicted drive intention of the target vehicle at a roadway intersection (see at least Ng, Abstract).
Therefore, it would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the device, as disclosed in Seff, to provide the benefit of having (1) the determination be based on a set of initial values of the set of variables and a set of gradients of the objective function, as disclosed in Refaat, with a reasonable expectation of success. Doing so would provide the benefit of improving prediction accuracy and training efficacy (see at least Refaat, [0017]).  It would have been further obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the device, as disclosed in Seff, as modified by Refaat, to provide the benefit of (2) having the output be determined in a single inference at a current time instance and during an elapse of the future time interval, based on the application of the first prediction network on the updated value, as disclosed in Ng, with a reasonable expectation of success.  Doing so would further provide the benefit of improving safety by improving the automated driving systems to handle the entire task of driving without the need for user intervention (see at least Ng, [0003]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure, particularly to the limitations of the independent claims.
See, U.S. Patent Publication Number 2023/0159047 to Jiang et al. (hereafter Jiang): see [0023] disclosing that the learning-based critic includes an encoder and a similarity network, and each of the encoder and the similarity network is a neural network model. Each of the encoder and the similarity network is one of a recurrent neural network (RNN) or multi-layer perceptron (MLP) network. In one embodiment, the encoder is a RNN network, with each RNN cell being a gated recurrent unit (GRU); [0028] disclosing that the expert trajectories 111, also referred as demonstration trajectories, can be contained in a record file recorded by the ADV while it is being manually driven. Each expert trajectory can include points that the ADV is expected to pass, and several driving parameters of the ADV, such as heading, speed, jerks, and acceleration of the ADV at each point; [0031] disclosing that the learning-based critic acts as the objective function that describes the costs of various parameters of a motion planner. Thus, by optimizing the learning-based critic, the automatic tuning framework can identify a set of optimal parameters to optimize the parameters of the motion planner; [0034] disclosing that the physical and safety constraints in the rule-based motion planner are retained, which maintains reliability; [0070] disclosing that the encoder-decoder model used to train the encoder 501 above is a gated recurrent unit (GRU)-Encoder-Decoder (GRU-ED) model. Both the encoder 501 and the decoder 506 can be a recurrent neural network; and [0081] disclosing that Referring to FIG. 8, in operation 801, the processing logic building an objective function from a learning-based critic. In operation 803, the processing logic applies an optimization operation to optimize the objective function to determine a set of optimal parameters for a motion planner of a dynamic model of an autonomous driving vehicle (ADV) for one or more driving environments.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PATRICK M. BRADY III whose telephone number is (571)272-7458. The examiner can normally be reached Monday - Friday 7:00 am - 4:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Erin Bishop can be reached at 571-270-3713. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

PATRICK M. BRADY III
Examiner
Art Unit 3665



/PATRICK M BRADY/         Examiner, Art Unit 3665
/Erin D Bishop/         Supervisory Patent Examiner, Art Unit 3665
Prosecution Timeline

Dec 06, 2023: Application Filed
Oct 22, 2025: Non-Final Rejection mailed (§103)
Jan 22, 2026: Response Filed
May 12, 2026: Final Rejection mailed (§103)
May 22, 2026: Interview Requested
Precedent Cases

Applications granted by this same examiner with similar technology

17/114,170 (Patent 12616087): SYSTEM FOR DEPREDATOR AND PREDATOR CONTROL USING A ROBOT AND SENSORY CONTROLLING APPARATUS. Granted May 05, 2026 (5y 5m to grant).
17/833,166 (Patent 12594992): VEHICLE STEERING CONTROL DEVICE. Granted Apr 07, 2026 (3y 10m to grant).
17/959,778 (Patent 12591236): REMOTE SUPPORT SYSTEM AND REMOTE SUPPORT METHOD. Granted Mar 31, 2026 (3y 5m to grant).
17/986,368 (Patent 12589734): METHOD FOR DEALING WITH OBSTACLES IN AN INDUSTRIAL TRUCK. Granted Mar 31, 2026 (3y 4m to grant).
18/394,039 (Patent 12583517): VEHICLE STEERING CONTROL DEVICE. Granted Mar 24, 2026 (2y 3m to grant).
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 55%
With Interview: 96% (+41.0%)
Median Time to Grant: 3y 0m (~6m remaining)
PTA Risk: Moderate
Based on 121 resolved cases by this examiner. Grant probability derived from career allowance rate.