Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-20 are presented for examination in this application, 18/089,832, filed 12/28/2022, having an effective filing date of 03/11/2022 via claimed priority to Chinese Patent Application No. 202210244263.1.
The Examiner cites particular sections in the references as applied to the claims below for the convenience of the applicant(s). Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant(s) fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
Drawings
The drawings submitted on 12/28/2022 have been fully considered and are accepted.
Information Disclosure Statement
Acknowledgement is made of the information disclosure statements filed 12/28/2022 and 09/05/2023. All patents and non-patent literature have been considered.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1:
Step 1 – Is the claim directed to a process, machine, manufacture, or a composition of matter?
Yes, the claim is directed to a method.
Step 2A – Prong One – Does the claim recite an abstract idea, law of nature, or a natural phenomenon?
Yes, the claim recites an abstract idea:
determining a second action selection strategy according to the trajectory prediction result and the first action selection strategy — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
performing at least one trajectory prediction action indicated by the first action selection strategy, so as to obtain a trajectory prediction result, wherein the at least one trajectory prediction action is based on training sample data — this limitation is directed to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Step 2A – Prong Two – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No, the claim recites additional elements that do not integrate the judicial exception into a practical application:
adjusting a model parameter of a to-be-trained model for an nth round according to a first action selection strategy, so as to obtain an intermediate network model, wherein n=1, ... N, and N is an integer greater than 1 — this limitation is directed to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).
by using the intermediate network model — this limitation is directed to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).
adjusting the model parameter of the to-be-trained model for an (n+1)th round according to the second action selection strategy — this limitation is directed to mere instructions to apply an exception, as the use of a computer or other machinery in its ordinary capacity amounts to invoking computer components merely as a tool to perform an existing process (see MPEP 2106.05(f)(2)).
Step 2B – Does the claim recite additional elements that amount to significantly more than the abstract idea itself?
No, there are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 2:
Claim 2 recites determining features from the environment, which merely amounts to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Regarding claim 12:
Claim 12 recites analogous limitations to claim 2 and therefore is rejected on the same grounds.
Regarding claim 3:
Claim 3 recites determining features from the environment, which merely amounts to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Regarding claim 13:
Claim 13 recites analogous limitations to claim 3 and therefore is rejected on the same grounds.
Regarding claim 4:
Claim 4 recites determining features from the environment and making predictions, which merely amounts to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). In addition, claim 4 recites performing weighting according to an attention matrix, which is a machine learning process recited at a high level of generality and amounts to mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)).
Regarding claim 14:
Claim 14 recites analogous limitations to claim 4 and therefore is rejected on the same grounds.
Regarding claim 5:
Claim 5 recites determining features from the environment and making predictions, which merely amounts to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.). In addition, claim 5 recites using an intermediate network model, which is a machine learning process recited at a high level of generality and amounts to mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)).
Regarding claim 15:
Claim 15 recites analogous limitations to claim 5 and therefore is rejected on the same grounds.
Regarding claim 6:
Claim 6 recites a machine learning process recited at a high level of generality, which amounts to mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)).
Regarding claim 16:
Claim 16 recites analogous limitations to claim 6 and therefore is rejected on the same grounds.
Regarding claim 7:
Claim 7 recites determining reward function values and determining strategies based on those reward functions, which merely amounts to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Regarding claim 17:
Claim 17 recites analogous limitations to claim 7 and therefore is rejected on the same grounds.
Regarding claim 8:
Claim 8 recites a machine learning process recited at a high level of generality, which amounts to mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)).
Regarding claim 18:
Claim 18 recites analogous limitations to claim 8 and therefore is rejected on the same grounds.
Regarding claim 9:
Claim 9 recites selecting action strategies based on values of the reward function, which merely amounts to the abstract idea of a mental process (including an observation, evaluation, judgement, opinion) which can be performed by the human mind or by a human using pen and paper (see MPEP 2106.04(a)(2) III. C.).
Regarding claim 19:
Claim 19 recites analogous limitations to claim 9 and therefore is rejected on the same grounds.
Regarding claim 20:
Claim 20 recites hardware components recited at a high level of generality, which amounts to mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)).
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 10-11, and 20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Guez et al. (US-20220366245-A1, hereinafter Guez).
Regarding claim 1:
Guez teaches a method of training a model, comprising: adjusting a model parameter of a to-be-trained model for an nth round according to a first action selection strategy, so as to obtain an intermediate network model, wherein n=1, ... N, and N is an integer greater than 1 (see para [0084]: “In an actor-critic approach an estimated state value acts as a critic which is used when adjusting parameters of the action selection neural network system during training.”. Also see para [0006]: “In one aspect there is described a method of reinforcement learning. The method may include training an action selection neural network system to select actions to be performed by an agent in an environment for performing a task. The action selection neural network system may be configured to receive data from an observation characterizing a current state of the environment. The action selection neural network system may also be configured to receive data from an output of a model neural network. The action selection neural network system may process the input data in accordance with action selection neural network system parameters to generate an action selection output for selecting the actions to be performed by the agent.”);
performing, by using the intermediate network model, at least one trajectory prediction action indicated by the first action selection strategy, so as to obtain a trajectory prediction result, wherein the at least one trajectory prediction action is based on training sample data (see para [0007]: “The model neural network may be configured to receive an input (derived) from the observation characterizing the current state of the environment. The output of the model neural network may characterize a predicted state trajectory comprising a series of k predicted future states of the environment starting from the current state.”. Also see para [0101]: “In a Q-learning reinforcement learning system the value function loss ℒv may be any measure of a difference, e.g. a squared difference, between the state-action value estimate for a particular action, qtm, and the reinforcement learning training goal may be a state-action value for the particular action determined from a return for the time step t. The particular action may be an action (or one of a trajectory of actions) sampled from a memory storing a sequence of experience tuples, each corresponding to a respective time step.”);
determining a second action selection strategy according to the trajectory prediction result and the first action selection strategy (see para [0015]: “More specifically training the hindsight model may comprise processing, using a hindsight value neural network, the output of the hindsight model neural network and the observation characterizing the state of the environment at the time step t, to generate an estimated hindsight value or state-action value for the state of the environment at the time step t.”);
and adjusting the model parameter of the to-be-trained model for an (n+1)th round according to the second action selection strategy (see para [0008]: “The method may comprise training a hindsight model neural network having an output characterizing a state trajectory comprising a series of k states of the environment starting from a state of the environment at a time step t. The training may comprise processing data from one or more e.g. a sequence of observations, online or off-line, characterizing the state of the environment at the time step t and at the series of k subsequent time steps, and adjusting parameters of the hindsight model neural network using a training goal for the time step t. The method may further comprise training the output of the model neural network to approximate the output of the hindsight model neural network.”).
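For clarity of the record, the claimed n-round training loop mapped above can be summarized in the following non-limiting sketch. All names, update rules, and values are the Examiner's illustrative assumptions and do not represent Applicant's disclosed algorithm or Guez's implementation:

```python
# Illustrative sketch only (hypothetical names and update rules): the
# claimed loop alternates parameter adjustment and strategy refinement.
from dataclasses import dataclass

@dataclass
class Strategy:
    actions: tuple   # trajectory prediction action(s) the strategy indicates
    control: float   # control parameter for those actions

def adjust_params(params, strategy):
    # Round-n parameter adjustment per the current strategy
    # (placeholder arithmetic standing in for a real optimizer step).
    return [p + 0.1 * strategy.control for p in params]

def predict_trajectories(params, samples, strategy):
    # Perform the prediction action(s) indicated by the strategy,
    # based on the training sample data.
    return [sum(params) * s for s in samples]

def second_strategy(result, strategy):
    # Derive the next strategy from the prediction result and the
    # current strategy (trivial control-parameter update for illustration).
    return Strategy(strategy.actions, strategy.control + 0.01 * sum(result))

def train(N, samples):
    params, strategy = [0.0, 0.0], Strategy(("temporal",), 1.0)
    for n in range(1, N):                          # rounds n = 1 .. N-1
        params = adjust_params(params, strategy)   # round n: first strategy
        result = predict_trajectories(params, samples, strategy)
        strategy = second_strategy(result, strategy)
        # Round n+1 reuses adjust_params with the second strategy.
    return params

print(train(N=3, samples=[0.5, 1.0]))
```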
Regarding claim 10:
Guez teaches a method of predicting a trajectory, comprising: acquiring source data to be processed; and performing at least one trajectory prediction action based on the source data by using a trajectory prediction model, so as to obtain a trajectory prediction result, wherein the trajectory prediction model is generated by:
adjusting a model parameter of a to-be-trained model for an nth round according to a first action selection strategy, so as to obtain an intermediate network model, wherein n=1, ... N, and N is an integer greater than 1 (see para [0084]: “In an actor-critic approach an estimated state value acts as a critic which is used when adjusting parameters of the action selection neural network system during training.”. Also see para [0006]: “In one aspect there is described a method of reinforcement learning. The method may include training an action selection neural network system to select actions to be performed by an agent in an environment for performing a task. The action selection neural network system may be configured to receive data from an observation characterizing a current state of the environment. The action selection neural network system may also be configured to receive data from an output of a model neural network. The action selection neural network system may process the input data in accordance with action selection neural network system parameters to generate an action selection output for selecting the actions to be performed by the agent.”);
performing, by using the intermediate network model, at least one trajectory prediction action indicated by the first action selection strategy, so as to obtain a trajectory prediction result, wherein the at least one trajectory prediction action is based on training sample data (see para [0007]: “The model neural network may be configured to receive an input (derived) from the observation characterizing the current state of the environment. The output of the model neural network may characterize a predicted state trajectory comprising a series of k predicted future states of the environment starting from the current state.”. Also see para [0101]: “In a Q-learning reinforcement learning system the value function loss ℒv may be any measure of a difference, e.g. a squared difference, between the state-action value estimate for a particular action, qtm, and the reinforcement learning training goal may be a state-action value for the particular action determined from a return for the time step t. The particular action may be an action (or one of a trajectory of actions) sampled from a memory storing a sequence of experience tuples, each corresponding to a respective time step.”);
determining a second action selection strategy according to the trajectory prediction result and the first action selection strategy (see para [0015]: “More specifically training the hindsight model may comprise processing, using a hindsight value neural network, the output of the hindsight model neural network and the observation characterizing the state of the environment at the time step t, to generate an estimated hindsight value or state-action value for the state of the environment at the time step t.”);
and adjusting the model parameter of the to-be-trained model for an (n+1)th round according to the second action selection strategy (see para [0008]: “The method may comprise training a hindsight model neural network having an output characterizing a state trajectory comprising a series of k states of the environment starting from a state of the environment at a time step t. The training may comprise processing data from one or more e.g. a sequence of observations, online or off-line, characterizing the state of the environment at the time step t and at the series of k subsequent time steps, and adjusting parameters of the hindsight model neural network using a training goal for the time step t. The method may further comprise training the output of the model neural network to approximate the output of the hindsight model neural network.”).
Regarding claim 11:
Guez teaches an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, are configured to cause the at least one processor to at least (see para [0123]: “Computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.”):
adjusting a model parameter of a to-be-trained model for an nth round according to a first action selection strategy, so as to obtain an intermediate network model, wherein n=1, ... N, and N is an integer greater than 1 (see para [0084]: “In an actor-critic approach an estimated state value acts as a critic which is used when adjusting parameters of the action selection neural network system during training.”. Also see para [0006]: “In one aspect there is described a method of reinforcement learning. The method may include training an action selection neural network system to select actions to be performed by an agent in an environment for performing a task. The action selection neural network system may be configured to receive data from an observation characterizing a current state of the environment. The action selection neural network system may also be configured to receive data from an output of a model neural network. The action selection neural network system may process the input data in accordance with action selection neural network system parameters to generate an action selection output for selecting the actions to be performed by the agent.”);
performing, by using the intermediate network model, at least one trajectory prediction action indicated by the first action selection strategy, so as to obtain a trajectory prediction result, wherein the at least one trajectory prediction action is based on training sample data (see para [0007]: “The model neural network may be configured to receive an input (derived) from the observation characterizing the current state of the environment. The output of the model neural network may characterize a predicted state trajectory comprising a series of k predicted future states of the environment starting from the current state.”. Also see para [0101]: “In a Q-learning reinforcement learning system the value function loss ℒv may be any measure of a difference, e.g. a squared difference, between the state-action value estimate for a particular action, qtm, and the reinforcement learning training goal may be a state-action value for the particular action determined from a return for the time step t. The particular action may be an action (or one of a trajectory of actions) sampled from a memory storing a sequence of experience tuples, each corresponding to a respective time step.”);
determining a second action selection strategy according to the trajectory prediction result and the first action selection strategy (see para [0015]: “More specifically training the hindsight model may comprise processing, using a hindsight value neural network, the output of the hindsight model neural network and the observation characterizing the state of the environment at the time step t, to generate an estimated hindsight value or state-action value for the state of the environment at the time step t.”);
and adjusting the model parameter of the to-be-trained model for an (n+1)th round according to the second action selection strategy (see para [0008]: “The method may comprise training a hindsight model neural network having an output characterizing a state trajectory comprising a series of k states of the environment starting from a state of the environment at a time step t. The training may comprise processing data from one or more e.g. a sequence of observations, online or off-line, characterizing the state of the environment at the time step t and at the series of k subsequent time steps, and adjusting parameters of the hindsight model neural network using a training goal for the time step t. The method may further comprise training the output of the model neural network to approximate the output of the hindsight model neural network.”).
Regarding claim 20:
Guez teaches an electronic device, comprising: at least one processor (see para [0119]: “The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers”); and
a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, are configured to cause the at least one processor to implement the method according to claim 10 (see para [0123]: “Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.”).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 2-5, 7, 12-15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Guez et al. (US-20220366245-A1, hereinafter Guez) in view of Liu et al. (US-20230085296-A1, hereinafter Liu).
Regarding claim 2:
Guez teaches the method of claim 1.
Guez further teaches wherein the at least one trajectory prediction action comprises at least one selected from: for a target obstacle in at least one obstacle, determining a temporal interaction feature for the target obstacle (see [0059]: “In general the observations may include, for example, one or more of images, object position data, and sensor data to capture observations as the agent interacts with the environment, for example sensor data from an image, distance, or position sensor or from an actuator”);
…
determining an environmental interaction feature between the target obstacle and a traveling environment, wherein the target obstacle comprises any of the at least one obstacle (see [0057]: “As another example, the agent may be a robot or other mechanical agent interacting with the environment to achieve a specific task, e.g., to locate an object of interest in the environment or to pick up or move an object of interest to a specified location in the environment.”. Also see [0060]: “The rewards, that is the external rewards from the environment, may include e.g. one or more rewards for approaching or achieving one or more target locations, one or more target poses, or one or more other target configurations. For example for a robot a reward may depend on a joint orientation (angle) or velocity, an end-effector position, a center-of-mass position, or the positions and/or orientations of groups of body parts. Costs (i.e. negative rewards) may be similarly defined e.g. dependent upon applied force when interacting with an object, energy usage, or positions of robot body parts.”).
Guez does not explicitly teach determining a spatial interaction feature between the target obstacle and another obstacle, wherein the another obstacle comprises an obstacle in the at least one obstacle other than the target obstacle.
Liu, however, analogously teaches determining a spatial interaction feature between the target obstacle and another obstacle (see [0052]: “At 910, the prediction system 170 acquires characteristics, prior trajectories, and spatiotemporal (e.g., motion, path, etc.) interactions of multiple vehicles. As previously explained, characteristics can be operator behavior, a historical trajectory from a prior time-step, road geometries, and operator preferences (e.g., aggressiveness) for vehicles in an area.”. Also see para [0053]: “At 920, the prediction system 170 computes a graph having a geographic map and vehicle features of multiple vehicles simultaneously. In particular, the prediction system 170 separately processes geographic map and vehicle data from multiple vehicles to extract related features, thereby improving trajectory estimates for complex road geometries. As previously explained, trajectory predictions are also improved by using the encoded features of neighboring vehicles that capture spatiotemporal interactions.”.)
and the another obstacle comprises an obstacle in the at least one obstacle other than the target obstacle (see [0064]: “In one or more arrangements, the map data 116 can include one or more static obstacle maps 118. The static obstacle map(s) 118 can include information about one or more static obstacles located within one or more geographic areas. A “static obstacle” is a physical object whose position does not change or substantially change over a period of time and/or whose size does not change or substantially change over a period of time. Examples of static obstacles can include trees, buildings, curbs, fences, railings, medians, utility poles, statues, monuments, signs, benches, furniture, mailboxes, large rocks, or hills. The static obstacles can be objects that extend above ground level. The one or more static obstacles included in the static obstacle map(s) 118 can have location data, size data, dimension data, material data, and/or other data associated with it. The static obstacle map(s) 118 can include measurements, dimensions, distances, and/or information for one or more static obstacles. The static obstacle map(s) 118 can be high quality and/or highly detailed. The static obstacle map(s) 118 can be updated to reflect changes within a mapped area.”).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Guez and Liu before him or her, to modify the method of claim 2 to include attributes of determining a spatial interaction feature between the target obstacle and another obstacle and the another obstacle comprises an obstacle in the at least one obstacle other than the target obstacle in order to allow automated driving module(s) to determine the location of obstacles (see Liu at para [0082]: “The automated driving module(s) 160 can determine the location of obstacles, obstacles, or other environmental features including traffic signs, trees, shrubs, neighboring vehicles, pedestrians, etc.”).
Regarding claim 12:
Claim 12 recites analogous limitations to claim 2 and therefore is rejected on the same grounds.
Regarding claim 3:
Guez in view of Liu teaches the method of claim 2.
Guez further teaches wherein performing, by using the intermediate network model, the at least one trajectory prediction action based on the training sample data indicated by the first action selection strategy so as to obtain the trajectory prediction result comprises:
in response to the at least one trajectory prediction action comprising determining the temporal interaction feature, determining the temporal interaction feature for the target obstacle, according to a location information of the target obstacle indicated by the training sample data, wherein the location information of the target obstacle is based on at least one historical time instant (see para [0013]-[0014]: “For example the training goal for the time step t may comprise a state value target for the time step t, or may be derived from one or more state-action value targets for the time step t. For example the training goal for the time step t may define an expected return from the state of the environment at a time step t, e.g. an expected cumulative reward to be received by the agent from the state of the environment at a time step t.”. Also see para [0075]: “Optionally, in any of the above implementations, the observation at any given time step may include data from a previous time step that may be beneficial in characterizing the environment, e.g., the action performed at the previous time step, the reward received at the previous time step, and so on.”)
; and
performing an obstacle trajectory prediction based on the temporal interaction feature, so as to obtain the trajectory prediction result (see para [0022]: “Thus the method may further comprise maintaining a memory that stores data representing trajectories generated as a result of interaction of the agent with the environment, each trajectory comprising data at each of a series of time steps identifying at least an observation characterizing a state of the environment and a series of subsequent observations characterizing subsequent states of the environment for training the hindsight model neural network.”).
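For illustration only, the temporal-interaction-feature step mapped above can be sketched as follows; the encoding (latest location concatenated with mean displacement) is a hypothetical stand-in, not Applicant's disclosed encoder or Guez's model:

```python
import numpy as np

# Locations of the target obstacle at historical time instants t-2, t-1, t
# (hypothetical values for illustration).
locations = np.array([[0.0, 0.0], [0.8, 0.1], [1.7, 0.3]])

# Displacements between consecutive instants approximate the motion history.
velocities = np.diff(locations, axis=0)

# One simple temporal interaction feature: latest location plus mean motion.
temporal_feature = np.concatenate([locations[-1], velocities.mean(axis=0)])
print(temporal_feature)  # 4-dimensional feature vector
```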
Regarding claim 13:
Claim 13 recites analogous limitations to claim 3 and therefore is rejected on the same grounds.
Regarding claim 4:
Guez in view of Liu teaches the method of claim 2.
Guez further teaches performing an obstacle trajectory prediction based on the spatial interaction feature, so as to obtain the trajectory prediction result (see para [0022]: “Thus the method may further comprise maintaining a memory that stores data representing trajectories generated as a result of interaction of the agent with the environment, each trajectory comprising data at each of a series of time steps identifying at least an observation characterizing a state of the environment and a series of subsequent observations characterizing subsequent states of the environment for training the hindsight model neural network.”).
Guez does not explicitly teach wherein performing, by using the intermediate network model, the at least one trajectory prediction action based on the training sample data indicated by the first action selection strategy so as to obtain the trajectory prediction result comprises: in response to the at least one trajectory prediction action comprising determining the spatial interaction feature, determining a spatial interaction sub feature based on each historical time instant between the target obstacle and the another obstacle, according to a location information of each obstacle indicated by the training sample data, wherein the location information of each obstacle is based on at least one historical time instant; performing weighting on the spatial interaction sub feature based on each historical time instant according to a preset first attention matrix, so as to obtain the spatial interaction feature.
Liu, however, analogously teaches wherein performing, by using the intermediate network model, the at least one trajectory prediction action based on the training sample data indicated by the first action selection strategy so as to obtain the trajectory prediction result comprises: in response to the at least one trajectory prediction action comprising determining the spatial interaction feature, determining a spatial interaction sub feature based on each historical time instant between the target obstacle and the another obstacle, according to a location information of each obstacle indicated by the training sample data, wherein the location information of each obstacle is based on at least one historical time instant (see para [0053]: “At 920, the prediction system 170 computes a graph having a geographic map and vehicle features of multiple vehicles simultaneously. In particular, the prediction system 170 separately processes geographic map and vehicle data from multiple vehicles to extract related features, thereby improving trajectory estimates for complex road geometries. As previously explained, trajectory predictions are also improved by using the encoded features of neighboring vehicles that capture spatiotemporal interactions. Here, the graphing is used to represent geographic map and vehicle information. In general, a graph is constructed by nodes and edges. A node contains a feature encoding of a vehicle or a feature encoding of a map portion. The connection between nodes is determined by edges according to relationships based on probabilities or correlations.”);
performing weighting on the spatial interaction sub feature based on each historical time instant according to a preset first attention matrix, so as to obtain the spatial interaction feature (see para [0021]: “ In particular, the prediction system learns and aggregates spatiotemporal interactions, information (e.g., position, speed, etc.), or long-term dependencies across vehicles through self-attention transformation that relates different positions of a node feature (e.g., intersection length, vehicle speed, etc.) within a sequence. In one approach, the transformation relates input vectors and input pairs of the learning model to an output that is a weighted sum of the input vectors and the input pairs.”);
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Guez and Liu before him or her, to modify the method of claim 4 to include attributes of determining a spatial interaction sub feature based on each historical time instant between the target obstacle and the another obstacle, according to a location information of each obstacle indicated by the training sample data, wherein the location information of each obstacle is based on at least one historical time instant, performing weighting on the spatial interaction sub feature based on each historical time instant according to a preset first attention matrix, so as to obtain the spatial interaction feature in order to use data from the obstacle map(s) (see Liu at para [0064]: “The static obstacles can be objects that extend above ground level. The one or more static obstacles included in the static obstacle map(s) 118 can have location data, size data, dimension data, material data, and/or other data associated with it. The static obstacle map(s) 118 can include measurements, dimensions, distances, and/or information for one or more static obstacles.”).
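For illustration only, the attention-weighting step addressed above can be sketched as follows; the tensor shapes and the row-softmax normalization are the Examiner's assumptions and are not drawn from Liu's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
T, F = 4, 8                        # historical time instants, feature size
sub_features = rng.random((T, F))  # one spatial interaction sub-feature per instant
attention = rng.random((T, T))     # stands in for the "preset first attention matrix"

# Row-normalize the attention matrix, then form the weighted combination of
# the sub-features across time instants to obtain the spatial interaction feature.
weights = np.exp(attention) / np.exp(attention).sum(axis=1, keepdims=True)
spatial_feature = (weights @ sub_features).mean(axis=0)
print(spatial_feature.shape)       # (8,)
```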
Regarding claim 14:
Claim 14 recites analogous limitations to claim 4 and therefore is rejected on the same grounds.
Regarding claim 5:
Guez in view of Liu teaches the method of claim 2.
Guez further teaches wherein the training sample data indicates a location information of the target obstacle based on at least one historical time instant and a road information of the traveling environment, and wherein performing, by using the intermediate network model, the at least one trajectory prediction action based on the training sample data indicated by the first action selection strategy so as to obtain the trajectory prediction result comprises: in response to the at least one trajectory prediction action comprising determining the environmental interaction feature,
determining at least one trajectory vector for the target obstacle, according to the location information of the target obstacle based on at least one historical time instant (see para [0010]: “More specifically the hindsight model may learn to represent particular aspects of the trajectory of states which are important for receiving a reward. Thus rather than learning to model all the detail in the observations the hindsight model has a low-dimensional feature vector representation output, and optionally a relatively short forward (i.e. hindsight) time window.”. Also see para [0059]: “In general the observations may include, for example, one or more of images, object position data, and sensor data to capture observations as the agent interacts with the environment, for example sensor data from an image, distance, or position sensor or from an actuator. In the case of a robot or other mechanical agent or vehicle the observations may similarly include one or more of the position, linear or angular velocity, force, torque or acceleration, and global or relative pose of one or more parts of the agent. ”); and
performing an obstacle trajectory prediction based on the environmental interaction feature, so as to obtain the trajectory prediction result (see para [0039]: “The system may be configured to train a hindsight model neural network having an output characterizing a state trajectory comprising a series of k states of the environment starting from a state of the environment at a time step t, by processing observations characterizing the state of the environment at the time step t and at the series of k subsequent time steps and adjusting parameters of the hindsight model neural network using a training goal for the time step t. The system may be further configured to train the output of the model neural network to approximate the output of the hindsight model neural network.”. Also see para [0078]: “The reinforcement learning system 100 also comprises a model neural network 120 which receives an input derived from the observation ot characterizing the current state of the environment. The training engine 150 is configured to train a feature vector output, {circumflex over (ϕ)} of the model neural network 120 to approximate the output of the hindsight model neural network e.g. based on a loss function, such as a squared loss or cross-entropy loss, measuring a difference between the two outputs.”).
Guez does not explicitly teach determining at least one road vector according to the road information of the traveling environment; and determining the environment interaction feature associated with the target obstacle according to the at least one trajectory vector and the at least one road vector.
Liu, however, analogously teaches determining at least one road vector according to the road information of the traveling environment (see para [0036]: “This is performed while balancing complexity and computation efficiency. In particular, the system 400 encodes inputs 410 representing initial vectors of each geographic map and vehicle information within an area separately.”. Also see para [0038]: “In one approach, the prediction system 170 also aggregates map information to capture road geometries and generate a single map feature for the scene that further improves trajectory prediction. As explained subsequently, updated geographic map features are concatenated (450) to form an input vector for each vehicle that is processed by decoder 470”);
determining the environment interaction feature associated with the target obstacle according to the at least one trajectory vector and the at least one road vector (see para [0082]: “ In one or more arrangements, the automated driving module(s) 160 can use such data to generate one or more driving scene models. The automated driving module(s) 160 can determine position and velocity of the vehicle 100. The automated driving module(s) 160 can determine the location of obstacles, obstacles, or other environmental features including traffic signs, trees, shrubs, neighboring vehicles, pedestrians, etc.”).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Guez and Liu before him or her, to modify the method of claim 5 to include attributes of determining at least one road vector according to the road information of the traveling environment and determining the environment interaction feature associated with the target obstacle according to the at least one trajectory vector and the at least one road vector in order to improve trajectory estimates for complex road geometries (see Liu at para [0056]: “At 930, the prediction system 170 processes updates for the features separately including features of neighboring vehicles. As previously explained, the prediction system 170 separately processes geographic map and vehicle data from multiple vehicles to extract related features, thereby improving trajectory estimates for complex road geometries”).
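For illustration only, the trajectory-vector and road-vector determination addressed above can be sketched as follows; the pooling and concatenation choices are hypothetical and do not represent Liu's encoder/decoder pipeline:

```python
import numpy as np

# Historical locations of the target obstacle and a lane-boundary polyline
# (hypothetical values for illustration).
positions = np.array([[0.0, 0.0], [1.0, 0.2], [2.0, 0.5]])
road_polyline = np.array([[0.0, -1.0], [5.0, -1.0]])

trajectory_vectors = np.diff(positions, axis=0)   # per-step displacement vectors
road_vectors = np.diff(road_polyline, axis=0)     # per-segment road vectors

# One simple environment interaction feature: pooled trajectory and road
# descriptors concatenated into a single vector.
env_feature = np.concatenate([trajectory_vectors.mean(axis=0),
                              road_vectors.mean(axis=0)])
print(env_feature)  # 4-dimensional feature
```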
Regarding claim 15:
Claim 15 recites analogous limitations to claim 5 and therefore is rejected on the same grounds.
Regarding claim 7:
Guez teaches the method of claim 1.
Guez further teaches wherein the determining a second action selection strategy according to the trajectory prediction result and the first action selection strategy comprises: determining a reward function value for the first action selection strategy according to the trajectory prediction result and an obstacle trajectory label indicated by verification sample data (see para [0018]: “Training the output of the model neural network to approximate the output of the hindsight model neural network may comprise backpropagating gradients of an objective function dependent upon a difference between the (vector-valued) outputs i.e. between features of the state trajectory and features the predicted state trajectory. These features may represent aspects of the trajectories which are useful to predict the value of state-action value at a current time step t. The difference may comprise e.g. an L2 norm or a cross-entropy loss. The model neural network and the hindsight model neural network may be trained jointly or separately e.g. sequentially.”. Also see para [0023]: “For off-policy training of the action selection neural network system a replay buffer may store tuples, e.g. sequences of tuples, comprising: an observation characterizing a state of the environment, an action performed by the agent in response to the observation, a reward received in response to the agent performing the action, and an observation characterizing a next state of the environment.”);
and determining the second action selection strategy according to the reward function value and the first action selection strategy (see [0086]: “FIG. 2 is a flow diagram of an example process for training the reinforcement learning system of FIG. 1. The process receives observations and rewards (step 200) and processes data derived from a current observation using the model neural network 120 (step 202). An output of the model neural network 120 and the is processed by the action selection neural network system 110 in conjunction with data derived from the current observation to select an action to be performed by the agent (step 204).”. Also see [0087]: “Data derived from one or more future observations, and characterizing a future trajectory of the state of the environment over k time steps, is processed by the hindsight model neural network 130 (step 206)”).
Regarding claim 17:
Claim 17 recites analogous limitations to claim 7 and therefore is rejected on the same grounds.
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Guez et al. (US-20220366245-A1, hereinafter Guez) in view of Liu et al. (US-20230085296-A1, hereinafter Liu), and further in view of Cai et al. (“Environment-Attention Network for Vehicle Trajectory Prediction,” hereinafter Cai).
Regarding claim 6:
Guez in view of Liu teaches the method of claim 5.
Guez does not explicitly teach wherein the determining the environment interaction feature associated with the target obstacle according to the at least one trajectory vector and the at least one road vector comprises: connecting, for each of the at least one trajectory vector, the trajectory vector and a road vector meeting a preset distance threshold value condition with the trajectory vector, so as to generate an adjacency matrix; and performing an interaction information extraction based on the adjacency matrix, so as to obtain the environment interaction feature associated with the target obstacle.
Cai, however, analogously teaches wherein the determining the environment interaction feature associated with the target obstacle according to the at least one trajectory vector and the at least one road vector comprises: connecting, for each of the at least one trajectory vector, the trajectory vector and a road vector meeting a preset distance threshold value condition with the trajectory vector, so as to generate an adjacency matrix (see section IV. A., pg. 11220-11221: “Fig. 3 shows the feature update process of each environment node according to the attention weight matrix in the graph attention layer. In order to avoid the inaccurate connection between the environment nodes defined by human subjectively, an adaptive adjacency matrix Af is constructed as the initial adjacency matrix input to the Graph Attention layer: Af = SoftMax(LeakyReLU(M1M2^T)) (11), where M1, M2 ∈ R^(N×F) are the two learnable parameter matrices.”) and
performing an interaction information extraction based on the adjacency matrix, so as to obtain the environment interaction feature associated with the target obstacle (see section IV. A., pg. 11221: “In the process of model training, the adjacency matrix Af establishes the edges between the nodes in the graph through continuous learning and updating. At the same time, Af and the attention weight matrix α can be modified after learning and updating, which enhances the accuracy of the expression of the strength of the edges between the nodes.”).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Guez, Liu, and Cai before him or her, to modify the method of claim 6 to include attributes of connecting, for each of the at least one trajectory vector, the trajectory vector and a road vector meeting a preset distance threshold value condition with the trajectory vector, so as to generate an adjacency matrix, and performing an interaction information extraction based on the adjacency matrix, so as to obtain the environment interaction feature associated with the target obstacle, in order to avoid inaccurate connections between environment nodes that are defined subjectively by a human (see Cai at section IV. A., pg. 11220-11221: “In order to avoid the inaccurate connection between the environment nodes defined by human subjectively, an adaptive adjacency matrix Af is constructed as the initial adjacency matrix input to the Graph Attention layer”).
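For clarity of the record, Cai's equation (11), as reconstructed above, can be written out directly; the dimensions N and F below are the Examiner's illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, F = 5, 16                                     # nodes and feature dimension
M1, M2 = rng.random((N, F)), rng.random((N, F))  # learnable parameter matrices

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # numerically stable row softmax
    return e / e.sum(axis=1, keepdims=True)

A_f = softmax_rows(leaky_relu(M1 @ M2.T))  # A_f = SoftMax(LeakyReLU(M1 M2^T))
print(A_f.shape, A_f.sum(axis=1))          # (5, 5); each row sums to 1
```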
Regarding claim 16:
Claim 16 recites analogous limitations to claim 6 and therefore is rejected on the same grounds.
Claims 8, 9, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Guez et al. (US-20220366245-A1, hereinafter Guez) in view of Liu et al. (US-20230085296-A1, hereinafter Liu), and further in view of Zhang et al. (US-12124282B2, hereinafter Zhang).
Regarding claim 8:
Guez in view of Liu teaches the method of claim 7.
Guez further teaches wherein the first action selection strategy comprises a control parameter for the at least one trajectory prediction action (see para [0008]: “The method may comprise training a hindsight model neural network having an output characterizing a state trajectory comprising a series of k states of the environment starting from a state of the environment at a time step t. The training may comprise processing data from one or more e.g. a sequence of observations, online or off-line, characterizing the state of the environment at the time step t and at the series of k subsequent time steps, and adjusting parameters of the hindsight model neural network using a training goal for the time step t”),
Guez does not explicitly teach wherein the determining a second action selection strategy according to the reward function value and the first action selection strategy comprises: adjusting, in response to the reward function value being greater than a preset reward threshold value, the control parameter in the first action selection strategy according to the reward function value, so as to obtain the second action selection strategy.
Zhang, however, analogously teaches wherein the determining a second action selection strategy according to the reward function value and the first action selection strategy comprises:
adjusting, in response to the reward function value being greater than a preset reward threshold value, the control parameter in the first action selection strategy according to the reward function value, so as to obtain the second action selection strategy (see col. 3, lines 4-13; the cited passage is reproduced in the record as an image and is therefore not quoted here).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Guez, Liu, and Zhang before him or her, to modify the method of claim 8 to include attributes of adjusting, in response to the reward function value being greater than a preset reward threshold value, the control parameter in the first action selection strategy according to the reward function value, so as to obtain the second action selection strategy, in order to improve the efficiency and reliability of data collection (see Zhang at col. 1, lines 52-58: “In the method, intentions of a data collector and sensor nodes are expressed as rewards and penalties according to the real-time changing monitoring network environment, and a path of the data collector is planned through a Q-learning reinforcement learning method, so as to improve the efficiency and reliability of data collection.”).
Regarding claim 18:
Claim 18 recites analogous limitations to claim 8 and therefore is rejected on the same grounds.
Regarding claim 9:
Guez in view of Liu in further view of Zhang teaches the method of claim 8.
Guez does not explicitly teach wherein determining a second action selection strategy according to the reward function value and the first action selection strategy further comprises: randomly selecting an action selection strategy as the second action selection strategy, in response to the reward function value being less than or equal to the preset reward threshold value or an adjustment round for the model parameter of the to-be-trained model being less than a preset round threshold value.
Zhang, however, analogously teaches wherein determining a second action selection strategy according to the reward function value and the first action selection strategy further comprises: randomly selecting an action selection strategy as the second action selection strategy, in response to the reward function value being less than or equal to the preset reward threshold value or an adjustment round for the model parameter of the to-be-trained model being less than a preset round threshold value (see col. 3, lines 4-13; the cited passage is reproduced in the record as an image and is therefore not quoted here).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Guez, Liu, and Zhang before him or her, to modify the method of claim 9 to include attributes of randomly selecting an action selection strategy as the second action selection strategy, in response to the reward function value being less than or equal to the preset reward threshold value or an adjustment round for the model parameter of the to-be-trained model being less than a preset round threshold value, in order to improve the efficiency and reliability of data collection (see Zhang at col. 1, lines 52-58: “In the method, intentions of a data collector and sensor nodes are expressed as rewards and penalties according to the real-time changing monitoring network environment, and a path of the data collector is planned through a Q-learning reinforcement learning method, so as to improve the efficiency and reliability of data collection.”).
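For clarity of the record on claims 8 and 9, the claimed strategy update can be summarized in the following non-limiting sketch; the threshold values and the multiplicative adjustment rule are the Examiner's illustrative assumptions, not Zhang's Q-learning formulation (which is reproduced as an image in the record):

```python
import random

def next_strategy(control, reward, round_n,
                  reward_threshold=0.5, round_threshold=3):
    if reward > reward_threshold and round_n >= round_threshold:
        # Claim 8: adjust the control parameter according to the reward value.
        return control * (1.0 + reward)
    # Claim 9: otherwise, randomly select an action selection strategy.
    return random.uniform(0.0, 1.0)

print(next_strategy(control=1.0, reward=0.8, round_n=5))  # adjusted strategy
print(next_strategy(control=1.0, reward=0.2, round_n=5))  # random strategy
```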
Regarding claim 19:
Claim 19 recites analogous limitations to claim 9 and therefore is rejected on the same grounds.
Pertinent Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:
US12110041B2 — Choi et al. — predicting trajectories using image data of the environment
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Andrew A. Bracero whose telephone number is (571) 270-0592. The examiner can normally be reached Monday - Friday, 9:00 a.m. - 5:00 p.m. ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi, can be reached Monday - Friday, 9:00 a.m. - 5:00 p.m. ET at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW BRACERO/Examiner, Art Unit 2126
/DAVID YI/Supervisory Patent Examiner, Art Unit 2126