Prosecution Insights
Last updated: April 19, 2026
Application No. 18/075,791

METHOD FOR DETERMINING SIMILAR SCENARIOS, TRAINING METHOD, AND TRAINING CONTROLLER

Non-Final OA (§103, §112)
Filed: Dec 06, 2022
Examiner: SHALABY, AHMAD HUSSAM
Art Unit: 2187
Tech Center: 2100 — Computer Architecture & Software
Assignee: Dspace GmbH
OA Round: 1 (Non-Final)
Grant probability: Favorable
Estimated OA rounds: 1-2
Estimated time to grant: 3y 3m

Examiner Intelligence

Career allowance rate: 0% (0 granted / 0 resolved); -55.0% vs. Tech Center average.
Interview lift: +0.0% (minimal), measured over resolved cases with an interview.
Typical timeline: 3y 3m average prosecution; 17 applications currently pending.
Career history: 17 total applications across all art units.

Statute-Specific Performance

§101: 27.4% (-12.6% vs. TC avg)
§103: 41.9% (+1.9% vs. TC avg)
§102: 11.3% (-28.7% vs. TC avg)
§112: 19.4% (-20.6% vs. TC avg)

Tech Center averages are estimates. Figures are based on career data from 0 resolved cases.

Office Action

§103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

This action is responsive to communications filed on 03/31/2023. Claims 1-15 are pending. Claims 1-15 are rejected.

Priority

The application data sheet received on 12/06/2022 claims foreign priority to application DE 10 2021 132 025 and EP 21212482, both filed on 2021-12-06. The certificate of availability for EP 21212482 was received on 01/24/2023. Application DE 10 2021 132 025 was received on 03/31/2023. The application data sheet is accepted by the examiner.

Information Disclosure Statement

The IDS form received on 03/15/2023 has been reviewed and considered by the examiner.

Drawings

The drawings received on 12/06/2022 are accepted by the examiner.

Specification

The abstract received on 12/06/2022 is less than 150 words and contains no legal or implied phraseology. The abstract is accepted by the examiner. The specification received on 12/06/2022 is accepted by the examiner.

Claim Objections

Regarding the claim set received on 12/06/2022, claim 4 is objected to because of the following informalities: claim 4 references a fourth, fifth, and sixth encoder. A first, second, and third encoder were introduced in claim 3, while claim 4 depends only on claim 1. Therefore, when claim 4 references the new limitations of a fourth, fifth, and sixth encoder, those limitations do not make sense outside the context of the first, second, and third encoders of claim 3.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claims 1 and 14 recite the limitation "an in particular dimension-reduced feature representation." It is unclear how the "in particular" limitation impacts the scope of the claim in reference to the other "a dimension-reduced feature representation," which was not defined as "in particular." The examiner is interpreting this limitation to mean that the two dimension-reduced feature representations must be distinct from each other.

Claim 5 recites the limitation "the fourth encoder, the fifth encoder, and the sixth encoder." There is insufficient antecedent basis for this limitation in the claim, since these terms are introduced in claim 4, on which claim 5 does not depend.

Claim 7 recites the limitation "the first to sixth encoders." There is insufficient antecedent basis for this limitation in the claim, since the encoders are introduced in claims 3 and 4, on which claim 7 does not depend.

Claims 2-4, 6, 8-13, and 15 are rejected for their dependence on the above claims.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-2, 10-11, 13 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over “EXTRACTING FEATURES FROM SENSOR DATA” US 20240312177 A1 (Redford_2021) and “SYSTEMS AND METHODS FOR ENCODING AND SEARCHING SCENARIO INFORMATION” US 20210403036 A1 (Danna_2020). Claim 1:Redford_2021 makes obvious A computer-implemented method (par 15: “A first aspect herein is directed to a computer-implemented method of training an encoder together with a perception component based on a training set comprising unannotated sensor data sets and annotated sensor data sets,”) to provide a machine learning algorithm (par 3 “ A broad application of ML is perception. Perception means the interpretation of sensor data of one or more modalities, such as image, radar and/or lidar. “) to determine similar scenarios (par 194: “ For the contrastive learning task, the first and second BEV images 1504A, 1504B of FIG. 15 are associated images corresponding to the same RGBD image 1502. The first and second images 1504A, 1504B therefore constitute a positive pair, as depicted in the top part of FIG. 15. BEV images that do not correspond to the same RGBD image constitute negative pairs. The bottom part of FIG. 16 depicts third and fourth BEV images 1504C, 1504D, which are not associated with each other or with the first and second images 1504A, 1504B. For the four BEV images 1504A,1504B, 1504C,1504D depicted in FIG. 16, there are five negative pairs: the first image 1504A paired with either one of the third and fourth images 1504C, 1504D, the second image 1504B paired with either one of those images 1504C, 1504D and the third and fourth images 1504C, 1504D paired with each other. The aim of the contrastive learning task is to identify positive pairs whilst distinguishing negative pairs. “ Examiner note: Where identifying positive pairs is interpreted to mean to determine similar scenarios. ) based on scenario data of a data set of sensor data, the method comprising: (par 15: “A first aspect herein is directed to a computer-implemented method of training an encoder together with a perception component based on a training set comprising unannotated sensor data sets and annotated sensor data sets”) providing the data set of sensor data of a drive, captured by a plurality of on-board environment detection sensors, by a vehicle; (par 152: “Reference numeral 1302 denotes a set of real sensor data captured using one or more physical sensors. The following examples consider sensor data captured from a sensor equipped vehicle such as image, lidar or radar data, or any combination of those modalities. The sensor data 1302 can be encoded in any suitable way, e.g., using an image, voxel, point cloud or surface mesh representation etc. 
or any combination thereof.”) generating a first augmentation of the data set of sensor data and a second augmentation that is different from the first augmentation of the data set of sensor data; par 184 – 187: In order to provide a paired training input, the original RGBD image (Examiner note: original data set) 1502 is passed to a 2D object detector 1506. The 2D object detector 1506 operates on one or more channels of the RGBD image 1502, such as the depth channel (D), the colour (RGB) channels or both. For the avoidance of doubt, the “2D” terminology refers to the architecture of the 2D object detector, which is designed to operate on dense, 2D image representations, and does not exclude the application of the 2D object detector to the depth channel or to a 3D image (in the above sense) more generally. In this example, the 2D object detector 1506 takes the form of a 2D bounding box detector that outputs a set of 2D bounding boxes 1508A, 1508B for a set of objects detected in the RGBD image 1502. This, in turn, allows object points, corresponding to pixels that are contained within one of the 2D bounding boxes 1508A, 1508B, to be distinguished from non-object point that correspond to pixels not contained within any 2D bounding box 1508A, 108B. A cropping component 1510 uses the 2D bounding boxes 1508A, 1508B to generate a “cropped” point cloud 1503B containing only object points. The cropped point cloud 1503B and the full point cloud 1503A of the same RGBD image 1502 constitute a positive pair for the purpose of contrastive learning. .. par 202: As an alternative to using the original point cloud 1503A or its BEV image representation 1504A, two cropped or otherwise transformed point clouds/BEV images could be used, each with different background noise.” (Examiner note: a first and second augmentation) applying a first machine learning algorithm to the first augmentation of the data set of sensor data for generating a dimension-reduced feature representation of the first augmentation of the data set of sensor data and for of the second augmentation of the data set of sensor data and for See figure 16 contrastive learning which depicts augmentations 1504A and 1504B entering an encoder (102) and then a projection layer (113). See also figure 4 which depicts this process. Par 101: “ In this example, the encoder 102 has a CNN architecture. The local features extracted by the encoder 102 are encoded in a feature map 405, which is a second tensor having spatial dimensions X′×Y′ and F channels. The number of channels F is the dimensionality of the feature space. The size of the feature space F is large enough to provide rich feature representations. For example, of the order of a hundred channels might be used in practice though this is context dependent. There is no requirement for the spatial dimensions X′×Y′ of the feature map 405 to match the spatial dimensions X×Y if the image 104. If the encoder 102 is architected so that the spatial dimensions of the feature map 405 do equal those of the input image 104 (e.g., using upsampling), then each pixel of the feature map 405 uniquely corresponds to a pixel of the image 104 and is said to contain an F-dimensional feature vector for that pixel of the image 104. 
When X′<X and Y′<Y, then each pixel of the feature map 405 correspond to larger region of the image 104 that encompasses more than one pixel of the image 104.” Examiner note: dimensionality reduction) … par 106: “A pixel of the projection map 405 is denoted i and contains a P-dimensional vector v.sub.i (projected vector). Pixel i of the projection map 405 corresponds to a grid cell of the image 104-referred to as grid cell i for conciseness. Grid cell i is a single pixel of the original image 104 when the spatial dimensions of the projection map 405 match the original image 104 but is a multi-pixel grid cell if the projection map 405 has spatial dimensions less than the original image 104. In the following examples, the size of the projection space P=2. In training on the pretext regression task, the vector v.sub.i is interpreted as a vector lying in the BEV plane.” Examiner note: dimensionality reduction. and applying an optimization algorithm to the feature representation output by the first machine learning algorithm of the first augmentation of the data set of sensor data, the optimization algorithm approximating the feature representation output by the second machine learning algorithm of the second augmentation of the data set of sensor data. Par 166: “The SimCLR approach of Chen et al. can be applied with positive/negative image pairs generated in accordance with FIG. 3. Following the notation of Chen et al., a pretext training set is denoted {{tilde over (x)}.sub.k} and a positive pair of images is denoted {tilde over (x)}.sub.i,{tilde over (x)}.sub.j. The encoder 102 is represented mathematically as a function ƒ(⋅). For a CNN encoder architecture, ƒ typically involves a series of convolutions and non-linear transformations applied in accordance with the encoder weights w.sub.1. The output representation of the encoder 102 is denoted h.sub.i=ƒ({tilde over (x)}.sub.i) for a given input {tilde over (x)}.sub.i. The projection component 113 is implemented as small neural network projection head g(⋅) that transforms the representation into a space in which the contrastive loss 114 is applied (the projection space). The contrastive loss is defined between a given positive pair {tilde over (x)}.sub.i,{tilde over (x)}.sub.j in minibatch of 2N images as: [00003]ℓi,j=-log⁢exp⁡(sim⁡(𝓏i,𝓏j)/τ).Math.k=12⁢N[k≠i]exp⁢(sim⁡(𝓏i,𝓏k)/τ),(1) where z.sub.i=g(h.sub.i), τ is a constant, sim(u, v)=u.sup.Tv/∥u∥∥v∥ denotes the dot product between l.sub.2 normalized u and v and an indicator function [AltContent: rect] [k≠i] is 1 if k≠j and 0 otherwise. For pre-training, the loss is computed across all positive pairs in (Examiner note: the positive pairs are the first and second augmented images) {{tilde over (x)}.sub.k}, with the numerator in Equation (1) acting to encourage similarity of features between positively paired images {tilde over (x)}.sub.i, {tilde over (x)}.sub.j, and the denominator acting to discourage similarity of features between {tilde over (x)}.sub.i and all other images. The loss function of Equation 1 is a normalized temperature-scaled cross-entropy loss (NT-Xent). As will be appreciated, this is just one example of a viable contrastive loss that can be applied with paired images generated as per FIG. 13. Other contrastive learning approaches can be applied to paired images generated according to the present teaching.“ Examiner note: Where this process as described is interpreted as optimizing weights using a loss function to approximate the outputs between the positive pairs. 
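Editor's note: the contrastive loss quoted above as Equation (1) is garbled by text extraction in the published application. Read against the surrounding definitions (h_i = f(x̃_i), z_i = g(h_i), sim(·,·) the cosine similarity), it appears to be the standard NT-Xent loss, which in conventional notation reads:

```latex
\ell_{i,j} = -\log
  \frac{\exp\!\big(\mathrm{sim}(z_i, z_j)/\tau\big)}
       {\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\!\big(\mathrm{sim}(z_i, z_k)/\tau\big)},
\qquad
\mathrm{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert}
```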
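For orientation only, a minimal sketch of the contrastive workflow the rejection maps onto claim 1: two augmentations of the same sensor-data sample pass through an encoder f(·) and a projection head g(·), and an NT-Xent-style loss pulls the paired projections together. This is a PyTorch-flavored illustration under assumed names (Encoder, ProjectionHead, nt_xent) and toy dimensions; it is not the applicant's method or the cited references' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy stand-in for the encoder f(.): maps a flattened sensor sample
    to a dimension-reduced feature representation h."""
    def __init__(self, in_dim=1024, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim))

    def forward(self, x):
        return self.net(x)

class ProjectionHead(nn.Module):
    """Toy stand-in for the projection head g(.) applied before the loss."""
    def __init__(self, feat_dim=128, proj_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.Linear(feat_dim, proj_dim))

    def forward(self, h):
        return self.net(h)

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss over a batch of N positive pairs (2N projections total)."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # 2N x P, unit-normalized
    sim = z @ z.t() / tau                                # temperature-scaled cosine similarities
    n = z1.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))           # exclude self-similarity (k != i)
    # Positives: row i of z1 pairs with row i of z2, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# One hypothetical training step on a batch of "scenario" samples x.
encoder, head = Encoder(), ProjectionHead()
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

x = torch.randn(8, 1024)               # placeholder sensor data (e.g., flattened BEV images)
aug1 = x + 0.05 * torch.randn_like(x)  # first augmentation (illustrative additive noise;
aug2 = x + 0.05 * torch.randn_like(x)  # the reference instead crops/perturbs point clouds)

loss = nt_xent(head(encoder(aug1)), head(encoder(aug2)))
opt.zero_grad()
loss.backward()
opt.step()
```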
Examiner note: Figure 16 is reproduced for applicant review. The top half "positive pair" of figure 16 depicts a SimCLR workflow according to an embodiment of the prior art which matches the workflow of claim 1. Redford_2021 does not expressly recite determining a [first/second] class of a scenario. Danna_2020, however, makes obvious determining a [first/second] class of a scenario (par 55: "The language-based scenario search module 212 can be configured to associate scenarios with high-level primitives based on low-level parameters associated with the scenarios. The language-based scenario search module 212 can apply various rules on the low-level parameters associated with a scenario to determine whether the low-level parameters satisfy one or more conditions of a high-level primitive and, when the conditions are satisfied, associate the scenario with the high-level primitive." Examiner note: Where associating the scenario based on the parameters to high-level primitives is determining a class for the scenario.) Redford_2021 and Danna_2020 are analogous art to the claimed invention because they are from the same field of endeavor, scenario matching for autonomous vehicles. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021 and Danna_2020. The rationale for doing so would have been to follow a teaching and motivation proposed in the art. Redford_2021 teaches training a model that determines similar scenarios by augmenting scenario data and optimizing the weights with a loss function. Redford_2021 does this for "Scene extraction for the purpose of simulation and testing" (par 160). While one reasonably skilled in the art could recognize that the scenarios used by Redford_2021 are likely classified, Redford_2021 does not expressly recite that they are classified. Danna_2020 is a similar embedding model that uses a trained model to generate embeddings of scenarios and search for similar ones (see figure 4, which outlines this process). Danna_2020 organizes similar scenarios to solve an issue in the prior art for vehicle simulation, namely locating relevant scenarios to use. Danna_2020 states (par 42): "While grouping scenarios based on taxonomy is helpful for organizational purposes, retrieving scenario information based on this approach can be challenging for a number of reasons. For example, assume that a human searcher wants to obtain information describing a scenario that involves a pedestrian at a four-way intersection with stop signs. The searcher may want to retrieve such information to perform a computer-based simulation of a vehicle that virtually experiences the scenario, for example, for purposes of testing the vehicle's response to the scenario. Scenarios can be identified and included in a simulation suite or selection comprising the identified scenarios. In this example, before relevant scenarios can be obtained, the searcher needs to understand the taxonomy under which scenarios were categorized and sub-categorized. Based on the searcher's understanding of the taxonomy, the searcher can conduct a search for scenarios of interest based on a particular combination of categories and sub-categories. However, if the searcher is not fully familiar with the taxonomy, the searcher may inadvertently miss scenarios that may be of interest by overlooking relevant categories and sub-categories under which those scenarios are organized.
Further, even if the searcher has full knowledge of the taxonomy, the searcher may still not be able to retrieve relevant scenarios if the scenarios were improperly categorized … Thus, an improved approach that indexes or maintains scenario data of different types of scenarios that negates the need for the developers and searchers to keep up with the taxonomy structure is desired.” Therefore, it would have been obvious to the data augmentation and workflow of Redford_2021 with scenario classification of Danna_2020 for the benefit of obtaining relevant scenarios for simulation and testing to obtain the invention as specified in the claims. Claim 2:Redford_2021 makes obvious The computer-implemented method according to claim 1, wherein a similarity loss between the first second machine learning algorithm of the scenario covered by the second augmentation of the data set of sensor data, is minimized by the optimization algorithm. Par 166: “The SimCLR approach of Chen et al. can be applied with positive/negative image pairs generated in accordance with FIG. 3. Following the notation of Chen et al., a pretext training set is denoted {{tilde over (x)}.sub.k} and a positive pair of images is denoted (Examiner note: {tilde over (x)}.sub.i,{tilde over (x)}.sub.j. The encoder 102 is represented mathematically as a function ƒ(⋅). For a CNN encoder architecture, ƒ typically involves a series of convolutions and non-linear transformations applied in accordance with the encoder weights w.sub.1. The output representation of the encoder 102 is denoted h.sub.i=ƒ({tilde over (x)}.sub.i) for a given input {tilde over (x)}.sub.i. The projection component 113 is implemented as small neural network projection head g(⋅) that transforms the representation into a space in which the contrastive loss 114 is applied (the projection space). The contrastive loss is defined between a given positive pair {tilde over (x)}.sub.i,{tilde over (x)}.sub.j in minibatch of 2N images as: (Examiner note: minimization by an optimization algorithm) [00003]ℓi,j=-log⁢exp⁡(sim⁡(𝓏i,𝓏j)/τ).Math.k=12⁢N[k≠i]exp⁢(sim⁡(𝓏i,𝓏k)/τ),(1) where z.sub.i=g(h.sub.i), τ is a constant, sim(u, v)=u.sup.Tv/∥u∥∥v∥ denotes the dot product between l.sub.2 normalized u and v and an indicator function [AltContent: rect] [k≠i] is 1 if k≠j and 0 otherwise. For pre-training, the loss is computed across all positive pairs in {{tilde over (x)}.sub.k}, with the numerator in Equation (1) acting to encourage similarity of features between positively paired images {tilde over (x)}.sub.i, {tilde over (x)}.sub.j, and the denominator acting to discourage similarity of features between {tilde over (x)}.sub.i and all other images. The loss function of Equation 1 is a normalized temperature-scaled cross-entropy loss (NT-Xent). As will be appreciated, this is just one example of a viable contrastive loss that can be applied with paired images generated as per FIG. 13. “ Examiner note: See also figure 16 that shows loss (114) which depicts the simCLR approach Redford_2021 does not expressly recite class Danna_2020, however, makes obvious class (par 55: “The language-based scenario search module 212 can be configured to associate scenarios with high-level primitives based on low-level parameters associated with the scenarios. 
The language-based scenario search module 212 can apply various rules on the low-level parameters associated with a scenario to determine whether the low-level parameters satisfy one or more conditions of a high-level primitive and, when the conditions are satisfied, associate the scenario with the high-level primitive. “Examiner note: Where associating the scenario based on the parameters to high level primitives is determining a class for the scenario.) As already stated Redford_2021 and Danna_2020 are analogous art to the claimed invention because they are from the same field of endeavor called scenario matching for autonomous vehicles. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021 and Danna_2020. The rationale for doing so would have been to follow the same teaching and motivation proposed in the art as in claim 1. Therefore, it would have been obvious to the data augmentation and workflow of Redford_2021 with scenario classification of Danna_2020 for the benefit of obtaining relevant scenarios for simulation and testing to obtain the invention as specified in the claims. Claim 10: Redford_2021 makes obvious The computer-implemented method according to claim 1, wherein the first augmentation and the second augmentation for creating different variants of the data set of sensor data (see claim 1) are randomly generated. par 202: As an alternative to using the original point cloud 1503A or its BEV image representation 1504A, two cropped or otherwise transformed point clouds/BEV images could be used, each with different background noise.” (Examiner note: where the noise is random, see par 65: “FIG. 15A shows a block schematic block diagram of a pair generation function implemented via cropping of background point and injection of random background noise;” Claim 11:Redford_2021 makes obvious The computer-implemented method according to claim 1, (see claim 1) Redford_2021 does not expressly recite wherein the scenarios have driving maneuvers of the vehicle and/or a fellow vehicle and/or interaction maneuvers of the vehicle with the fellow vehicle and/or further objects. However, Danna_2020 makes obvious wherein the scenarios have driving maneuvers of the vehicle and/or a fellow vehicle and/or interaction maneuvers of the vehicle with the fellow vehicle and/or further objects. Par 9: “ In an embodiment, the image data can be a raster of the at least one example scenario that includes at least one trajectory associated with the one or more vehicles, one or more respective trajectories associated with one or more agents, and map data.” As already stated Redford_2021 and Danna_2020 are analogous art to the claimed invention because they are from the same field of endeavor called scenario matching for autonomous vehicles. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021 and Danna_2020. The rationale for doing so would have been to follow a teaching and motivation proposed in the art. Redford_2021 teaches training a model that determines similar scenarios, by augmenting scenario data and optimizing the weights with a loss function. Redford_2021 does this for par 160: “Scene extraction for the purpose of simulation and testing.“ Danna_2020 par 43 explains how using trajectories in determining similar scenarios is relevant for researchers who wish to simulate certain scenarios. “FIG. 
1A illustrates an example scenario 100 in which a searcher faces various shortcomings of the conventional approaches. The example scenario 100 can be a scenario for which the searcher wishes to discover similar scenarios to include in a simulation suite of computer-based simulations to test a vehicle's response to those scenarios. Assume that the searcher is interested in simulation cases where a vehicle 110a cuts in front of another vehicle 108a, as illustrated in the example scenario 100. The example scenario 100 illustrates three vehicles 108a, 110a, and 112a navigating toward an intersection 102. The intersection 102 has at least one crosswalk 104 and at least one stop sign 106 to control oncoming traffic. The vehicles 108a, 110a, and 112a navigate toward the intersection 102 based on their respective trajectories 108b, 110b, 112b. In this example, the searcher may be interested in similar scenarios that involve vehicles performing a cut-in trajectory similar to the cut-in trajectory 110b of the vehicle 110a relative to the trajectory 108b of the vehicle 108a. In this regard, the searcher may retrieve scenarios in a sub-category of scenarios which include a four-way intersection with stop signs. However, the searcher may inadvertently fail to retrieve additional scenarios of interest that occur at a four-way intersection with traffic lights, because these scenarios are included in a different sub-category which the searcher overlooked. As a result, any computer-based simulations involving scenarios that occur at four-way intersections may be inaccurate or incomplete.” Therefore, it would have been obvious to combine the data augmentation and workflow of Redford_2021 with the encoding of trajectory and map data of Danna_2020 for the benefit of allowing researchers to identify similar scenarios based on trajectory and path to obtain the invention as specified in the claims. As already stated Redford_2021 and Danna_2020 are analogous art to the claimed invention because they are from the same field of endeavor called scenario matching for autonomous vehicles. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021 and Danna_2020. The rationale for doing so would have been to follow a teaching and motivation proposed in the art. Redford_2021 teaches training a model that determines similar scenarios, by augmenting scenario data and optimizing the weights with a loss function. Redford_2021 does this for par 160: “Scene extraction for the purpose of simulation and testing.“ Danna_2020 par 43 explains how using trajectories in determining similar scenarios is relevant for researchers who wish to simulate certain scenarios. “FIG. 1A illustrates an example scenario 100 in which a searcher faces various shortcomings of the conventional approaches. The example scenario 100 can be a scenario for which the searcher wishes to discover similar scenarios to include in a simulation suite of computer-based simulations to test a vehicle's response to those scenarios. Assume that the searcher is interested in simulation cases where a vehicle 110a cuts in front of another vehicle 108a, as illustrated in the example scenario 100. The example scenario 100 illustrates three vehicles 108a, 110a, and 112a navigating toward an intersection 102. The intersection 102 has at least one crosswalk 104 and at least one stop sign 106 to control oncoming traffic. 
The vehicles 108a, 110a, and 112a navigate toward the intersection 102 based on their respective trajectories 108b, 110b, 112b. In this example, the searcher may be interested in similar scenarios that involve vehicles performing a cut-in trajectory similar to the cut-in trajectory 110b of the vehicle 110a relative to the trajectory 108b of the vehicle 108a. In this regard, the searcher may retrieve scenarios in a sub-category of scenarios which include a four-way intersection with stop signs. However, the searcher may inadvertently fail to retrieve additional scenarios of interest that occur at a four-way intersection with traffic lights, because these scenarios are included in a different sub-category which the searcher overlooked. As a result, any computer-based simulations involving scenarios that occur at four-way intersections may be inaccurate or incomplete.” Therefore, it would have been obvious to combine the data augmentation and workflow of Redford_2021 with the encoding of trajectory and map data of Danna_2020 for the benefit of allowing researchers to identify similar scenarios based on trajectory and path to obtain the invention as specified in the claims. Claim 13:Redford_2021 makes obvious A computer-implemented method(par 15: “A first aspect herein is directed to a computer-implemented method of training an encoder together with a perception component based on a training set comprising unannotated sensor data sets and annotated sensor data sets,”) to determine similar scenarios (par 194: “ For the contrastive learning task, the first and second BEV images 1504A, 1504B of FIG. 15 are associated images corresponding to the same RGBD image 1502. The first and second images 1504A, 1504B therefore constitute a positive pair, as depicted in the top part of FIG. 15. BEV images that do not correspond to the same RGBD image constitute negative pairs. The bottom part of FIG. 16 depicts third and fourth BEV images 1504C, 1504D, which are not associated with each other or with the first and second images 1504A, 1504B. For the four BEV images 1504A,1504B, 1504C,1504D depicted in FIG. 16, there are five negative pairs: the first image 1504A paired with either one of the third and fourth images 1504C, 1504D, the second image 1504B paired with either one of those images 1504C, 1504D and the third and fourth images 1504C, 1504D paired with each other. The aim of the contrastive learning task is to identify positive pairs whilst distinguishing negative pairs. “ Examiner note: Where identifying positive pairs is interpreted to mean to determine similar scenarios. ) based on scenario data of a data set of sensor data, the method comprising: (par 15: “A first aspect herein is directed to a computer-implemented method of training an encoder together with a perception component based on a training set comprising unannotated sensor data sets and annotated sensor data sets”) providing the data set of sensor data of a drive, captured by a plurality of on-board environment detection sensors, by a vehicle; (par 152: “Reference numeral 1302 denotes a set of real sensor data captured using one or more physical sensors. The following examples consider sensor data captured from a sensor equipped vehicle such as image, lidar or radar data, or any combination of those modalities. The sensor data 1302 can be encoded in any suitable way, e.g., using an image, voxel, point cloud or surface mesh representation etc. 
or any combination thereof.”) Redford_2021 does not expressly recite and applying a machine learning algorithm trained according to claim 1 to the data set of sensor data for determining clustering similar scenarios. Danna_2020 however, makes obvious and applying a machine learning algorithm trained according to claim 1 to the data set of sensor data (see claim 1 for mapping) for determining clustering similar scenarios. (par 84: “Since the trajectories of agents are measured over a period of time, the low-level parameters are associated with temporal and spatial aspects (e.g., position, velocity, acceleration, or the like). The annotation module 802 can analyze the low-level parameters to determine whether a particular group of low-level parameters and their corresponding values satisfy an annotation rule of a high-level primitive. If the annotation rule is satisfied, the high-level primitive can be used to search for the scenario associated with the particular group of low-level parameters in lieu of searching for the particular group of low-level parameters and their corresponding values. Examiner note: where this paragraph describes determining clustering in a high level grouping sense. Where the clusters are grouped based on high-level parameters. Par 4: “In an embodiment, the embedding of the at least one representation of the at least one example scenario can be generated within a vector space, and the embedding representing the at least one scenario can be included within the vector space. In an embodiment, the identifying the at least one scenario can further comprise determining that a threshold distance within the vector space between the embedding of the at least one representation of the at least one scenario and the embedding representing the at least one example scenario is satisfied. “ Examiner note: Where this paragraph describes clustering in a machine learning sense based on distances in a vector space. Redford_2021 and Danna_2020 are analogous art to the claimed invention because they are from the same field of endeavor called scenario matching for autonomous vehicles. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021 and Danna_2020. The rationale for doing so would have been to follow a teaching and motivation proposed in the art. Redford_2021 teaches training a model that determines similar scenarios, by augmenting scenario data and optimizing the weights with a loss function. Redford_2021 does this for par 160: “Scene extraction for the purpose of simulation and testing.“ Redford_2021 also does this comparison through the use of vectors. Danna_2020 is a similar embedding model that uses a trained model to generate embeddings of scenarios and search for similar ones (See figure 4 which outlines this process). Danna_2020 organizes similar scenarios to solve an issue in the prior art for vehicle simulation, mainly locating relevant scenarios to use through vectors. Danna_2020 states Par 42: “While grouping scenarios based on taxonomy is helpful for organizational purposes, retrieving scenario information based on this approach can be challenging for a number of reasons. For example, assume that a human searcher wants to obtain information describing a scenario that involves a pedestrian at a four-way intersection with stop signs. 
The searcher may want to retrieve such information to perform a computer-based simulation of a vehicle that virtually experiences the scenario, for example, for purposes of testing the vehicle's response to the scenario. Scenarios can be identified and included in a simulation suite or selection comprising the identified scenarios. In this example, before relevant scenarios can be obtained, the searcher needs to understand the taxonomy under which scenarios were categorized and sub-categorized. Based on the searcher's understanding of the taxonomy, the searcher can conduct a search for scenarios of interest based on a particular combination of categories and sub-categories. However, if the searcher is not fully familiar with the taxonomy, the searcher may inadvertently miss scenarios that may be of interest by overlooking relevant categories and sub-categories under which those scenarios are organized. Further, even if the searcher has full knowledge of the taxonomy, the searcher may still not be able to retrieve relevant scenarios if the scenarios were improperly categorized … Thus, an improved approach that indexes or maintains scenario data of different types of scenarios that negates the need for the developers and searchers to keep up with the taxonomy structure is desired.” Therefore, it would have been obvious to the data augmentation and workflow of Redford_2021 with scenario clustering of Danna_2020 for the benefit of obtaining relevant scenarios for simulation and testing to obtain the invention as specified in the claims. Claim 14: A training controller (par 15: “A first aspect herein is directed to a computer-implemented method of training an encoder together with a perception component (Examiner note: where a computer training an encoder encompasses a training controller) based on a training set comprising unannotated sensor data sets and annotated sensor data sets,”) to provide a machine learning algorithm (par 3 “ A broad application of ML is perception. Perception means the interpretation of sensor data of one or more modalities, such as image, radar and/or lidar. “) to determine similar scenarios (par 194: “ For the contrastive learning task, the first and second BEV images 1504A, 1504B of FIG. 15 are associated images corresponding to the same RGBD image 1502. The first and second images 1504A, 1504B therefore constitute a positive pair, as depicted in the top part of FIG. 15. BEV images that do not correspond to the same RGBD image constitute negative pairs. The bottom part of FIG. 16 depicts third and fourth BEV images 1504C, 1504D, which are not associated with each other or with the first and second images 1504A, 1504B. For the four BEV images 1504A,1504B, 1504C,1504D depicted in FIG. 16, there are five negative pairs: the first image 1504A paired with either one of the third and fourth images 1504C, 1504D, the second image 1504B paired with either one of those images 1504C, 1504D and the third and fourth images 1504C, 1504D paired with each other. The aim of the contrastive learning task is to identify positive pairs whilst distinguishing negative pairs. “ Examiner note: Where identifying positive pairs is interpreted to mean to determine similar scenarios. 
) based on scenario data of a data set of sensor data, the training controller comprising: (par 15: “A first aspect herein is directed to a computer-implemented method of training an encoder together with a perception component based on a training set comprising unannotated sensor data sets and annotated sensor data sets”) a receiver to receive the data set of sensor data of a drive captured by a plurality of on-board environment detection sensors by a vehicle; (par 152: “Reference numeral 1302 denotes a set of real sensor data captured using one or more physical sensors. The following examples consider sensor data captured from a sensor equipped vehicle such as image, lidar or radar data, or any combination of those modalities. The sensor data 1302 can be encoded in any suitable way, e.g., using an image, voxel, point cloud or surface mesh representation etc. or any combination thereof.”) Examiner note: Where this information being received inherently discloses that there is a receiver that receives this data. a generator to generate a first augmentation of the data set of sensor data and a second augmentation, different from the first augmentation, of the data set of sensor data; par 184 – 187: In order to provide a paired training input, the original RGBD image (Examiner note: original data set) 1502 is passed to a 2D object detector 1506. The 2D object detector 1506 operates on one or more channels of the RGBD image 1502, such as the depth channel (D), the colour (RGB) channels or both. For the avoidance of doubt, the “2D” terminology refers to the architecture of the 2D object detector, which is designed to operate on dense, 2D image representations, and does not exclude the application of the 2D object detector to the depth channel or to a 3D image (in the above sense) more generally. In this example, the 2D object detector 1506 takes the form of a 2D bounding box detector that outputs a set of 2D bounding boxes 1508A, 1508B for a set of objects detected in the RGBD image 1502. This, in turn, allows object points, corresponding to pixels that are contained within one of the 2D bounding boxes 1508A, 1508B, to be distinguished from non-object point that correspond to pixels not contained within any 2D bounding box 1508A, 108B. A cropping component (examiner note: the generator) 1510 uses the 2D bounding boxes 1508A, 1508B to generate a “cropped” point cloud 1503B containing only object points. The cropped point cloud 1503B and the full point cloud 1503A of the same RGBD image 1502 constitute a positive pair for the purpose of contrastive learning. .. par 202: As an alternative to using the original point cloud 1503A or its BEV image representation 1504A, two cropped or otherwise transformed point clouds/BEV images could be used, each with different background noise.” (Examiner note: a first and second augmentation) a first applicator to apply a first machine learning algorithm to the first augmentation of the data set of sensor data for generating an in particular dimension-reduced feature representation of the first augmentation of the data set of sensor data and to See figure 16 contrastive learning which depicts augmentations 1504A and 1504B entering an encoder (102) and then a projection layer (113). See also figure 4 which depicts this process. Par 101: “ In this example, the encoder 102 has a CNN architecture. The local features extracted by the encoder 102 are encoded in a feature map 405, which is a second tensor having spatial dimensions X′×Y′ and F channels. 
The number of channels F is the dimensionality of the feature space. The size of the feature space F is large enough to provide rich feature representations. For example, of the order of a hundred channels might be used in practice though this is context dependent. There is no requirement for the spatial dimensions X′×Y′ of the feature map 405 to match the spatial dimensions X×Y if the image 104. If the encoder 102 is architected so that the spatial dimensions of the feature map 405 do equal those of the input image 104 (e.g., using upsampling), then each pixel of the feature map 405 uniquely corresponds to a pixel of the image 104 and is said to contain an F-dimensional feature vector for that pixel of the image 104. When X′<X and Y′<Y, then each pixel of the feature map 405 correspond to larger region of the image 104 that encompasses more than one pixel of the image 104.” Examiner note: dimensionality reduction) … par 106: “A pixel of the projection map 405 is denoted i and contains a P-dimensional vector v.sub.i (projected vector). Pixel i of the projection map 405 corresponds to a grid cell of the image 104-referred to as grid cell i for conciseness. Grid cell i is a single pixel of the original image 104 when the spatial dimensions of the projection map 405 match the original image 104 but is a multi-pixel grid cell if the projection map 405 has spatial dimensions less than the original image 104. In the following examples, the size of the projection space P=2. In training on the pretext regression task, the vector v.sub.i is interpreted as a vector lying in the BEV plane.” Examiner note: dimensionality reduction. Where the encoder or projection layers are interpreted as applicators. and a third applicator to apply an optimization algorithm to the feature representation output by the first machine learning algorithm of the first augmentation of the data set of sensor data, wherein the optimization algorithm approximates the feature representation output by the second machine learning algorithm of the second augmentation of the data set of sensor data. Par 166: “The SimCLR approach of Chen et al. can be applied with positive/negative image pairs generated in accordance with FIG. 3. Following the notation of Chen et al., a pretext training set is denoted {{tilde over (x)}.sub.k} and a positive pair of images is denoted {tilde over (x)}.sub.i,{tilde over (x)}.sub.j. The encoder 102 is represented mathematically as a function ƒ(⋅). For a CNN encoder architecture, ƒ typically involves a series of convolutions and non-linear transformations applied in accordance with the encoder weights w.sub.1. The output representation of the encoder 102 is denoted h.sub.i=ƒ({tilde over (x)}.sub.i) for a given input {tilde over (x)}.sub.i. The projection component 113 is implemented as small neural network projection head g(⋅) that transforms the representation into a space in which the contrastive loss 114 is applied (the projection space). The contrastive loss is defined between a given positive pair {tilde over (x)}.sub.i,{tilde over (x)}.sub.j in minibatch of 2N images as: [00003]ℓi,j=-log⁢exp⁡(sim⁡(𝓏i,𝓏j)/τ).Math.k=12⁢N[k≠i]exp⁢(sim⁡(𝓏i,𝓏k)/τ),(1) where z.sub.i=g(h.sub.i), τ is a constant, sim(u, v)=u.sup.Tv/∥u∥∥v∥ denotes the dot product between l.sub.2 normalized u and v and an indicator function [AltContent: rect] [k≠i] is 1 if k≠j and 0 otherwise. 
For pre-training, the loss is computed across all positive pairs in (Examiner note: the positive pairs are the first and second augmented images) {{tilde over (x)}.sub.k}, with the numerator in Equation (1) acting to encourage similarity of features between positively paired images {tilde over (x)}.sub.i, {tilde over (x)}.sub.j, and the denominator acting to discourage similarity of features between {tilde over (x)}.sub.i and all other images. The loss function of Equation 1 is a normalized temperature-scaled cross-entropy loss (NT-Xent). As will be appreciated, this is just one example of a viable contrastive loss that can be applied with paired images generated as per FIG. 13. Other contrastive learning approaches can be applied to paired images generated according to the present teaching.“ Examiner note: Where this process as described is interpreted as optimizing weights using a loss function (the optimization algorithm) to approximate the outputs between the positive pairs. PNG media_image1.png 724 971 media_image1.png Greyscale Examiner note: figure 16 for applicant review. The top half “positive pair” of figure 16 depicts a simCLR workflow according to an embodiment of the prior art which matches the workflow of claim 1. Redford_2021 does not expressly recite determine a first/second class of a scenario Danna_2020 however makes obvious determining a [first/second] class of a scenario par 55: “The language-based scenario search module 212 can be configured to associate scenarios with high-level primitives based on low-level parameters associated with the scenarios. The language-based scenario search module 212 can apply various rules on the low-level parameters associated with a scenario to determine whether the low-level parameters satisfy one or more conditions of a high-level primitive and, when the conditions are satisfied, associate the scenario with the high-level primitive. “Examiner note: Where associating the scenario based on the parameters to high level primitives is determining a class for the scenario. Redford_2021 and Danna_2020 are analogous art to the claimed invention because they are from the same field of endeavor called scenario matching for autonomous vehicles. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021 and Danna_2020. The rationale for doing so would have been to follow a teaching and motivation proposed in the art. Redford_2021 teaches training a model that determines similar scenarios, by augmenting scenario data and optimizing the weights with a loss function. Redford_2021 does this for par 160: “Scene extraction for the purpose of simulation and testing.“ While one reasonably skilled in the art could recognize that the scenarios used by Redford_2021 are likely classified, Redford_2021 does not expressly recite that they are classified. Danna_2020 is a similar embedding model that uses a trained model to generate embeddings of scenarios and search for similar ones (See figure 4 which outlines this process). Danna_2020 organizes similar scenarios to solve an issue in the prior art for vehicle simulation, mainly locating relevant scenarios to use. Danna_2020 states Par 42: “While grouping scenarios based on taxonomy is helpful for organizational purposes, retrieving scenario information based on this approach can be challenging for a number of reasons. 
For example, assume that a human searcher wants to obtain information describing a scenario that involves a pedestrian at a four-way intersection with stop signs. The searcher may want to retrieve such information to perform a computer-based simulation of a vehicle that virtually experiences the scenario, for example, for purposes of testing the vehicle's response to the scenario. Scenarios can be identified and included in a simulation suite or selection comprising the identified scenarios. In this example, before relevant scenarios can be obtained, the searcher needs to understand the taxonomy under which scenarios were categorized and sub-categorized. Based on the searcher's understanding of the taxonomy, the searcher can conduct a search for scenarios of interest based on a particular combination of categories and sub-categories. However, if the searcher is not fully familiar with the taxonomy, the searcher may inadvertently miss scenarios that may be of interest by overlooking relevant categories and sub-categories under which those scenarios are organized. Further, even if the searcher has full knowledge of the taxonomy, the searcher may still not be able to retrieve relevant scenarios if the scenarios were improperly categorized … Thus, an improved approach that indexes or maintains scenario data of different types of scenarios that negates the need for the developers and searchers to keep up with the taxonomy structure is desired.” Therefore, it would have been obvious to the data augmentation and workflow of Redford_2021 with scenario classification of Danna_2020 for the benefit of obtaining relevant scenarios for simulation and testing to obtain the invention as specified in the claims. Claim 15: Redford_2021 makes obvious A computer program with a program code to perform the method according to claim 1 (see claim 1), when the computer program is executed on a computer. (par 212: “References herein to components, functions, modules and the like, denote functional components of a computer system which may be implemented at the hardware level in various ways. This includes the encoder 102, the projection layer(s) 113, the task-specific layer(s) 902, the training component 906 and the other components depicted in FIGS. 1 and 9 (among others). Such components may be implemented in a suitably configured computer system. A computer system comprises one or more computers that may be programmable (Examiner note: where programmable implies program code) or non-programmable. A computer comprises one or more processors which carry out the functionality of the aforementioned functional components. A processor can take the form of a general-purpose processor such as a CPU (Central Processing unit) or accelerator (e.g. GPU) etc. or more specialized form of hardware processor such as an FPGA (Field Programmable Gate Array) or ASIC (Application-Specific Integrated Circuit). That is, a processor may be programmable (e.g. an instruction-based general-purpose processor, FPGA etc.)”) Claims 3-6, 8 and 12 are rejected under 35 U.S.C. 
103 as being unpatentable over Redford_2021, Danna_2020, and US 20210125076 A1 “SYSTEM FOR PREDICTING AGGRESSIVE DRIVING” (Zhang_2021) Claim 3:Redford_2021 makes obvious The computer-implemented method according to claim 1, wherein the first machine learning algorithm has (see claim 1) a a [encoder which receives BEV images] Par 194: “ Each BEV image 1504A, 1504B, 1504C, 1504D is processed by the encoder 102 based on the encoder weights w.sub.1 in order to extract a set of features therefrom. In the third approach, the contrastive learning loss 114 is defined so as to encourage similarity of features between positively paired images” Redford_2021 does not expressly recite a first a second and a third Danna_2020 however makes obvious a Par 9: “In an embodiment, the image data can be a raster of the at least one example scenario that includes at least one trajectory associated with the one or more vehicles, one or more respective trajectories associated with one or more agents, and map data.” Par 44: “An improved approach in accordance with the present technology overcomes the foregoing and other disadvantages associated with conventional approaches. In various embodiments, a machine learning technique can be used to determine similar scenarios. For example, a model can be trained to generate embeddings in a low-dimension vector space based on images representing scenarios. For example, an embedding can be generated based on an encoded image of a given scenario that was encountered by a vehicle while navigating an environment. The encoded image may be a bird's-eye view (BEV) of the scenario and can be generated based on various sensor data, such as point clouds produced by LiDAR sensors of the vehicle. (Examiner note: where the encoded image being generated implies an encoder) . In this example, the encoded image can depict the environment in which the scenario occurred and one or more agents present within the environment. In some embodiments, the encoded image can further depict movement information (e.g., trajectories) of the one or more agents over a period of time. For example, an agent can be assigned a color and its trajectory can be depicted with varying grades of the assigned color. The encoded image can further include semantic map information including, but not limited to, roads and their intended directions of travel. The semantic map information can also be encoded with colors and color grading (or contrasts). For example, an intended direction of travel of a road from point A to point B can be encoded with a colored line and the reverse direction from point B to point A can be encoded with a different colored line. The encoded image can be a raster image (e.g., a bitmap image). As already stated Redford_2021 and Danna_2020 are analogous art to the claimed invention because they are from the same field of endeavor called scenario matching / classification for autonomous vehicles. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021 and Danna_2020. The rationale for doing so would have been to follow a teaching and motivation proposed in the art. Redford_2021 teaches training a model that determines similar scenarios, by augmenting scenario data and optimizing the weights with a loss function. 
Redford_2021 does this for par 160: “Scene extraction for the purpose of simulation and testing.“ Danna_2020 par 43 explains how using trajectories in determining similar scenarios is relevant for researchers who wish to simulate certain scenarios. “FIG. 1A illustrates an example scenario 100 in which a searcher faces various shortcomings of the conventional approaches. The example scenario 100 can be a scenario for which the searcher wishes to discover similar scenarios to include in a simulation suite of computer-based simulations to test a vehicle's response to those scenarios. Assume that the searcher is interested in simulation cases where a vehicle 110a cuts in front of another vehicle 108a, as illustrated in the example scenario 100. The example scenario 100 illustrates three vehicles 108a, 110a, and 112a navigating toward an intersection 102. The intersection 102 has at least one crosswalk 104 and at least one stop sign 106 to control oncoming traffic. The vehicles 108a, 110a, and 112a navigate toward the intersection 102 based on their respective trajectories 108b, 110b, 112b. In this example, the searcher may be interested in similar scenarios that involve vehicles performing a cut-in trajectory similar to the cut-in trajectory 110b of the vehicle 110a relative to the trajectory 108b of the vehicle 108a. In this regard, the searcher may retrieve scenarios in a sub-category of scenarios which include a four-way intersection with stop signs. However, the searcher may inadvertently fail to retrieve additional scenarios of interest that occur at a four-way intersection with traffic lights, because these scenarios are included in a different sub-category which the searcher overlooked. As a result, any computer-based simulations involving scenarios that occur at four-way intersections may be inaccurate or incomplete.” Therefore, it would have been obvious to combine the data augmentation and workflow of Redford_2021 with the encoding of trajectory and map data of Danna_2020 for the benefit of allowing researchers to identify similar scenarios based on trajectory and path to obtain the invention as specified in the claims. While one reasonably skilled in the art can likely understand that the encoder used in Danna_2020 are different encoders, Redford_2021 and Danna_2020 do not expressly recite first … second … third encoder Zhang_2021 however makes obvious first … second … third encoder (par 138: “In the example speed prediction model 501 shown in FIG. 6, the first encoder 606a may be used to extract features from the traffic environment data. The second encoder 606b may be used to extract features of the vehicle data acquired by vehicle sensors, where such features may not include vehicle speed data. The third encoder 606c may be used to extract features from the vehicle speed data.”) Examiner note: Where Zhang_2021 also discusses the use of an encoder to get the speed of the vehicle. Redford_2021, Danna_2020 and Zhang_2021 are analogous art to the claimed invention because they are from the same field of endeavor called vehicle scenario classification. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021, Danna_2020 and Zhang_2021. The rationale for doing so would have been to follow a teaching or motivation proposed in the art. The user of Zhang_2021 uses multiple encoders in order to process the different types of sensor data. 
Zhang_2021 par 137 states “Another difference between the seq2seq model used by the speed prediction model 501 and conventional seq2seq models may be the number of encoders used by the speed prediction model 501. The example speed prediction model 501 in FIG. 6 may use three encoders, that is, a first encoder 606a, a second encoder 606b, and a third encoder 606c arranged in parallel.” … par 145 states “Another difference between the seq2seq model used by the speed prediction model 501 and conventional seq2seq models may be an amount of different data types (e.g., 614a, 614b, and 614c) used as inputs to the intermediate vector 616. While a conventional seq2seq model may be limited to one type of data being input to the intermediate vector, the speed prediction model 501 may input more than one data type into an intermediate vector 616.” The user of Danna_2020 processes multiple different types of data in their classification processing. Therefore, it would have been obvious to combine the augmented training workflow of Redford_2021 and Danna_2020 which uses encoders to process vehicle sensor information with Zhang_2021 model of having multiple encoders to process different types of sensor information for the benefit of allowing parallel processing and usage of different data types as inputs for classification to obtain the invention as specified in the claims. Zhang_2021 however, makes obvious par 138: “The third encoder 606c may be used to extract features from the vehicle speed data.” As already stated Redford_2021, Danna_2020 and Zhang_2021 are analogous art to the claimed invention because they are from the same field of endeavor called vehicle scenario classification. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021, Danna_2020 and Zhang_2021. The rationale for doing so would have been to follow a teaching or motivation proposed in the art. Danna_2020 uses the embeddings to map the “low level” and “high level” parameters to classify scenarios, see. par 45: “ In some instances, the non-image query can additionally specify other search parameters including parameters specifying temporal aspects and spatial aspects that include movements of an ego (e.g., an autonomous or semi-autonomous vehicle) or various agents. “ … Par 66: “The low-level parameters can comprise one or more classifications associated with the ego or various agents (Examiner note: the ego is the vehicle sensor) including, for example, agent type such as ego, pedestrian, cyclist, truck, or the like. In some instances, the low-level parameters can comprise metrics relating to temporal metrics (e.g., time or speed), spatial metrics (e.g., position or distance), or a combination of both (e.g., velocity or acceleration) associated with the ego or various agents.” … par 66: “ The first vehicle 758a may be an ego and the ego can be collecting data relating to the example scenario 750. Some of the low-level parameters that the first vehicle 758a (e.g., the ego) collects can be “ego distance to an agent”, “ego hard braking” based on deceleration, or the like.” As outlined, Danna_2020 classifies scenarios partially on the egos deceleration (Examiner note ie: speed data). When classifying the scenarios using vectors, the user of Danna_2020 would be motivated to embed the speed data of the vehicle to allow for accurate classification of that information when performing a high level query search. 
Therefore, it would have been obvious to combine the augmented training workflow of Redford_2021 and Danna_2020, which uses encoders to process vehicle sensor information, with Zhang_2021's model of having multiple encoders, specifically including an encoder to extract features relating to the speed of the ego vehicle, to allow for classification based on the ego vehicle's speed, to obtain the invention as specified in the claims.

Claim 4: Claim 4 is effectively a duplicate of claim 3, with the difference being that it pertains to the “second machine learning algorithm” as well as a fourth, fifth, and sixth encoder. The second machine learning algorithm is mapped in claim 1. One ordinarily skilled in the art understands that the “fourth, fifth, and sixth” encoders of the second machine learning algorithm are the same as the first, second, and third encoders of claim 3 except that they apply to the second augmented data set rather than the first. One ordinarily skilled in the art would recognize that the combination of Redford_2021, Danna_2020 and Zhang_2021 would apply to the second augmented data set that is being compared to the first. Because of these findings, claim 4 is rejected under the same rationale as claim 3.

Claim 5: Redford_2021 makes obvious The computer-implemented method according to claim 3, (see claim 3) wherein the [first machine learning algorithm] Par 101: “In this example, the encoder 102 has a CNN architecture. The local features extracted by the encoder 102 are encoded in a feature map 405, which is a second tensor having spatial dimensions X′×Y′ and F channels. The number of channels F is the dimensionality of the feature space. The size of the feature space F is large enough to provide rich feature representations. For example, of the order of a hundred channels might be used in practice though this is context dependent. There is no requirement for the spatial dimensions X′×Y′ of the feature map 405 to match the spatial dimensions X×Y if the image 104. If the encoder 102 is architected so that the spatial dimensions of the feature map 405 do equal those of the input image 104 (e.g., using upsampling), then each pixel of the feature map 405 uniquely corresponds to a pixel of the image 104 and is said to contain an F-dimensional feature vector for that pixel of the image 104. When X′<X and Y′<Y, then each pixel of the feature map 405 correspond to larger region of the image 104 that encompasses more than one pixel of the image 104.” Examiner note: also see figure 16 which depicts this process occurring to both machine learning algorithms.

Redford_2021 does not expressly recite first encoder, the second encoder, and the third encoder … which are concatenated … fourth encoder, the fifth encoder, and the sixth … which are concatenated. Zhang_2021, however, makes obvious first encoder, the second encoder, and the third encoder … which are concatenated … fourth encoder, the fifth encoder, and the sixth … which are concatenated [into a vector] Par 146: “The intermediate vector 616 encapsulates the information for all the input elements so that the decoder 620 can make accurate predictions. The intermediate vector 616 may be considered as the initial hidden state of the decoder 620. In making vehicle speed predictions, the intermediate vector 616 is a concatenation of two inputs, that is, (i) the revised features 614a, 614b, and 614c based on the short-term time-series data 602a, 602b, and 602c, and (ii) the long-term time-series data 604.
As described above, the short-term time-series inputs include the revised environmental feature 614a from the first encoder 606a, the revised vehicle feature 614b from the second encoder 606b, and the revised speed feature 614c from the third encoder 606c. “ Redford_2021, Danna_2020 and Zhang_2021 are analogous art to the claimed invention because they are from the same field of endeavor called vehicle scenario classification. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021, Danna_2020 and Zhang_2021. The rationale for doing so would have been to follow a teaching or motivation proposed in the art. Zhang_2021 uses a data augmentation contrast learning workflow to train a model to extract features from sensor data. Zhang_2021 does this by encoding the images into vectors. Danna_2020 also classifies and searches scenarios based on embeddings in vector spaces. While one could interpret the inventor of Danna_2020 using multiple encoders, it is not explicitly recited in their specifications. Zhang_2021 explicitly recites the use of multiple encoders in order to process the different data types used for classification. Zhang_2021 par 137 states “Another difference between the seq2seq model used by the speed prediction model 501 and conventional seq2seq models may be the number of encoders used by the speed prediction model 501. The example speed prediction model 501 in FIG. 6 may use three encoders, that is, a first encoder 606a, a second encoder 606b, and a third encoder 606c arranged in parallel.” … par 145 states “Another difference between the seq2seq model used by the speed prediction model 501 and conventional seq2seq models may be an amount of different data types (e.g., 614a, 614b, and 614c) used as inputs to the intermediate vector 616. While a conventional seq2seq model may be limited to one type of data being input to the intermediate vector, the speed prediction model 501 may input more than one data type into an intermediate vector 616.” The user of Redford_2021 and Danna_2020 would be motivated to include multiple encoders to generate vectors in order to allow them to process different types of data for classification. In order to maintain the workflow of Redford_2021 and Danna_2020 as outlined above, the user of Redford_2021 and Danna_2020 would concatenate those vectors as taught by Zhang_2021 in order to allow for their current workflow of classification in vector space. Therefore, it would have been obvious to combine the augmented training workflow of Redford_2021 and Danna_2020 which uses encoders to process vehicle sensor information with Zhang_2021 model of having multiple encoders to process different types of sensor information and then combining those vector representations for the benefit of allowing parallel processing and usage of different data types as inputs for classification to obtain the invention as specified in the claims. Claim 6: The computer-implemented method according to claim 5, (see claim 5) wherein the first machine learning algorithm (par 184 – 187: In order to provide a paired training input, the original RGBD image (Examiner note: original data set) 1502 is passed to a 2D object detector 1506. The 2D object detector 1506 operates on one or more channels of the RGBD image 1502, such as the depth channel (D), the colour (RGB) channels or both. 
For the avoidance of doubt, the “2D” terminology refers to the architecture of the 2D object detector, which is designed to operate on dense, 2D image representations, and does not exclude the application of the 2D object detector to the depth channel or to a 3D image (in the above sense) more generally. In this example, the 2D object detector 1506 takes the form of a 2D bounding box detector that outputs a set of 2D bounding boxes 1508A, 1508B for a set of objects detected in the RGBD image 1502. This, in turn, allows object points, corresponding to pixels that are contained within one of the 2D bounding boxes 1508A, 1508B, to be distinguished from non-object point that correspond to pixels not contained within any 2D bounding box 1508A, 108B. A cropping component 1510 uses the 2D bounding boxes 1508A, 1508B to generate a “cropped” point cloud 1503B containing only object points. The cropped point cloud 1503B and the full point cloud 1503A of the same RGBD image 1502 constitute a positive pair for the purpose of contrastive learning.” … par 202: “As an alternative to using the original point cloud 1503A or its BEV image representation 1504A, two cropped or otherwise transformed point clouds/BEV images could be used, each with different background noise.” (Examiner note: a first and second augmentation)) using the … (Par 101: “In this example, the encoder 102 has a CNN architecture. The local features extracted by the encoder 102 are encoded in a feature map 405, which is a second tensor having spatial dimensions X′×Y′ and F channels. The number of channels F is the dimensionality of the feature space. The size of the feature space F is large enough to provide rich feature representations. For example, of the order of a hundred channels might be used in practice though this is context dependent. There is no requirement for the spatial dimensions X′×Y′ of the feature map 405 to match the spatial dimensions X×Y if the image 104. If the encoder 102 is architected so that the spatial dimensions of the feature map 405 do equal those of the input image 104 (e.g., using upsampling), then each pixel of the feature map 405 uniquely corresponds to a pixel of the image 104 and is said to contain an F-dimensional feature vector for that pixel of the image 104. When X′<X and Y′<Y, then each pixel of the feature map 405 correspond to larger region of the image 104 that encompasses more than one pixel of the image 104.”) and wherein the second machine learning algorithm … data set of sensor data, using the … (see figure 15; the process described above is repeated for both image augmentations / machine learning algorithms.)

[Greyscale image reproduced in the original Office Action omitted from this rendering.]

Examiner note: The process depicts the two different augmented data sets of sensor data 1504A and 1504B. The process encodes this data into vectors; see also figure 4 which shows this process. The prior art of Redford_2021 then computes and minimizes a loss value. While the prior art of Redford_2021 deals with classification tasks, it does not expressly recite that a first/second class for the scenario is identified. Furthermore, while Redford_2021 in figure 4 implies a plurality of vectors vi (see 406), Redford_2021 also does not expressly recite concatenating multiple vectors from different encoders.
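[Editorial illustration; not part of the Office Action or of Zhang_2021. The following minimal Keras sketch shows three encoders arranged in parallel whose outputs are concatenated into a single intermediate vector, as a generic stand-in for the multi-encoder arrangement quoted from Zhang_2021. The input shapes, layer sizes, and the classification head are assumptions made for the sketch.]

import tensorflow as tf

def branch(name, input_shape, units=64):
    # One encoder branch: a dense layer that maps one data type to a feature vector.
    inp = tf.keras.Input(shape=input_shape, name=name)
    feat = tf.keras.layers.Dense(units, activation="relu")(inp)
    return inp, feat

env_in, env_feat = branch("environment_data", (32,))
veh_in, veh_feat = branch("vehicle_data", (16,))
spd_in, spd_feat = branch("speed_data", (8,))

# Concatenate the three encoder outputs into one intermediate vector that a
# downstream classifier (or decoder) can consume.
intermediate = tf.keras.layers.Concatenate()([env_feat, veh_feat, spd_feat])
out = tf.keras.layers.Dense(10, activation="softmax")(intermediate)

model = tf.keras.Model(inputs=[env_in, veh_in, spd_in], outputs=out)
model.summary()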
Redford_2021 does not expressly recite determines the first/second class of the scenario … Concatenated Danna_2020 however makes obvious determines the first/second class of the scenario par 55: “The language-based scenario search module 212 can be configured to associate scenarios with high-level primitives based on low-level parameters associated with the scenarios. The language-based scenario search module 212 can apply various rules on the low-level parameters associated with a scenario to determine whether the low-level parameters satisfy one or more conditions of a high-level primitive and, when the conditions are satisfied, associate the scenario with the high-level primitive. “Examiner note: Where associating the scenario based on the parameters to high level primitives is determining a class for the scenario. Redford_2021 and Danna_2020 are analogous art to the claimed invention because they are from the same field of endeavor called scenario matching for autonomous vehicles. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021 and Danna_2020. The rationale for doing so would have been to follow a teaching and motivation proposed in the art. Redford_2021 teaches training a model that determines similar scenarios, by augmenting scenario data and optimizing the weights with a loss function. Redford_2021 does this for par 160: “Scene extraction for the purpose of simulation and testing.“ While one reasonably skilled in the art could recognize that the scenarios used by Redford_2021 are likely classified, Redford_2021 does not expressly recite that they are classified. Danna_2020 is a similar embedding model that uses a trained model to generate embeddings of scenarios and search for similar ones (See figure 4 which outlines this process). Danna_2020 organizes similar scenarios to solve an issue in the prior art for vehicle simulation, mainly locating relevant scenarios to use. Danna_2020 states Par 42: “While grouping scenarios based on taxonomy is helpful for organizational purposes, retrieving scenario information based on this approach can be challenging for a number of reasons. For example, assume that a human searcher wants to obtain information describing a scenario that involves a pedestrian at a four-way intersection with stop signs. The searcher may want to retrieve such information to perform a computer-based simulation of a vehicle that virtually experiences the scenario, for example, for purposes of testing the vehicle's response to the scenario. Scenarios can be identified and included in a simulation suite or selection comprising the identified scenarios. In this example, before relevant scenarios can be obtained, the searcher needs to understand the taxonomy under which scenarios were categorized and sub-categorized. Based on the searcher's understanding of the taxonomy, the searcher can conduct a search for scenarios of interest based on a particular combination of categories and sub-categories. However, if the searcher is not fully familiar with the taxonomy, the searcher may inadvertently miss scenarios that may be of interest by overlooking relevant categories and sub-categories under which those scenarios are organized. 
Further, even if the searcher has full knowledge of the taxonomy, the searcher may still not be able to retrieve relevant scenarios if the scenarios were improperly categorized … Thus, an improved approach that indexes or maintains scenario data of different types of scenarios that negates the need for the developers and searchers to keep up with the taxonomy structure is desired.”

Therefore, it would have been obvious to combine the data augmentation and workflow of Redford_2021 with the scenario classification of Danna_2020 for the benefit of obtaining relevant scenarios for simulation and testing to obtain the invention as specified in the claims.

Redford_2021 and Danna_2020 do not expressly recite Concatenated. Zhang_2021, however, makes obvious Concatenated. Par 146: “The intermediate vector 616 encapsulates the information for all the input elements so that the decoder 620 can make accurate predictions. The intermediate vector 616 may be considered as the initial hidden state of the decoder 620. In making vehicle speed predictions, the intermediate vector 616 is a concatenation of two inputs, that is, (i) the revised features 614a, 614b, and 614c based on the short-term time-series data 602a, 602b, and 602c, and (ii) the long-term time-series data 604. As described above, the short-term time-series inputs include the revised environmental feature 614a from the first encoder 606a, the revised vehicle feature 614b from the second encoder 606b, and the revised speed feature 614c from the third encoder 606c.”

Redford_2021, Danna_2020 and Zhang_2021 are analogous art to the claimed invention because they are from the same field of endeavor called vehicle scenario classification. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021, Danna_2020 and Zhang_2021. The rationale for doing so would have been to follow a teaching or motivation proposed in the art. Zhang_2021 uses a data augmentation contrastive learning workflow to train a model to extract features from sensor data. Zhang_2021 does this by encoding the images into vectors. Danna_2020 also classifies and searches scenarios based on embeddings in vector spaces. While one could interpret the inventor of Danna_2020 as using multiple encoders, this is not explicitly recited in its specification. Zhang_2021 explicitly recites the use of multiple encoders in order to process the different data types used for classification. Zhang_2021 par 137 states “Another difference between the seq2seq model used by the speed prediction model 501 and conventional seq2seq models may be the number of encoders used by the speed prediction model 501. The example speed prediction model 501 in FIG. 6 may use three encoders, that is, a first encoder 606a, a second encoder 606b, and a third encoder 606c arranged in parallel.” … par 145 states “Another difference between the seq2seq model used by the speed prediction model 501 and conventional seq2seq models may be an amount of different data types (e.g., 614a, 614b, and 614c) used as inputs to the intermediate vector 616. While a conventional seq2seq model may be limited to one type of data being input to the intermediate vector, the speed prediction model 501 may input more than one data type into an intermediate vector 616.” The user of Redford_2021 and Danna_2020 would be motivated to include multiple encoders to generate vectors in order to allow them to process different types of data for classification.
In order to maintain the workflow of Redford_2021 and Danna_2020 as outlined above, the user of Redford_2021 and Danna_2020 would concatenate those vectors as taught by Zhang_2021 in order to allow for their current workflow of classification in vector space. Therefore, it would have been obvious to combine the augmented training workflow of Redford_2021 and Danna_2020, which uses encoders to process vehicle sensor information, with Zhang_2021's model of having multiple encoders to process different types of sensor information and then combining those vector representations, for the benefit of allowing parallel processing and usage of different data types as inputs for classification, to obtain the invention as specified in the claims.

Claim 8: The computer-implemented method according to claim 3, wherein trajectory data, covered by the data set of sensor data of the vehicle and/or of the object (see claim 3) Redford_2021 does not expressly recite each have a different feature size depending on a number of time steps in which the object is located within a detection range of the plurality of on-board environment detection sensors. Zhang_2021, however, makes obvious each have a different feature size depending on a number of time steps (par 89: “The system for predicting aggressive driving behavior 1 may construct a number of features separately for each time step t, (Examiner note: different feature size depending on number of time steps) with the features being concatenated over multiple time steps t.sub.n, to form the feature vectors for each of the vehicles 71. Such features in the vector may include relative location measure d.sub.t, vehicle motion measures such as velocity v.sub.t and acceleration a.sub.t, energy measure ! v.sub.t.sup.2, and force measure ! a.sub.t.”) in which the object is located within a detection range of the plurality of on-board environment detection sensors. (par 133: “The sensor array 50 installed at intersection 200 can detect the positions of the vehicles 71 within the detection range of the sensor array 50. The detected positions can then be mapped to global positions (e.g., as latitude and longitude values). The speed of a vehicle 71 can be measured by the detection and ranging sensor 54 (e.g., radar and/or lidar). In absence of data from the detection and ranging sensor 54, vehicle speed, as described above, can be calculated by time-series global positions of the vehicle 71.”)

As already stated, Redford_2021, Danna_2020 and Zhang_2021 are analogous art to the claimed invention because they are from the same field of endeavor called vehicle scenario classification. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021, Danna_2020 and Zhang_2021. The rationale for doing so would have been to follow a teaching proposed in the art. Danna_2020 classifies scenarios using high-level primitives, and uses time stamps to generate some of these classifications. For example, “The pixels in the encoded image can additionally capture temporal information. For example, the pixels can represent movements of the agents over a particular period of time, such as 3 seconds, 5 seconds, 10 seconds, or the like with graded colors or contrasts. A search query can specify the temporal aspects. For example, an image query can provide an image encoded with temporal information.
For example, where 1 second is represented with a single graded color, a movement of a particular vehicle over 5 seconds can be represented with five grades of the color. By providing such an encoded image as an image query, the searcher can limit a search to scenarios represented over 5 seconds.” Danna_2020 also discusses using encoded images to train a model such as is done by Redford_2021. Danna_2020 par 50 states “The encoded images can capture movements of the agents over a particular period of time, such as 3 seconds, 5 seconds, 10 seconds, or the like. The encoded images, as they are standardized, can be used as training data for a model.” The user of Redford_2021 would be motivated to include encoded images of agents across time to train a model in scenario classification of scenarios that occur across time. See Redford_2021 par 153: “The sensor data 1302 could for example take the form of a video sequence or some other sequence of sensor data captured over some time interval. The sensor data 1302 thus captures a dynamic scene that might change over the duration of that time interval as the sensor-equipped vehicle moves or objects within the dynamic scene change or move.” When using data captured over a time interval, the user of Redford_2021 would be motivated to standardize this dynamic data when training the model. When training a model for scenario classification such as is done by Redford_2021 and Danna_2020, the user of Redford_2021 and Danna_2020 would be motivated to keep track of timestamps, and of feature sizes that increase with the number of timestamps in the sensor data, to standardize training data for the model and to also allow for search queries based on time.

Therefore, it would have been obvious to combine the classification of scenarios across time of Redford_2021 and Danna_2020 with the different feature sizes across timestamps of Zhang_2021 for the benefit of standardizing inputs to training and allowing for time-level search queries to obtain the invention as specified in the claims.

Claim 12: The computer-implemented method according to claim 3, (see claim 3) Redford_2021 does not expressly recite wherein the trajectory and/or speed data of the vehicle are captured by a GPS sensor, and wherein the trajectory, speed, and/or class ID data of the at least one object and the road information are captured by a camera sensor, LiDAR sensor, and/or radar sensor. Danna_2020, however, makes obvious and wherein the trajectory, speed, and/or class ID data of the at least one object and the road information are captured by a camera sensor, LiDAR sensor, and/or radar sensor. Par 44: “The encoded image may be a bird's-eye view (BEV) of the scenario and can be generated based on various sensor data, such as point clouds produced by LiDAR sensors of the vehicle. In this example, the encoded image can depict the environment in which the scenario occurred and one or more agents present within the environment. In some embodiments, the encoded image can further depict movement information (e.g., trajectories) of the one or more agents over a period of time” As already stated in claim 3, Redford_2021 and Danna_2020 are analogous art to the claimed invention because they are from the same field of endeavor called scenario matching / classification for autonomous vehicles. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021 and Danna_2020.
The rationale for doing so would have been to follow a teaching and motivation proposed in the art. Redford_2021 teaches training a model that determines similar scenarios, by augmenting scenario data and optimizing the weights with a loss function. Redford_2021 does this for par 160: “Scene extraction for the purpose of simulation and testing.“ Danna_2020 par 43 explains how using trajectories in determining similar scenarios is relevant for researchers who wish to simulate certain scenarios. “FIG. 1A illustrates an example scenario 100 in which a searcher faces various shortcomings of the conventional approaches. The example scenario 100 can be a scenario for which the searcher wishes to discover similar scenarios to include in a simulation suite of computer-based simulations to test a vehicle's response to those scenarios. Assume that the searcher is interested in simulation cases where a vehicle 110a cuts in front of another vehicle 108a, as illustrated in the example scenario 100. The example scenario 100 illustrates three vehicles 108a, 110a, and 112a navigating toward an intersection 102. The intersection 102 has at least one crosswalk 104 and at least one stop sign 106 to control oncoming traffic. The vehicles 108a, 110a, and 112a navigate toward the intersection 102 based on their respective trajectories 108b, 110b, 112b. In this example, the searcher may be interested in similar scenarios that involve vehicles performing a cut-in trajectory similar to the cut-in trajectory 110b of the vehicle 110a relative to the trajectory 108b of the vehicle 108a. In this regard, the searcher may retrieve scenarios in a sub-category of scenarios which include a four-way intersection with stop signs. However, the searcher may inadvertently fail to retrieve additional scenarios of interest that occur at a four-way intersection with traffic lights, because these scenarios are included in a different sub-category which the searcher overlooked. As a result, any computer-based simulations involving scenarios that occur at four-way intersections may be inaccurate or incomplete.” Danna_2020 accomplishes this result through the use of multiple allowed sensors, for example par 112: “any type of sensor data from, e.g., a Global Positioning System (GPS) module, inertial measurement unit (IMU), LiDAR sensors, optical cameras, radio frequency (RF) transceivers, or any other suitable telemetry or sensory mechanisms.” When choosing a type of sensor, it would be “obvious to try” from those known options to accomplish the predictable result of getting sensor data. Therefore, it would have been obvious to combine the data augmentation and workflow of Redford_2021 with the encoding of trajectory and map data, as well as the use of a LIDAR sensor of Danna_2020 for the benefit of allowing researchers to identify similar scenarios based on trajectory and path to obtain the invention as specified in the claims. Redford_2021 and Danna_2020 do not expressly recite wherein the trajectory and/or speed data of the vehicle are captured by a GPS sensor, Zhang_2021 however makes obvious wherein the trajectory and/or speed data of the vehicle are captured by a GPS sensor,( par 132 :” The on-board GPS can provide a global position of the vehicle 71. The global position of the vehicle 71 may be expressed in terms of latitude and longitude measurements. 
By matching the global positions of the vehicles to a time stamp (e.g., T.sub.0−2, T.sub.0−1, and T.sub.0) or series of time stamps (i.e., a time series), the vehicle speed and acceleration may be calculated.”)

As already stated in claim 3, Redford_2021, Danna_2020 and Zhang_2021 are analogous art to the claimed invention because they are from the same field of endeavor called vehicle scenario classification. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021, Danna_2020 and Zhang_2021. The rationale for doing so would have been to follow a teaching or motivation proposed in the art. Danna_2020 uses the embeddings to map the “low level” and “high level” parameters to classify scenarios, see par 45: “In some instances, the non-image query can additionally specify other search parameters including parameters specifying temporal aspects and spatial aspects that include movements of an ego (e.g., an autonomous or semi-autonomous vehicle) or various agents.” … Par 66: “The low-level parameters can comprise one or more classifications associated with the ego or various agents (Examiner note: the ego is the vehicle sensor) including, for example, agent type such as ego, pedestrian, cyclist, truck, or the like. In some instances, the low-level parameters can comprise metrics relating to temporal metrics (e.g., time or speed), spatial metrics (e.g., position or distance), or a combination of both (e.g., velocity or acceleration) associated with the ego or various agents.” … par 66: “The first vehicle 758a may be an ego and the ego can be collecting data relating to the example scenario 750. Some of the low-level parameters that the first vehicle 758a (e.g., the ego) collects can be “ego distance to an agent”, “ego hard braking” based on deceleration, or the like.” As outlined, Danna_2020 classifies scenarios partially on the ego's deceleration (Examiner note, i.e., speed data). When classifying the scenarios using vectors, the user of Danna_2020 would be motivated to embed the speed data of the vehicle to allow for accurate classification of that information when performing a high-level query search. In order to gather this data, the user of Redford_2021 and Danna_2020 would use some sensor, such as GPS. Danna_2020 par 110 states “For example, the vehicle 1140 may have wheel sensors for, e.g., measuring velocity; global positioning system (GPS) for, e.g., determining the vehicle's current geolocation; and inertial measurement units, accelerometers, gyroscopes, and odometer systems for movement or motion detection.” Based on this information, it would have been obvious to try, for one ordinarily skilled in the art, to use GPS data in order to fulfill the goal of capturing the trajectory/speed of the vehicle.

Therefore, it would have been obvious to combine the augmented training workflow of Redford_2021 and Danna_2020, which uses encoders to process vehicle sensor information, with Zhang_2021's model of having multiple encoders, specifically including an encoder to extract features relating to the speed of the ego vehicle through the use of a GPS sensor, to allow for classification based on the ego vehicle's speed, to obtain the invention as specified in the claims.
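[Editorial illustration; not part of the Office Action or of Zhang_2021. The following minimal Python sketch shows the kind of computation described in the quoted passage on time-series global positions: deriving speed and acceleration from time-stamped latitude/longitude fixes. The haversine helper and the sample coordinates are assumptions made for the sketch.]

import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between two latitude/longitude fixes.
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# (timestamp_s, latitude, longitude) samples at roughly 1 Hz.
fixes = [(0.0, 52.0000, 8.0000), (1.0, 52.0001, 8.0000), (2.0, 52.0003, 8.0000)]

# Speed between consecutive fixes: distance travelled divided by elapsed time.
speeds = []
for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
    speeds.append(haversine_m(la0, lo0, la1, lo1) / (t1 - t0))

# Acceleration approximated as the change in consecutive speed estimates over
# the elapsed time between the corresponding fixes.
accelerations = [(v1 - v0) / (fixes[i + 2][0] - fixes[i + 1][0])
                 for i, (v0, v1) in enumerate(zip(speeds, speeds[1:]))]

print(speeds, accelerations)  # m/s and m/s^2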
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Redford_2021, Danna_2020, and US 20230038673 A1 “SEQUENTIAL PEDESTRIAN TRAJECTORY PREDICTION USING STEP ATTENTION FOR COLLISION AVOIDANCE” (Masoud_2021)

Claim 7: The computer-implemented method according to claim 1, wherein the first to sixth encoders (see claim 1) Redford_2021 and Danna_2020 do not expressly recite have LSTM layers. Masoud_2021, however, makes obvious have LSTM layers. (par 85: “The history trajectory length m depends on frequency, where 2m is the number of data points. First, the history trajectory is encoded and learned by the first LSTM 308 (referred to as the first LSTM layer) having an output with shape [m, 128], and then it is passed to a second LSTM (or second LSTM layer) 310 having an output with shape [m, 256]. The two LSTM layers 308, 310 extract time-related patterns directly from the input trajectory location sequence 320 or the updated trajectory location sequence 323. The LSTM-RNN kernel 302 processes the raw data of the history trajectory location sequence 320 or the updated trajectory location sequence 323 to provide a new high-dimensional time-ordered sequence. The high-dimensional time-ordered sequence includes time-dependent hidden features (or states) of high-dimensional patterns of coordinates with long and short term dependencies. The output of the second LSTM 310 is m time steps of resultant hidden features.”)

Redford_2021, Danna_2020 and Masoud_2021 are analogous art to the claimed invention because they are from the same field of endeavor called vehicle sensor classification and prediction. Where Redford_2021 and Danna_2020 focus on detecting agents, Masoud_2021 focuses specifically on pedestrians and predicting their movement. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021, Danna_2020 and Masoud_2021. The rationale for doing so would have been to follow a teaching proposed in the art. Masoud_2021 states “Within the LSTM-RNN kernel 302, based on demonstrated advantages of LSTMs in learning sequential sensor data, two LSTMs 308, 310 are used to learn time-related patterns within the raw input and/or data received from the sequential prediction module 307.” When utilizing the invention of Redford_2021 and Danna_2020 to classify similar scenarios through sensor data (Danna_2020 par 58: “over some period of time”), the inventor of Redford_2021 and Danna_2020 would be motivated to use LSTM layers for encoding, as they have known advantages in learning sequential sensor data.

Therefore, it would have been obvious to combine the training and classification workflow of Redford_2021 and Danna_2020 with the use of LSTM layers of Masoud_2021 for the benefit of learning sequential sensor data to obtain the invention as specified in the claims.
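[Editorial illustration; not part of the Office Action or of Masoud_2021. The following minimal Keras sketch stacks two LSTM layers over a trajectory sequence, producing per-time-step outputs of width 128 and then 256 as in the description quoted above. The history length m and the two-coordinate input are assumptions made for the sketch.]

import tensorflow as tf

m = 20  # assumed number of history time steps
trajectory_encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(m, 2)),                       # (x, y) per time step
    tf.keras.layers.LSTM(128, return_sequences=True),   # -> [batch, m, 128]
    tf.keras.layers.LSTM(256, return_sequences=True),   # -> [batch, m, 256]
])

# Dummy batch of 4 trajectories just to show the resulting hidden-feature shape.
hidden = trajectory_encoder(tf.random.uniform((4, m, 2)))
print(hidden.shape)  # (4, 20, 256): m time steps of resultant hidden features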
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Redford_2021, Danna_2020, Zhang_2021 and “Introducing Ragged Tensors” by Laurence Moroney (Moroney_2018)

Claim 9: The computer-implemented method according to claim 8, wherein the first machine learning algorithm and the second machine learning algorithm (see claim 8) Danna_2020 makes obvious Par 9: “In an embodiment, the image data can be a raster of the at least one example scenario that includes at least one trajectory associated with the one or more vehicles, one or more respective trajectories associated with one or more agents, and map data.” (Examiner note: see also claim 3 which introduces this limitation, where this is the trajectory data which is processed.)

Redford_2021, Danna_2020, and Zhang_2021 do not expressly recite use ragged tensors. Moroney_2018, however, makes obvious use ragged tensors (Using ragged tensors: “The following example shows ragged tensors being used to construct and combine embeddings of single words (unigrams) and word pairs (bigrams) for a variable-length list of words making up a phrase.”)

Redford_2021, Danna_2020, Zhang_2021, and Moroney_2018 are analogous art to the claimed invention because they are from the same field of endeavor called machine learning and data processing. Redford_2021, Danna_2020, and Zhang_2021 focus specifically on vehicle sensor data, while Moroney_2018 is an article that delves into ragged tensors, which could be applied to a variety of machine learning applications for different data types. Before the effective filing date, it would have been obvious to a person of ordinary skill in the art to combine Redford_2021, Danna_2020, Zhang_2021, and Moroney_2018. The rationale for doing so would have been to follow a motivation proposed in the art. Danna_2020 par 68 states “Where scenarios are represented as videos or sets of successive images, low-level parameters and associated values of the low-level parameters can be maintained for each portion of a video or a set of successive images” Moroney_2018 page 1 par 1 states “In many scenarios, data doesn't come evenly divided into uniformly-shaped arrays that can be loaded into tensors. A classic case is in training and processing text. For example, if you look at the Text Classification tutorial that uses the IMDB dataset, you'll see a major part of your data preparation is in shaping your data to a normalized size. In that case, every review needs to be 256 words long. If it is longer, it is truncated, and if it is shorter, it is padded with 0 values until it reaches the desired length. Ragged tensors are designed to ease this problem. They are the TensorFlow equivalent of nested variable-length lists. They make it easy to store and process data with non-uniform shapes, such as: Feature columns for variable-length features, such as the set of actors in a movie. Batches of variable-length sequential inputs, such as sentences or video clips. Hierarchical inputs, such as text documents that are subdivided into sections, paragraphs, sentences, and words. Individual fields in structured inputs, such as protocol buffers.”

Therefore, it would have been obvious to combine the workflow and scenario classification of Redford_2021, Danna_2020, and Zhang_2021 with the use of ragged tensors as taught by Moroney_2018 for the benefit of easily processing video and sensor data of non-uniform shapes to obtain the invention as specified in the claims.
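[Editorial illustration; not part of the Office Action or of Moroney_2018. The following minimal TensorFlow sketch stores variable-length per-scenario sequences in a tf.RaggedTensor, the data structure discussed in the cited article, so that scenarios with different numbers of time steps need no padding. The sample speed values are assumptions made for the sketch.]

import tensorflow as tf

# Each inner list holds one scenario's per-time-step speed readings; lengths differ.
speeds = tf.ragged.constant([[11.2, 11.6, 12.0],
                             [8.4, 8.1],
                             [14.9, 15.3, 15.8, 16.1]])

print(speeds.shape)                     # (3, None): second dimension is ragged
print(speeds.row_lengths())             # [3 2 4]
print(tf.reduce_mean(speeds, axis=1))   # mean speed per scenario

# A dense, zero-padded tensor can still be produced when a fixed shape is required.
print(speeds.to_tensor(default_value=0.0))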
Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to AHMAD HUSSAM SHALABY whose telephone number is (571)272-7414. The examiner can normally be reached Mon-Fri 7:30am - 5pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emerson Puente can be reached at 5712723652. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /A.H.S./Examiner, Art Unit 2187 /BRIAN S COOK/Primary Examiner, Art Unit 2187

Prosecution Timeline

Dec 06, 2022: Application Filed
Mar 18, 2026: Non-Final Rejection — §103, §112 (current)

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability
Median Time to Grant: 3y 3m
PTA Risk: Low
Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
