Office Action Analysis: 18622658 — DATASET GENERATION FROM SCENARIOS CLUSTERED BY SCENE SIMILARITY

Office Action

§103
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
2.	The United States Patent & Trademark Office appreciates the application submitted by the
inventor/assignee. The United States Patent & Trademark Office has reviewed the
application and makes the comments below.

Information Disclosure Statement
3.	The information disclosure statement (IDS) was submitted on 3/29/2024. The submission is in
compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure
statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
4.	In the event the determination of the status of the application as subject to AIA 35 U.S.C.
102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the
statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection,
would be the same under either status.
	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness
rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35
U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or
nonobviousness.

5.	Claims 1-11 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yang et al. (US Patent Pub. No. US 12116015 B2, hereafter referred to as Yang) in view of Pronovost (US Patent Pub. No. US 12012108 B1, hereafter referred to as Pronovost).

6.	Regarding Claim 1, Yang teaches a system, comprising: one or more processors (Fig. 9, col 5 lines 15-23, col 32 lines 4-17, Yang teaches a computing system with at least one processor and a machine learning computing system with at least one processor, where the two systems are connected using a combination of networks.), and non-transitory memory storing processor-executable instructions that, when executed by the one or more processors (col 11 line 64-col 12 line 3, col 31 lines 1-6, Yang teaches non-transitory computer-readable storage media that store instructions that, when executed by the one or more processors, cause the vehicle to perform operations and functions.), cause the system to perform operations comprising: receiving first scene data corresponding to a first scene (col 2 lines 19-21, col 8 lines 40-48, col 11 lines 16-27, col 24 lines 52-59, Yang teaches an example operational scenario that sends and receives data or signals in or from the vehicle, which includes a first object, where an object is defined as anything that depicts dynamic scene information in the environment or motion path, such as a vehicle.), the first scene data including an environment representation (col 8 lines 54-67, Yang teaches the generation and storage of initial data that includes data descriptive of the environment.), a first pose of an autonomous vehicle at a first time, and an indication of an object (col 3 lines 34-42, col 19 lines 1-2, Yang teaches the generation of an initial object trajectory, which is an example of scene data, for an object, using sensor data, where the trajectory includes a plurality of initial object observations associated with object size, initial object pose, and a timestamp.); determining, based at least in part on the first pose (Fig. 6, col 1 line 64-col 2 line 2, col 3 lines 62-67, Yang teaches the generation of an initial object trajectory for an object, including an initial object pose of the object, all based on raw sensor data.), a second pose of the object at the first time (Fig. 6, col 2 lines 12-21, Yang teaches the generation of a refined object trajectory and updated object pose. The Examiner interprets the updated object pose as a second pose of the object.), relative to the first pose (col 3 lines 42-53, Yang teaches generating an updated initial object trajectory that includes a plurality of updated initial object observations including an updated object pose of the object for the plurality of refined object observations. The Examiner interprets this updated initial object trajectory as a subsequent, second trajectory.);
determining, based at least in part on the first pose and the second pose, a representation of the first scene (col 2 lines 16-21, col 20 lines 8-12, and col 22 lines 4-7, Yang teaches an annotation system that generates a multi-dimensional label for the objects based on the initial object trajectory as well as the refined object trajectory, where the system determines a motion path for the object based on the multi-dimensionality of the label. The Examiner interprets "based at least in part on" as encompassing the first and/or the second pose.).

	Yang does not teach determining a difficulty level associated with the first scene data based at least in part on an error of prediction, by a machine-learned model, of a third pose of the object at a second time different from the first time, associating the first scene data with a first cluster from among a set of clusters, the first cluster indicating a set of scene data, generating a dataset based at least in part on sampling scene data from the set of clusters, wherein the dataset includes the first scene and at least a subset of the set of scene data indicated by the first cluster, wherein generating the dataset comprises selecting, based on the difficulty level being greater than a threshold, at least the first scene to be included in the dataset, and training a machine-learned model based at least in part on the dataset.
	Pronovost is in the same field of art of training autonomous vehicles with environment image data. Further, Pronovost teaches determining a difficulty level associated with the first scene data based at least in part on an error of prediction (col 9 lines 9-22, col 12 lines 48-65, col 23 lines 34-45, Pronovost teaches the use of machine learning models with map data, an example of scene data, for four features that would have been added to the map data in the form of perception components such as size, position, pose, orientation, velocity, and acceleration, in the determination of confidence levels, which are a threshold indicator of difficulty levels, probabilities, and likelihood that a dynamic object may have each of the alternative predicted states at a future time.), by a machine-learned model, of a third pose of the object at a second time different from the first time (Fig. 2A-2D, col 9 lines 9-22, col 12 lines 48-65, col 23 lines 34-45, Pronovost teaches the use of machine learning models with map data, an example of scene data, for four features that would have been added to the map data in the form of perception components such as size, position, pose, orientation, velocity, and acceleration, in the determination of confidence levels, which are a threshold indicator of difficulty levels, probabilities, and likelihood that a dynamic object may have each of the alternative predicted states at a future time. The Examiner interprets the four features added to the map data in the form of pose, which would be used to train the machine learning model, as the presence of a third pose of the object.), associating the first scene data with a first cluster from among a set of clusters, the first cluster indicating a set of scene data (Fig. 3, col 2 lines 21-28, col 4 lines 12-21, and col 4 lines 48-59, Pronovost teaches a set of heuristics data, which is a type of data clustering associating specific combinations of object data for each visual feature, including the first one, such as object types and object features with corresponding modifications to the map data features as well as defining a set of associations between specific object types/features.), generating a dataset based at least in part on sampling scene data from the set of clusters (col 5 line 56 - col 6 line 9, and col 8 lines 27-34, Pronovost teaches example sets of data such as association tables, sets of mapping rules to determine modifications to the map data such as adding/removing map features from the map data as well as simulated data.), wherein the dataset includes the first scene and at least a subset of the set of scene data indicated by the first cluster (col 8 lines 27-34, Pronovost teaches that the generation of datasets is based on the object features detected by the vehicle, which includes the first through fourth object features, where the object features are examples of scenes.), wherein generating the dataset comprises selecting, based on the difficulty level being greater than a threshold, at least the first scene to be included in the dataset (col 7 lines 36-47, col 12 line 58-col 13 line 6, col 23 lines 31-41, Pronovost teaches independent and/or conjunctive analysis of the objects and object features, which are scenes, based on thresholds such as confidence levels.), and training a machine-learned model based at least in part on the dataset (Fig. 2A-2D, col 9 lines 9-22, col 12 lines 48-65, col 23 lines 34-45, Pronovost teaches the use of machine learning models with map data, an example of scene data, for features that would have been added to the map data in the form of perception components such as size, position, pose, orientation, velocity, and acceleration, in the determination of confidence levels, which are a threshold indicator of difficulty levels, probabilities, and likelihood that a dynamic object may have each of the alternative predicted states at a future time.).

Therefore it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yang by incorporating the data clustering system used to train ML models using perception components that is taught by Pronovost to make an invention that can automatically group similar environment images and identify diverse example scenarios based on their features; thus one of ordinary skill in the art would be motivated to combine the references since the need is to ensure safety for passengers as well as surrounding persons and objects, while traversing through congested areas with other moving vehicles (autonomous or otherwise), moving people, and stationary buildings (col 1 lines 6-26, Pronovost).
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.

7.	In regards to Claim 2, Yang in view of Pronovost teaches wherein the first scene data further includes a fourth pose of the autonomous vehicle, relative to the first pose, at a third time before the first time (Fig. 2A-2D, col 9 lines 9-22, col 12 lines 48-65, col 23 lines 34-45, Pronovost teaches the use of machine learning models with map data, an example of scene data, for four features that would have been added to the map data in the form of perception components such as size, position, pose, orientation, velocity, and acceleration, in the determination of confidence levels, which are a threshold indicator of difficulty levels, probabilities, and likelihood that a dynamic object may have each of the alternative predicted states at a future time.), and the operations further comprising:
determining, based at least in part on the first pose, a fifth pose of the object at the third time, relative to the first pose (Fig. 5, col 9 lines 9-22, Pronovost teaches multiple features, from four features and so on, that would have been added to the map data.).

8.	In regards to claim 3, Yang in view of Pronovost teaches wherein the representation of the first scene comprises: a first feature vector identifying the first pose as an origin of a reference frame (col 16 lines 3-7, Pronovost teaches that object data may include object data and feature data over a number of previous timesteps, which includes an initial original reference timestep, where the object data includes object types, location, poses, velocity vectors, and environment-related data perceived by the vehicle.), a second feature vector identifying a position and orientation corresponding to the second pose based on the reference frame (col 16 lines 3-7, Pronovost teaches that object data may include object data and feature data over a number of previous timesteps, which includes an initial original reference timestep, where the object data includes object types, location, poses, velocity vectors, and environment-related data perceived by the vehicle.).
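
For orientation only, the reference-frame limitation recited in claim 3 (the first pose as origin, the second pose expressed as a position and orientation in that frame) can be read as an ordinary rigid-body coordinate change. The sketch below illustrates the claim language under stated assumptions; it is not taken from the application, Yang, or Pronovost. The `Pose2D` type and the `to_reference_frame` function are hypothetical names, and a planar (x, y, yaw) pose is assumed.

```python
import math
from typing import NamedTuple


class Pose2D(NamedTuple):
    """Hypothetical planar pose: position in meters, heading in radians."""
    x: float
    y: float
    yaw: float


def to_reference_frame(reference: Pose2D, other: Pose2D) -> Pose2D:
    """Express `other` in the frame whose origin and heading are `reference`."""
    dx = other.x - reference.x
    dy = other.y - reference.y
    cos_r = math.cos(reference.yaw)
    sin_r = math.sin(reference.yaw)
    # Rotate the world-frame offset by -reference.yaw.
    rel_x = cos_r * dx + sin_r * dy
    rel_y = -sin_r * dx + cos_r * dy
    # Wrap the heading difference into (-pi, pi].
    d_yaw = other.yaw - reference.yaw
    rel_yaw = math.atan2(math.sin(d_yaw), math.cos(d_yaw))
    return Pose2D(rel_x, rel_y, rel_yaw)


# Assumed example: vehicle faces +y; the object is 2 m straight ahead.
vehicle = Pose2D(1.0, 1.0, math.pi / 2)
obj = Pose2D(1.0, 3.0, math.pi / 2)
relative = to_reference_frame(vehicle, obj)
```

Under these assumptions the vehicle pose maps to the origin of its own frame, and the object pose becomes a position and orientation relative to it.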

9.	In regards to claim 4, Yang in view of Pronovost teaches wherein the first pose is associated with a geographic location and the first feature vector further identifies a classification associated with the geographic location (col 13 lines 43-64 and col 14 lines 1-6, Yang teaches map data, which includes the first pose associated with the first feature vector, providing detailed information about the surrounding environment of the vehicle or the geographic area in which the vehicle was or will be located.), the classification comprising one of: driving lane, turning lane, road junction, parking spot, traffic light intersection, or shoulder lane (col 13 lines 43-64, col 14 lines 1-6, Yang teaches map data of surrounding environment of the vehicle or geographic area in which the vehicle was/will be in to include the identity and location of different roadways, road segments, buildings, or other items or objects such as lampposts, crosswalks or curbs, the location and directions of traffic lanes such as the location and direction of a parking lane, a turning lane, a bicycle lane, or other lanes within a particular roadway or other travel way or one or more boundary markings associated therewith, traffic control data such as the location and instructions of signage, traffic lights, or other traffic control devices, obstruction information, event data, nominal vehicle path data, or any other map data that provides information that assists the vehicle computing system.).

10.	In regards to Claim 5, Yang in view of Pronovost teaches wherein the first scene data is associated with the first cluster based at least in part on determining a similarity between the representation and an attribute associated with the first cluster (Fig. 3, col 2 lines 21-28, col 4 lines 12-21, and col 4 lines 48-59, Pronovost teaches a set of heuristics data, which is a type of data clustering associating specific combinations of object data for each visual feature, including the first one, such as object types and object features with corresponding modifications to the map data features as well as defining set of associations between specific object types/features.), the attribute being a representation of mean scene data of the first cluster (paragraphs 14, 25, 65-67, 107, Pronovost teaches attributes of the detected objects and predicted trajectories for objects being represented in machine learning techniques used in conjunction with sampling techniques such as Gaussian sampling and a most likely sampling technique, as well as clustering algorithms such as k-means.).
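
As a reading aid, the limitation of claim 5 (associating scene data with a cluster by its similarity to a representation of the cluster's mean scene data) resembles nearest-centroid assignment. The following is a minimal sketch assuming Euclidean distance as the similarity measure and plain lists of floats as representations; the claim does not specify either, and `mean_vector` and `nearest_cluster` are hypothetical names that do not come from the application or the cited references.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def nearest_cluster(representation, clusters):
    """Return the id of the cluster whose mean is closest to `representation`.

    `clusters` maps a cluster id to the list of member vectors.
    Euclidean distance is an assumption made for this sketch.
    """
    best_id = None
    best_distance = math.inf
    for cluster_id, members in clusters.items():
        distance = math.dist(representation, mean_vector(members))
        if distance < best_distance:
            best_id, best_distance = cluster_id, distance
    return best_id


# Assumed toy data: two well-separated clusters of 2-D scene representations.
clusters = {
    "a": [[0.0, 0.0], [2.0, 0.0]],
    "b": [[10.0, 10.0], [12.0, 10.0]],
}
assigned = nearest_cluster([1.5, 0.5], clusters)
```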

11.	Regarding Claim 6, Yang teaches a method comprising: receiving a plurality of scene data (Fig. 2A and col 9 lines 29-45, Yang teaches an autonomy computing system that obtains perception data, prediction data, and motion plan data to comprehend the vehicle’s surrounding environment.), each indicating an environment traversed by an autonomous vehicle (Fig. 8 and col 4 lines 34-36, Yang teaches extraction of spatial-temporal features from four-dimensional point clouds that are generated from the sensor data of object trajectories.), and receiving an instruction to generate a dataset, the instruction including a target size of the dataset, the target size being less than a number of scene data in the plurality of scene data (col 5 lines 17-30, Yang teaches instructions received to perform an operation of obtaining sensor data from sensors and generating initial object trajectories for an object using the sensor data, where the data is set as sequential over time.).

	Yang does not teach determining a plurality of feature vectors, each representing a scene data of the plurality of scene data, clustering, based on a distance metric in a space of the feature vectors, the plurality of feature vectors into one or more clusters, each cluster representing a subset of the plurality of scene data, generating, by sub-sampling the one or more clusters, the dataset of the target size, and wherein the sub-sampling selects at least one scene data from the one or more clusters.
	Pronovost is in the same field of art of training autonomous vehicles with environment image data. Further, Pronovost teaches determining a plurality of feature vectors, each representing a scene data of the plurality of scene data (col 8 lines 27-34 and col 16 lines 3-7, Pronovost teaches that the generation of datasets is based on the object features detected by the vehicle, which includes the first through fourth object features, where the data includes any object-related data such as velocity vectors.), clustering, based on a distance metric in a space of the feature vectors, the plurality of feature vectors into one or more clusters (Fig. 3, col 2 lines 21-28, col 4 lines 12-21, and col 4 lines 48-59, Pronovost teaches a set of heuristics data, which is a type of data clustering associating specific combinations of object data for each visual feature, such as object types and object features like velocity vectors with corresponding modifications to the map data features as well as defining a set of associations between specific object types/features.), each cluster representing a subset of the plurality of scene data (Fig. 3, col 2 lines 21-28, col 4 lines 12-21, and col 4 lines 48-59, Pronovost teaches a set of heuristics data, which is a type of data clustering associating specific combinations of object data for each visual feature, such as object types and object features like velocity vectors with corresponding modifications to the map data features as well as defining a set of associations between specific object types/features.), generating, by sub-sampling the one or more clusters, the dataset of the target size (col 5 line 56 - col 6 line 9, and col 8 lines 27-34, Pronovost teaches example sets of data such as association tables, sets of mapping rules to determine modifications to the map data such as adding/removing map features from the map data as well as simulated data.), and wherein the sub-sampling selects at least one scene data from the one or more clusters (col 8 lines 27-34, Pronovost teaches that the generation of datasets is based on the object features detected by the vehicle, which includes the first through fourth object features, where the object features are examples of scenes.).
Therefore it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yang by incorporating the data clustering method used to train ML models using perception components that is taught by Pronovost to make an invention that can automatically group similar environment images and identify diverse example scenarios based on their features; thus one of ordinary skill in the art would be motivated to combine the references since the need is to ensure safety for passengers as well as surrounding persons and objects, while traversing through congested areas with other moving vehicles (autonomous or otherwise), moving people, and stationary buildings (col 1 lines 6-26, Pronovost).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
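
To make the claim 6 limitations concrete (clustering feature vectors by a distance metric, then sub-sampling the clusters to a target size smaller than the input), a minimal sketch follows. It assumes k-means with Euclidean distance and round-robin sampling across clusters; the claim names neither technique, and neither is drawn from the application, Yang, or Pronovost. The names `kmeans` and `subsample` are hypothetical, and the initial centroids are simply the first `k` vectors so that the result is deterministic.

```python
import math
from itertools import zip_longest


def kmeans(vectors, k, iterations=10):
    """Assign each vector to one of k clusters (Lloyd's algorithm).

    Assumption: the first k vectors serve as the initial centroids.
    """
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def subsample(assignment, target_size):
    """Pick scene indices round-robin across clusters up to target_size."""
    clusters = {}
    for index, cluster_id in enumerate(assignment):
        clusters.setdefault(cluster_id, []).append(index)
    picked = []
    for one_round in zip_longest(*clusters.values()):
        for index in one_round:
            if index is not None and len(picked) < target_size:
                picked.append(index)
    return picked


# Assumed toy feature vectors: six scenes forming two obvious groups.
features = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
labels = kmeans(features, k=2)
dataset_indices = subsample(labels, target_size=3)
```

Round-robin selection is one way to guarantee that every non-empty cluster contributes at least one scene whenever the target size is at least the number of clusters; other sampling policies would satisfy the same claim language.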

12.	In regards to Claim 7, Yang in view of Pronovost teaches wherein determining the plurality of feature vectors comprises: inputting respective scene data to a trained, machine-learned (ML) model configured to output a prediction related to the respective scene data (paragraphs 28-29, 36, Pronovost teaches a trained object feature ML model receiving image data from the sensor system that obtains scene context map data, the model being trained to output map data features, where the input data includes object types, features, and locations.); receiving, from the ML model, an embedding representing the respective scene data (paragraph 71, Pronovost teaches the ML model being a GNN that encodes features into a node and an edge of a GNN used to determine the first predicted position of an object sampled from the data, identifying a relationship with image data and a real-world scenario.), wherein a feature vector corresponding to the respective scene data comprises the embedding (paragraphs 61, 71, Pronovost teaches object data that is used for the ML model configuration that comprises the embedding to include any object-related data such as object types, locations, poses, velocity vectors, and environment-related data perceived by the vehicle.).

13.	In regards to Claim 8, Yang in view of Pronovost teaches wherein: the ML model is a transformer-based ML model and the embedding is generated by an encoder component, or the ML model is a graph neural network (GNN) and the embedding corresponds to a node embedding of the GNN (paragraphs 13, 42, 59, 61, 71, Pronovost teaches the ML model being a GNN that encodes features into a node and an edge of a GNN to determine the first predicted position of an object sampled from the data output by the GNN outside a specified area and to determine a second predicted position of the object based on map data.).

14.	In regards to Claim 9, Yang in view of Pronovost teaches wherein determining a feature vector corresponding to a respective scene data comprises: determining, based on the respective scene data, a trajectory of the autonomous vehicle (paragraphs 61-63, Pronovost teaches a velocity vector as a part of object data that is captured by the vehicle to represent data associated with an environment of the vehicle for the model prediction in autonomous vehicles.); determining, based on the trajectory, a plurality of spatial bins within an area of the environment covered by the respective scene data (paragraph 83, Pronovost teaches spatial information such as image data projected onto a mesh, which is known as spatial bins, obtained using one or more map that can be used by the vehicle to navigate within the environment.), wherein the feature vector comprises an aggregation of scene labels of the respective scene data within each spatial bin of the plurality of spatial bins (paragraphs 62-63 and 84-85, Pronovost teaches spatial information such as image data projected onto a mesh used in connection with other scene data that had been marked as the localization component, perception component, prediction component, and/or planning component to determine a location of the vehicle, identify objects in an environment, and/or generate routes and/or trajectories to navigate within an environment.).
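
For illustration, the claim 9 limitation (a feature vector formed by aggregating scene labels within spatial bins) can be sketched as a per-cell label histogram over a grid. This is only one possible reading. The claim recites bins determined from the vehicle trajectory, whereas the sketch assumes a fixed rectangular grid whose `origin` would be derived from that trajectory (for example, its first point). The `bin_features` function, its parameters, and the label classes are hypothetical and do not come from the application or the cited references.

```python
from collections import Counter


def bin_features(labels, origin, cell_size, grid_shape, classes):
    """Build a flat feature vector of label counts per spatial bin.

    labels: iterable of (x, y, class_name) scene labels.
    origin: (x, y) of the grid's lower-left corner (assumed to be
            derived from the vehicle trajectory).
    grid_shape: (rows, cols); labels outside the grid are ignored.
    """
    rows, cols = grid_shape
    counts = [Counter() for _ in range(rows * cols)]
    for x, y, name in labels:
        col = int((x - origin[0]) // cell_size)
        row = int((y - origin[1]) // cell_size)
        if 0 <= row < rows and 0 <= col < cols:
            counts[row * cols + col][name] += 1
    # One count per (bin, class) pair, in a fixed order.
    return [counts[i][c] for i in range(rows * cols) for c in classes]


# Assumed toy scene: two cars in the first cell, a pedestrian in the second,
# and one car outside the covered area.
scene_labels = [
    (0.5, 0.5, "car"),
    (1.5, 0.5, "pedestrian"),
    (0.2, 0.1, "car"),
    (5.0, 5.0, "car"),
]
vector = bin_features(
    scene_labels,
    origin=(0.0, 0.0),
    cell_size=1.0,
    grid_shape=(1, 2),
    classes=["car", "pedestrian"],
)
```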

15.	In regards to Claim 10, Yang in view of Pronovost teaches wherein determining a feature vector corresponding to a respective scene data comprises: determining, based on the respective scene data, a first feature vector including scene labels associated with the respective scene data (paragraphs 14, 27, 47, 62-63, Pronovost teaches feature vectors used for planning to detect, determine, and classify various routes, trajectories, and driving maneuvers at various levels of detail, which would include labeling image data pixels.); and determining, based on the respective scene data, a second feature vector, different from the first feature vector, the second feature vector being represented in a high-dimensional vector space (Fig. 5, paragraphs 62 and 83, Pronovost teaches map data, map features, including feature vectors being represented and captured in the form of two dimensions, three dimensions, or N-dimensions that are capable of providing information about an environment, while having any number of data structures.), wherein the feature vector comprises a combination of the first feature vector and the second feature vector (paragraphs 62-63, 94, Pronovost teaches combining various components or having various components performed in any other component, where various component is referring to object feature models, object/map feature heuristics, and prediction models).

16.	In regards to Claim 11, Yang in view of Pronovost teaches wherein determining a feature vector corresponding to a respective scene data comprises: determining, based on a reference pose of the autonomous vehicle at a first time, a reference frame for the respective scene data (col 3 lines 34-42, col 19 lines 1-2, Yang teaches the generation of an initial object trajectory, which is an example of scene data, for an object, using sensor data, where the trajectory includes a plurality of initial object observations.); determining, based on the reference frame, a first pose corresponding to an object represented in the scene data at the first time (col 3 lines 34-42, col 19 lines 1-2, Yang teaches the generation of an initial object trajectory, which is an example of scene data, for an object, using sensor data, where the trajectory includes a plurality of initial object observations associated with object size, initial object pose, and a timestamp.); determining, based on the reference frame, a second pose corresponding to the object at a second time, before the first time (Fig. 6, col 2 lines 12-21, Yang teaches the generation of a refined object trajectory and updated object pose. The Examiner interprets the updated object pose as a second pose of the object.); determining, based on the reference frame, a third pose corresponding to the object at a third time, after the first time (Fig. 2A-2D, col 9 lines 9-22, col 12 lines 48-65, col 23 lines 34-45, Pronovost teaches the use of machine learning models with map data, an example of scene data, for four features that would have been added to the map data in the form of perception components such as size, position, pose, orientation, velocity, and acceleration, in the determination of confidence levels, which are a threshold indicator of difficulty levels, probabilities, and likelihood that a dynamic object may have each of the alternative predicted states at a future time.), wherein the feature vector comprises an indication of at least the reference pose, the first pose, the second pose, and the third pose (Fig. 2A-2D, col 9 lines 9-22, col 12 lines 48-65, col 23 lines 34-45, Pronovost teaches the use of machine learning models with map data, an example of scene data, for four features that would have been added to the map data in the form of perception components such as size, position, pose, orientation, velocity, and acceleration, in the determination of confidence levels, which are a threshold indicator of difficulty levels, probabilities, and likelihood that a dynamic object may have each of the alternative predicted states at a future time.).

17.	In regards to Claim 13, Yang in view of Pronovost teaches wherein the sub-sampling selects the scene data from the cluster based on a difficulty level of the scene data (paragraphs 53 and 107, Pronovost teaches the examples of subsampling, clustering algorithms and hierarchical clustering, as well as combinations of an object type and a single object feature being associated with a single map feature, where any type of complex heuristics such as visual heuristics may be used, this includes complex or reduced data complexity of the scene map data.).

18.	In regards to Claim 14, Yang in view of Pronovost teaches wherein determining the difficulty level of the scene data comprises: receiving, as output from a machine-learned prediction model, a predicted pose of an object represented in the scene data (paragraphs 13-14, 21, 43-45, Pronovost teaches receiving output data from ML prediction models indicating predicted future locations, trajectories, poses and/or other predicted states for the other dynamic objects in the environment.); determining, based on the scene data, an error between the predicted pose and an actual pose of the object (paragraphs 90-91, Pronovost teaches training data inputted into machine learning models where a known result can be used to adjust weights and/or parameters of the machine learning model to minimize errors, which would require an error value between the predicted and actual parameter such as pose.), wherein the difficulty level is based at least in part on the error (paragraphs 53, 90-91, Pronovost teaches combinations of object types and object features associated with map features, where any type of more complex heuristics may be used, and this includes any number of corresponding map data modifications. The heuristics data may include heuristics that associate any other environment-related data with specific map features and/or map data modifications. This data is used to obtain an error value to minimize errors.).
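
As a reading aid for claim 14, together with the threshold-based selection recited in claim 1, the sketch below scores a scene by the error between a predicted and an actual object pose and keeps scenes whose score exceeds a threshold. It illustrates the claim language under stated assumptions and is not drawn from the application, Yang, or Pronovost. The error definition (position distance plus a weighted absolute heading difference), the `heading_weight` parameter, the dictionary layout, and the function names are all hypothetical.

```python
import math


def pose_error(predicted, actual, heading_weight=1.0):
    """Error between two (x, y, yaw) poses.

    Assumption: Euclidean position error plus a weighted absolute
    heading difference, with the difference wrapped into [0, pi].
    """
    px, py, pyaw = predicted
    ax, ay, ayaw = actual
    position_error = math.hypot(px - ax, py - ay)
    d_yaw = pyaw - ayaw
    heading_error = abs(math.atan2(math.sin(d_yaw), math.cos(d_yaw)))
    return position_error + heading_weight * heading_error


def select_hard_scenes(scenes, threshold):
    """Return ids of scenes whose difficulty (pose error) exceeds threshold."""
    return [
        scene["id"]
        for scene in scenes
        if pose_error(scene["predicted"], scene["actual"]) > threshold
    ]


# Assumed toy data: scene "a" is badly mispredicted, scene "b" is exact.
scenes = [
    {"id": "a", "predicted": (0.0, 0.0, 0.0), "actual": (3.0, 4.0, 0.0)},
    {"id": "b", "predicted": (1.0, 1.0, 0.0), "actual": (1.0, 1.0, 0.0)},
]
hard = select_hard_scenes(scenes, threshold=1.0)
```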

19.	In regards to Claim 15, Yang in view of Pronovost teaches wherein the difficulty level is based on a complexity of scene, the complexity indicative of one or more of: a number of objects in the scene data, a density of objects in the scene data, a distance between the autonomous vehicle and an object in the scene, a map feature in the scene data, or a speed of the autonomous vehicle or an object in the scene (paragraph 53, Pronovost teaches combinations of object types and object features associated with map features, where any type of more complex heuristics may be used, this includes any number of corresponding map data modifications. The heuristics data may be including heuristics that associate any other environment-related data with specific map features and/or map data modifications. The Examiner interprets complexity with an increased number of heuristics used as difficulty level being based on a complexity of scene.).

20.	Regarding Claim 16, Yang teaches one or more non-transitory computer-readable media storing processor-executable instructions that, when executed by one or more processors, perform operations (col 11 line 64-col 12 line 3, col 31 lines 1-6, Yang teaches a non-transitory computer readable storage media to store instructions that when executed by the one or more processors cause the vehicle to perform operations and functions.), determining a plurality of scene features representing the scene data (Fig. 8 and col 4 lines 34-36, Yang teaches extraction of spatial-temporal features from four-dimensional point clouds that are generated from the sensor data of object trajectories.), and receiving an instruction to generate a dataset, the instruction indicating criteria of a target dataset (col 5 lines 17-30, Yang teaches instructions received to perform an operation of obtaining sensor data from sensors and generating initial object trajectories for an object using the sensor data, where the data is set as sequential over time).
	Yang does not teach receiving scene data comprising a plurality of top-down representations of an environment, determining a plurality of scene features representing the scene data, clustering, based on a similarity between the scene features, the scene data into one or more clusters, each cluster representing a subset of the plurality of top-down representations, sampling, based on the criteria of the target dataset, the one or more clusters, and generating the dataset satisfying the criteria, the dataset comprising a subset of the scene data.
	Pronovost is in the same field of art of training autonomous vehicles with environment image data. Further, Pronovost teaches receiving scene data comprising a plurality of top-down representations of an environment (Fig. 7 and col 4 lines 19-24, Pronovost teaches generating an object observation of an object based on an initial object trajectory, where the data extracted is high resolution features of the object in a bird’s eye view, BEV, space, which is a top-down representation.), determining a plurality of scene features representing the scene data, clustering, based on a similarity between the scene features, the scene data into one or more clusters, each cluster representing a subset of the plurality of top-down representations (Fig. 1, col 11 lines 5-19, Pronovost teaches box 126, which depicts a top-down view of the same environment shown in driving scene 108 to examine characteristics of map data such as similarities to determine necessary modifications such as adding or removing map features, based on heuristic data groupings.), sampling, based on the criteria of the target dataset, the one or more clusters (col 5 line 56 – col 6 line 9, and col 8 lines 27-34, Pronovost teaches example sets of data such as association tables, sets of mapping rules to determine modifications to the map data such as adding/removing map features from the map data as well as simulated data.), and generating the dataset satisfying the criteria, the dataset comprising a subset of the scene data (col 8 lines 27-34, Pronovost teaches that the generation of datasets is based on the object features detected by the vehicle, which includes the first through fourth object features, where the object features are examples of scenes.).


Therefore it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Yang by incorporating data clustering used to train ML models using perception components that is taught by Pronovost to make an invention that can automatically group similar environment images and identify diverse example scenarios based on their features; thus one of ordinary skill in the art would be motivated to combine the references since the need is to ensure safety for passengers as well as surrounding persons and objects, while traversing through congested areas with other moving vehicles (autonomous or otherwise), moving people, and stationary buildings (col 1 lines 6-26, Pronovost).
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
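
For illustration only, the Claim 16 operations discussed above (clustering scene data by similarity between scene features, then sampling the clusters to generate a dataset that satisfies a criterion) can be sketched as below. This is a minimal sketch under stated assumptions, not the method of the application or of either reference: it substitutes a simple greedy "leader" clustering over cosine similarity, and the names `cluster_scenes` and `sample_dataset`, the `threshold` value, and the `per_cluster` criterion are all hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_scenes(features, threshold=0.9):
    """Greedy leader clustering (an assumed stand-in for any clustering method).

    Each scene joins the first cluster whose leader (first member) is at least
    `threshold` similar; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of scene indices.
    """
    clusters = []
    for idx, feat in enumerate(features):
        for members in clusters:
            if cosine_similarity(features[members[0]], feat) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def sample_dataset(clusters, per_cluster, seed=0):
    """Draw up to `per_cluster` scene indices from every cluster.

    `per_cluster` plays the role of the hypothetical target-dataset criterion.
    """
    rng = random.Random(seed)
    dataset = []
    for members in clusters:
        k = min(per_cluster, len(members))
        dataset.extend(rng.sample(members, k))
    return sorted(dataset)


# Example: five scene feature vectors that fall into two similarity groups.
features = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [0.02, 1.0], [1.0, 0.01]]
clusters = cluster_scenes(features)
dataset = sample_dataset(clusters, per_cluster=1)
```

Sampling per cluster rather than from the pooled scenes is what keeps the resulting dataset from being dominated by the most common kind of scene.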

21.	In regards to Claim 17, Yang in view of Pronovost teaches wherein each top-down representation is associated with a respective time instant (Fig. 5, paragraphs 38-40, 55-56, Pronovost teaches the top-down representations being used for ML modeling training at different timesteps while the vehicle is in motion, for instance, when approaching pedestrians and a stop sign.), and the scene features include a representation of objects in the environment over a period of time (Fig. 5, paragraphs 38-40, 55-56, Pronovost teaches the top-down representations being used for ML modeling training at different timesteps while the vehicle is in motion, for instance, when approaching pedestrians and a stop sign.).

22.	In regards to Claim 18, Yang in view of Pronovost teaches wherein the scene features comprise an embedding of respective top-down representations generated by a trained machine-learned model configured to output predicted states based on an input top-down representation (Fig. 5, paragraphs 5, 45, 60, 71, Pronovost teaches encoding features in a driving environment such as a new stop sign map feature based on a pedestrian, into a multi-channel representation and providing the multi-channel representation to an ML prediction model to predict the future states of dynamic objects in the environment, which is an example of encoding raw data into structured machine-readable format representations for embedding).

23.	In regards to Claim 19, Yang in view of Pronovost teaches wherein the sampling: comprises selecting a subset of top-down representations from each cluster of the one or more clusters (paragraph 120, Pronovost teaches receiving sensor data associated with an environment to determine based at least in part on a first subset data, a first object at a first physical location in the environment and a first object type associated with the first object, where this is also done with a subsequent second subset.), and is based on a difficulty level of respective top-down representations (paragraphs 120-121, Pronovost teaches the method of determination based on the first and second subsets being based on existing map data, which includes the initial map data of top-down representations.).

24.	In regards to Claim 20, Yang in view of Pronovost teaches wherein the difficulty level of a top-down representation is based on a prediction error generated by a trained ML model when provided, as input, the top-down representation (Fig.7 and col 4 lines 19-24, paragraphs 90-91, Pronovost teaches generating an object observation of an object based on an initial object trajectory, where the data extracted is high resolution features of the object in a bird’s eye view, BEV, space, which is top-down representation. The BEV space image is used for the ML model training as a data input into machine learning models where a known result can be used to adjust weight and/or parameters of the machine learning model to minimize errors, which would require an error value between the prediction and actual parameter such as pose.).

Conclusion
25.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to LOUIS NWUHA whose telephone number is (571) 272-0219. The examiner can normally be reached Monday to Friday 8 am to 5 pm.

26.	Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

27.	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Oneal Mistry, can be reached at (313) 446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

28.	Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LOUIS NWUHA/Examiner, Art Unit 2674

/ONEAL R MISTRY/Supervisory Patent Examiner, Art Unit 2674