Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 7/16/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 18 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 18 recites the limitation "The system of claim 11, …" in line 1; however, claim 11 is directed towards a method. There is insufficient antecedent basis for this limitation in the claim. It appears that claim 18 is incorrectly dependent on claim 11 and should instead depend from claim 13; it will be interpreted as such by the examiner for examination purposes.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Donderici (U.S. Patent Application Pub. No. 2024/0086709) in view of Shelhamer et al., hereinafter Shelhamer (U.S. Patent Application Pub. No. 2022/0309285).
Regarding Claim 1, Donderici teaches: A computer-implemented method for planning motion of a Self-Driving Car (SDC) (Donderici, Para. 0015, 0030 – “processes”, or “methods”, “for learning compute paths that an AV can implement/execute to perform an operation(s)”; where an AV is an “autonomous vehicle”), the SDC having a plurality of sensors configured to generate sensed data representative of surroundings of the SDC (Donderici, Para. 0015, 0047-0048 – “autonomous vehicles (AVs) can include various sensors” to “enable the AV 102 to “see”…, “hear”…, and “feel”… its environment” by collecting metrics, or data; for example cameras, LIDAR sensors, microphones, etc.), the SDC being communicatively coupled to a processor (Donderici, Para. 0047 – “one or more processors”) configured to execute a plurality of Machine-Learning (ML) models for detecting objects in the surroundings of the SDC (Donderici, Para. 0031, 0047, 0057-0061, 0066 – wherein the “one or more processors” of the autonomous vehicle (AV) are in communication with an “AI/ML platform” of a “data center”, where the “AI/ML platform” provides “machine learning models” for “object detection”; furthermore, the AV may implement the models, or “neural network[s]” by “a computer of AV” to “provide a classification and/or localization of one or more objects in an input image”), the plurality of ML models being sequentially connected therebetween, such that an output of a first ML model of the plurality of ML models is used as an input to a second ML model, sequentially following after the first ML model in the plurality of ML models (Donderici, Figs. 5-6, Para. 0030-0037, 0069-0073, 0078, 0118-0123 – a machine learning model may be “a neural network(s)” comprising several neural network models “within a larger neural network model”; for example, Fig. 5 illustrates neural network models 502, 510, and 520, wherein models 510 and 520 are placed sequentially after model 502, representing two compute paths, and receive the “output from [the] neural network model 502” as input; and further Fig. 6 illustrates a “neural network” comprising two paths 620 and 622, wherein alternate path 622 represents an “early exit” following a first portion, or sub-model, of the neural network model),
each one of the plurality of ML models having been trained to detect the objects in the surroundings of the SDC based on the sensed data representative thereof (Donderici, Para. 0059-0060, 0064-0068 – wherein the provided “machine learning models”, comprising “neural network(s)” are trained, evaluated, and refined for “object detection”; for example, the neural networks receive input “sensor data” capturing “a view, scene, environment, shape, and/or object” and output “classification and/or localization of one or more objects in an input image”);
a given sequential ML model being configured to detect the objects with a respective value of an object detection precision metric (Donderici, Para. 0080-0081 – “the neural network 210 can generate a mean score (or z-score) of each feature” and “a class of an object or a probability of classes that best describes the objects in the image”, indicating accuracy), the respective value of the object detection precision metric associated with the given sequential ML model being higher than any one of those that are associated with ML models of the plurality of ML models preceding the given sequential ML model (Donderici, Annotated Fig. 6 and Para. 0136-0139 – when exiting neural network 600 early through the “early exit 622”, such that scene data has only been processed by a first group of layers, or the first sub-model, there will be a “reduction in the accuracy and/or safety of the output” compared to compute path 620, which proceeds through a second group of layers, or second sub-model);
the method comprising: receiving the sensed data representative of the surroundings of the SDC (Donderici, Para. 0015, 0047-0048 – the autonomous vehicle (AV) receives “metrics collected by the sensor systems”, or “sensor data”);
feeding the sensed data to the plurality of the ML models (Donderici, Para. 0059-0060, 0066-0067 – wherein the “sensor data” is provided, or fed, to the “machine learning models”, i.e. “neural networks”, as “input data”), the feeding comprising:
feeding the sensed data to the first ML model (Donderici, Fig. 5 and Para. 0059-0060, 0066-0067 – wherein the “sensor data” is provided, or fed, to the “machine learning models”, i.e. “neural networks”, as “input data”, for example the “neural network model 502”), thereby causing the first ML model to generate a first prediction of (i) an object class of at least one object in the surroundings of the SDC and (ii) a location of the at least one object (Donderici, Fig. 5 and Para. 0068, 0118 – for example, in Fig. 5, the first ML model, or “neural network model 502”, generates an “output” which is “a classification and/or localization of one or more objects in an input image”); and
in response to a first time for generating the first prediction by the first ML model being higher than a first predetermined time period threshold assigned to the first ML model for generating the first prediction (Donderici, Annotated Fig. 6 and Para. 0026-0027, 0137-0138, 0142, 0148 – wherein the neural network, for example neural network 600 of Fig. 6, may take an “alternate path with the early exit 622” based on “a desired balance between at least two of a processing latency, a compute cost, a safety metric, and/or an output accuracy”, for example to “decrease the processing latency” in order to meet a latency metric, wherein the latency metric may be “a desired reaction time”, or threshold latency; wherein the section of neural network 600 before the “alternate path with the early exit 622” can be considered a first sub-model and the remaining portion a second sub-model as annotated):
planning the motion of the SDC based on the first prediction (Donderici, Para. 0051 and 0068 – wherein a “planning stack” of the AV determines “sets of one or more mechanical operations that the AV 102 can perform” based on outputs such as “a classification and/or localization of one or more objects in an input image” from the neural network); and
in response to the first time for generating the first prediction by the first ML model being equal to or lower than the first predetermined time period threshold (Donderici, Annotated Fig. 6 and Para. 0026-0027, 0137-0138 – wherein the neural network 600 selects between “the compute path 620 and the alternate path with the early exit 622” based on “a desired balance between at least two of a processing latency, a compute cost, a safety metric, and/or an output accuracy”):
feeding the first prediction along with the sensed data to the second ML model, thereby causing the second ML model to generate a second prediction of (i) the object class of the at least one object in the surroundings of the SDC and (ii) the location of the at least one object (Donderici, Fig. 5 and Para. 0068, 0118, 0126 – for example, in Fig. 5, the first ML model, or “neural network model 502”, generates an “output” which is then fed into neural network model 510 to “process data (e.g., an output(s)) from the neural network model 502” to generate an “output 512”; where the output of the neural networks is a “a classification and/or localization of one or more objects in an input image” from the neural network); and
planning the motion of the SDC based on the second prediction (Donderici, Para. 0051 and 0068 – wherein a “planning stack” of the AV determines “sets of one or more mechanical operations that the AV 102 can perform” based on outputs such as “a classification and/or localization of one or more objects in an input image” from the neural network).
[Image: Donderici, Fig. 5]
[Image: Donderici, Annotated Fig. 6]
While Donderici teaches a first time for generating the first prediction by the first ML model being higher than a first predetermined time period threshold… planning the motion of the SDC based on the first prediction, Donderici does not explicitly teach in response to the first time for generating the first prediction by the first ML model being equal to or lower than the first predetermined time period threshold, feeding the first prediction along with the sensed data to the second ML model.
However, Shelhamer teaches in response to the first time for generating the first prediction by the first ML model being equal to or lower than the first predetermined time period threshold, feeding the first prediction along with the sensed data to the second ML model (Shelhamer, Fig. 3 and Para. 0029, 0032, 0045, 0051-0052, 0073, 0088 – employing a “time threshold” for “predictions of objects portrayed in the digital images” based on “computation budget” wherein if given “enough time” without “interruption” requiring earlier prediction, such that the “time threshold” does not expire, the model proceeds and outputs a second or final prediction generated by a second set 316 and final set of layers 328, or second and final sub-models, as shown on Fig. 3).
[Image: Shelhamer, Fig. 3]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Donderici to include in response to the first time for generating the first prediction by the first ML model being equal to or lower than the first predetermined time period threshold, feeding the first prediction along with the sensed data to the second ML model, as taught by Shelhamer, in order to continue computations within the time period threshold and improve the accuracy of output object classification and localization.
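For illustration of the conditional flow recited in claim 1 only, the limitation may be sketched as follows; this code appears in neither Donderici nor Shelhamer, and all names (`detect_and_plan`, `model_1`, `model_2`, `plan_motion`, the threshold) are hypothetical:

```python
import time

def detect_and_plan(sensed_data, model_1, model_2, threshold_1_s, plan_motion):
    """Sketch of the claimed two-model cascade with a per-model time budget."""
    start = time.monotonic()
    first_prediction = model_1(sensed_data)          # object class + location
    elapsed = time.monotonic() - start

    if elapsed > threshold_1_s:
        # First time exceeds the first threshold: plan directly from
        # the first prediction.
        return plan_motion(first_prediction)

    # First time is equal to or lower than the threshold: feed the first
    # prediction along with the sensed data to the second ML model.
    second_prediction = model_2(sensed_data, first_prediction)
    return plan_motion(second_prediction)
```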
In regards to Claim 2, Donderici in view of Shelhamer teaches the method of Claim 1, and Donderici and Shelhamer further teach wherein the first time for generating the first prediction is indicative of current availability of computational resources of the processor of the SDC (Donderici, Para. 0027, 0035, 0040, 0133, 0137-0138 – wherein the machine learning model may select a compute path based on the “desired reaction time” to “achieve a more predictable/stable processing time and/or latency/delay” to achieve a “reduction in compute” and thus “consume less compute”, or resources; Shelhamer, Para. 0029, 0032, 0045, 0051-0052, 0073, 0088 – employing a “time threshold” for “predictions of objects portrayed in the digital images” based on “computation budget”).
In regards to Claim 3, Donderici in view of Shelhamer teaches the method of Claim 1, and Donderici further teaches further comprising training each one of the plurality of ML models (Donderici, Para. 0055 and 0059 – “AI/ML platform 154 can provide the infrastructure for training and evaluating machine learning algorithms for operating the AV 102”), by:
generating a training set of data comprising a plurality of training digital objects, a given one of which includes: (i) training sensed data representative of training surroundings of the SDC; and (ii) a respective label representative of the object class and location of at least one training object in the training surrounding of the SDC (Donderici, Para. 0071-0072 – “training data that includes images and/or labels”, for example “each training image having a label indicating the classes of the one or more objects or features in each image (e.g., indicating to the network what the objects are and what features they have)”); and
feeding the training data to the plurality of ML models (Donderici, Para. 0071-0072 – “the neural network can be trained using training data”), the feeding comprising:
to the first ML model of the plurality of ML models, feeding the training data (Donderici, Fig. 5 and Para. 0066-0072 – “the neural network 210 can be trained using training data”, where neural network 210 is an exemplary neural network; for example “neural network 502” can be trained using training data);
to the given sequential ML model following the first ML model, feeding, by the processor: (i) the training data and (ii) a respective training prediction generated by an ML model preceding the given sequential one (Donderici, Fig. 5 and Para. 0071-0072, 0119, 0134 – neural network 510, which is placed successively after neural network 502 as shown in Fig. 5, “can be trained using a cluster of scene features determined and/or processed by the neural network model 502”, i.e. “an output of the neural network model 502” in addition to “a training dataset” cited above); and
optimizing, for each one of the plurality of ML models, a respective difference between the respective training prediction thereof and the respective label, thereby training each one of the plurality of ML models to detect the objects in the surroundings of the SDC (Donderici, Para. 0071-0077 – wherein the “goal of training is to minimize the amount of loss so that the predicted output is the same as the training label” in order to accurately determine “objects or features in each image”).
[Image: Donderici, Fig. 5]
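For illustration of the training flow mapped against claim 3 only, the cascade training may be sketched as follows; this code appears in neither reference, and `train_cascade`, `fit`, and the model interface are hypothetical:

```python
def train_cascade(models, training_set, fit):
    """Train each model on (sensed_data, label); every model after the
    first also receives the preceding model's prediction as input."""
    for sensed_data, label in training_set:
        prev_prediction = None
        for model in models:
            if prev_prediction is None:
                # First ML model: fed the training data alone.
                prediction = model.predict(sensed_data)
            else:
                # Sequential ML model: fed the training data plus the
                # preceding model's training prediction.
                prediction = model.predict(sensed_data, prev_prediction)
            # fit() stands in for optimizing the difference (loss)
            # between the training prediction and the label.
            fit(model, prediction, label)
            prev_prediction = prediction
    return models
```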
In regards to Claim 4, Donderici in view of Shelhamer teaches the method of Claim 1, and Donderici in view of Shelhamer further teaches further comprising:
in response to the second time for generating the second prediction by the second ML model being equal to or lower than a second predetermined time period threshold assigned to the second ML model for generating the second prediction (Shelhamer, Annotated Fig. 3 and Para. 0029, 0032, 0045, 0051-0052, 0073, 0088 – employing a “time threshold” for “predictions of objects portrayed in the digital images” based on “computation budget” wherein if given “enough time” without “interruption” requiring earlier prediction, such that the “time threshold” does not expire, the model proceeds and outputs a second prediction generated by a second set 316, or second sub-model, as shown on annotated Fig. 3):
feeding the second prediction along with the sensed data to a third ML model of the plurality of ML models, following the second ML model, thereby causing the third ML model to generate a third prediction of (i) the object class of the at least one object in the surroundings of the SDC and (ii) the location of the at least one object (Shelhamer, Annotated Fig. 3 and Para. 0029, 0032, 0045, 0051-0052, 0073, 0088, 0119 – if given “enough time” without “interruption” requiring earlier prediction, such that the “time threshold” does not expire, the model proceeds and inputs the “second set of features 318” from the second set of layers 316, or second sub-model, into the final set of layers 328, or final sub-model, to generate a “final pixel-wise classification prediction 332”, where features are extracted “in a spatially aware sense” and correspond to “particular spatial locations”; where the final set of layers 328 is the third set of layers illustrated in Fig. 3); and
planning the motion of the SDC based on the third prediction (Donderici, Para. 0051 and 0068 – wherein a “planning stack” of the AV determines “sets of one or more mechanical operations that the AV 102 can perform” based on outputs such as “a classification and/or localization of one or more objects in an input image” from the neural network; Shelhamer, Fig. 3 and Para. 0056, 0073, 0088 – generating “the final pixel-wise classification prediction 332” from the “final set of features 330” outputted by the final set of layers, or third sub-model, for use in “time-critical scenarios” such as “autonomous driving”).
[Image: Shelhamer, Fig. 3]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method including the above limitations of Donderici in view of Shelhamer to further include further comprising: in response to the second time for generating the second prediction by the second ML model being equal to or lower than a second predetermined time period threshold assigned to the second ML model for generating the second prediction: feeding the second prediction along with the sensed data to a third ML model of the plurality of ML models, following the second ML model, thereby causing the third ML model to generate a third prediction of (i) the object class of the at least one object in the surroundings of the SDC and (ii) the location of the at least one object; and planning the motion of the SDC based on the third prediction, as taught by Shelhamer, in order to continue computations within the time period threshold and provide the most accurate output object classification and localization allowable by the computation resources.
In regards to Claim 5, Donderici in view of Shelhamer teaches the method of Claim 1, and Donderici in view of Shelhamer further teaches further comprising:
in response to a respective time for generating a respective prediction by the given sequential ML model being equal to or lower than a respective predetermined time period threshold assigned to the given ML model for generating the respective prediction: feeding the respective prediction along with the sensed data to a next sequential ML model of the plurality of ML models, following the given sequential ML model, thereby causing the next sequential ML model to generate another respective prediction of (i) the object class of the at least one object in the surroundings of the SDC and (ii) the location of the at least one object (Shelhamer, Fig. 3 and Para. 0029, 0032, 0045, 0051-0052, 0073, 0088, 0119 – employing a “time threshold” for “predictions of objects portrayed in the digital images” based on “computation budget” wherein if given “enough time” without “interruption” requiring earlier prediction, such that the “time threshold” does not expire, the model proceeds and outputs a second or final “classification prediction” generated by a second set 316/final set of layers 328, or second/final sub-models, as shown on Fig. 3, where features are extracted “in a spatially aware sense” and correspond to “particular spatial locations”; where the process of Shelhamer may proceed to an Nth set of layers, or sub-model, as long as “the computation budget allows for it”); and
planning the motion of the SDC based on the respective prediction (Donderici, Para. 0051 and 0068 – wherein a “planning stack” of the AV determines “sets of one or more mechanical operations that the AV 102 can perform” based on outputs such as “a classification and/or localization of one or more objects in an input image” from the neural network) generated by the next sequential ML model (Shelhamer, Fig. 3 and Para. 0056, 0073, 0088 – generating “the Nth pixel-wise classification prediction” from the “Nth set of features 330” outputted by the Nth set of layers, or Nth sub-model, for use in “time-critical scenarios” such as “autonomous driving”); and
in response to the respective time for generating the respective prediction by the given sequential ML model being higher than the respective predetermined time period threshold assigned to the given ML model for generating the respective prediction: planning the motion of the SDC (Donderici, Para. 0051 and 0068 – wherein a “planning stack” of the AV determines “sets of one or more mechanical operations that the AV 102 can perform” based on outputs such as “a classification and/or localization of one or more objects in an input image” from the neural network) based on the respective prediction generated by an ML model, immediately preceding the given sequential ML model (Shelhamer, Para. 0052-0056 – “based on timing of a request for a prediction”, e.g. “expiration of a time threshold”, the “the multi-exit pixel-level prediction neural network 108 generates and provides the first pixel-wise classification prediction 204, the second pixel-wise classification prediction 206, or some other pixel-wise classification prediction”, where if the request for a prediction occurs “before completion of the entire analysis”, an earlier “pixel-wise classification prediction” generated by an earlier exit head, corresponding to an earlier set of layers, or sub-model, is outputted, for use in “time-critical scenarios” such as “autonomous driving”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method including the above limitations of Donderici in view of Shelhamer to further include further comprising: in response to a respective time for generating a respective prediction by the given sequential ML model being equal to or lower than a respective predetermined time period threshold assigned to the given ML model for generating the respective prediction: feeding the respective prediction along with the sensed data to a next sequential ML model of the plurality of ML models, following the given sequential ML model, thereby causing the next sequential ML model to generate another respective prediction of (i) the object class of the at least one object in the surroundings of the SDC and (ii) the location of the at least one object; and planning the motion of the SDC based on the respective prediction generated by the next sequential ML model; and in response to the respective time for generating the respective prediction by the given sequential ML model being higher than the respective predetermined time period threshold assigned to the given ML model for generating the respective prediction: planning the motion of the SDC based on the respective prediction generated by an ML model, immediately preceding the given sequential ML model, as taught by Shelhamer, in order to continue computations within the time period threshold and provide the most accurate output object classification and localization allowable by the computation resources.
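For illustration of the generalized conditional cascade mapped against claim 5 only, the limitation may be sketched as follows; this code appears in neither reference, and `cascade_predict` and the `(model, budget)` interface are hypothetical:

```python
import time

def cascade_predict(sensed_data, models_with_budgets):
    """Run sequential models, each with its own time budget; on a budget
    overrun, fall back to the immediately preceding model's prediction
    (or the current one when there is no predecessor)."""
    prev_prediction = None
    for model, budget_s in models_with_budgets:
        start = time.monotonic()
        if prev_prediction is None:
            candidate = model(sensed_data)
        else:
            # Each sequential model receives the sensed data along with
            # the preceding model's prediction.
            candidate = model(sensed_data, prev_prediction)
        elapsed = time.monotonic() - start
        if elapsed > budget_s:
            # Respective time exceeds the respective threshold: plan from
            # the immediately preceding prediction when one exists.
            return prev_prediction if prev_prediction is not None else candidate
        prev_prediction = candidate
    return prev_prediction
```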
In regards to Claim 6, Donderici in view of Shelhamer teaches the method of Claim 5, and Donderici further teaches wherein a given value of the object detection precision metric is indicative of a number of features of a given object determined by a respective ML model (Donderici, Para. 0048, 0080-0089 – identifying a number of pixels associated with “specific features” of a “perceived object”, which can be visualized by “a bounding area”, by the neural network).
In regards to Claim 7, Donderici in view of Shelhamer teaches the method of Claim 6, and Donderici further teaches wherein the object detection precision metric is an average precision metric (Donderici, Para. 0099, 0105, 0137 – determining “an output quality metric, an accuracy or performance metric”, etc., for a given compute path, for example the early exit path).
In regards to Claim 8, Donderici in view of Shelhamer teaches the method of Claim 1, and Donderici further teaches wherein a given one of the plurality of ML models is a neural network (Donderici, Para. 0057, 0066, 0078 – where the “Artificial Intelligence/Machine Learning (AI/ML) platform” or “computer of AV” implement “neural network(s)”; wherein the “neural network(s)” can be “any suitable deep network”, such as “a convolutional neural network (CNN)”, “a Recurrent Neural Networks (RNNs)”, etc.).
In regards to Claim 9, Donderici in view of Shelhamer teaches the method of Claim 8, and Donderici further teaches wherein the given sequential model of the plurality of ML models is configured to determine a larger number of features of the at least one object in the surroundings of the SDC than any ML model preceding the given sequential ML model (Donderici, Annotated Fig. 6 and Para. 0136-0139 – when exiting neural network 600 early through the “early exit 622”, such that scene data has only been processed by a first group of layers, or the first sub-model, there will be a “reduction in the accuracy and/or safety of the output” compared to compute path 620, which proceeds through a second group of layers, or second sub-model).
[Image: Donderici, Annotated Fig. 6]
In regards to Claim 10, Donderici in view of Shelhamer teaches the method of Claim 9, and Donderici further teaches wherein the given ML model of the plurality of ML models has more layers than any ML model preceding the given ML model (Donderici, Annotated Fig. 6 and Para. 0136-0138 – wherein, when using the compute path 620, or second sub-model, to process the scene data, the resulting output is processed through more layers than via the alternate path with the early exit 622, which exits before the remaining layers/sub-model).
In regards to Claim 11, Donderici in view of Shelhamer teaches the method of Claim 1, and Donderici further teaches wherein the planning the motion of the SDC comprises generating a trajectory for the SDC (Donderici, Para. 0015, 0030, 0050-0052 – “processes”, or “methods”, “for learning compute paths that an AV can implement/execute to perform an operation(s)” such as “commands for the actuators that control the AV's steering, throttle, brake, and drive unit” to “implement the [determined] final path or actions”; where an AV is an “autonomous vehicle”).
In regards to Claim 12, Donderici in view of Shelhamer teaches the method of Claim 1, and Donderici further teaches wherein the planning the motion of the SDC comprises determining motion parameters of the SDC, including at least one of: a displacement, a velocity, and an acceleration of the SDC at a given future moment in time (Donderici, Para. 0015, 0030, 0050-0052 – “processes”, or “methods”, “for learning compute paths that an AV can implement/execute to perform an operation(s)” to “implement the [determined] final path or actions”; for example a “planning stack” of the AV can determine “a specified rate of acceleration”, “maintaining the same speed” “or decelerating”, directing the AV to a different location, or displacement, etc. based on “predicted path[s]” of objects “at future time intervals”).
Regarding Claim 13, Donderici teaches: A system for planning motion of a Self-Driving Car (SDC) (Donderici, Para. 0015, 0030, 0047 – “a local computing device” in communication with various “systems” for “learning compute paths that an AV can implement/execute to perform an operation(s)”; where an AV is an “autonomous vehicle”), the SDC having a plurality of sensors configured to generate sensed data representative of surroundings of the SDC (Donderici, Para. 0015, 0047-0048 – “autonomous vehicles (AVs) can include various sensors” to “enable the AV 102 to “see”…, “hear”…, and “feel”… its environment” by collecting metrics, or data; for example cameras, LIDAR sensors, microphones, etc.), the system comprising at least one processor communicatively coupled to the SDC (Donderici, Para. 0047 – “one or more processors”) and configured to execute a plurality of Machine-Learning (ML) models for detecting objects in the surroundings of the SDC (Donderici, Para. 0031, 0047, 0057-0061, 0066 – wherein the “one or more processors” of the autonomous vehicle (AV) are in communication with an “AI/ML platform” of a “data center”, where the “AI/ML platform” provides “machine learning models” for “object detection”; furthermore, the AV may implement the models, or “neural network[s]” by “a computer of AV” to “provide a classification and/or localization of one or more objects in an input image”), the plurality of ML models being sequentially connected therebetween, such that an output of a first ML model of the plurality of ML models is used as an input to a second ML model, sequentially following after the first ML model in the plurality of ML models (Donderici, Figs. 5-6, Para. 0030-0037, 0069-0073, 0078, 0118-0123 – a machine learning model may be “a neural network(s)” comprising several neural network models “within a larger neural network model”; for example, Fig. 5 illustrates neural network models 502, 510, and 520, wherein models 510 and 520 are placed sequentially after model 502, representing two compute paths, and receive the “output from [the] neural network model 502” as input; and further Fig. 6 illustrates a “neural network” comprising two paths 620 and 622, wherein alternate path 622 represents an “early exit” following a first portion, or sub-model, of the neural network model),
each one of the plurality of ML models having been trained to detect the objects in the surroundings of the SDC based on the sensed data representative thereof (Donderici, Para. 0059-0060, 0064-0068 – wherein the provided “machine learning models”, comprising “neural network(s)” are trained, evaluated, and refined for “object detection”; for example, the neural networks receive input “sensor data” capturing “a view, scene, environment, shape, and/or object” and output “classification and/or localization of one or more objects in an input image”);
a given sequential ML model being configured to detect the objects with a respective value of an object detection precision metric (Donderici, Para. 0080-0081 – “the neural network 210 can generate a mean score (or z-score) of each feature” and “a class of an object or a probability of classes that best describes the objects in the image”, indicating accuracy), the respective value of the object detection precision metric associated with the given sequential ML model being higher than any one of those that are associated with ML models of the plurality of ML models preceding the given sequential ML model (Donderici, Annotated Fig. 6 and Para. 0136-0139 – when exiting neural network 600 early through the “early exit 622”, such that scene data has only been processed by a first group of layers, or the first sub-model, there will be a “reduction in the accuracy and/or safety of the output” compared to compute path 620, which proceeds through a second group of layers, or second sub-model);
the system further comprising at least one non-transitory computer-readable memory comprising executable instructions (Donderici, Para. 0047 – “one or more processors and memory, including instructions that can be executed by the one or more processors”) that, when executed by the at least one processor, cause the system to:
receive the sensed data representative of the surroundings of the SDC (Donderici, Para. 0015, 0047-0048 – the autonomous vehicle (AV) receives “metrics collected by the sensor systems”, or “sensor data”);
feed the sensed data to the plurality of the ML models (Donderici, Para. 0059-0060, 0066-0067 – wherein the “sensor data” is provided, or fed, to the “machine learning models”, i.e. “neural networks”, as “input data”), by:
feeding the sensed data to the first ML model (Donderici, Fig. 5 and Para. 0059-0060, 0066-0067 – wherein the “sensor data” is provided, or fed, to the “machine learning models”, i.e. “neural networks”, as “input data”, for example the “neural network model 502”), thereby causing the first ML model to generate a first prediction of (i) an object class of at least one object in the surroundings of the SDC and (ii) a location of the at least one object (Donderici, Fig. 5 and Para. 0068, 0118 – for example, in Fig. 5, the first ML model, or “neural network model 502”, generates an “output” which is “a classification and/or localization of one or more objects in an input image”); and
in response to a first time for generating the first prediction by the first ML model being higher than a first predetermined time period threshold assigned to the first ML model for generating the first prediction (Donderici, Annotated Fig. 6 and Para. 0026-0027, 0137-0138, 0142, 0148 – wherein the neural network, for example neural network 600 of Fig. 6, may take an “alternate path with the early exit 622” based on “a desired balance between at least two of a processing latency, a compute cost, a safety metric, and/or an output accuracy”, for example to “decrease the processing latency” in order to meet a latency metric, wherein the latency metric may be “a desired reaction time”, or threshold latency; wherein the section of neural network 600 before the “alternate path with the early exit 622” can be considered a first sub-model and the remaining portion a second sub-model as annotated):
planning the motion of the SDC based on the first prediction (Donderici, Para. 0051 and 0068 – wherein a “planning stack” of the AV determines “sets of one or more mechanical operations that the AV 102 can perform” based on outputs such as “a classification and/or localization of one or more objects in an input image” from the neural network); and
in response to the first time for generating the first prediction by the first ML model being equal to or lower than the first predetermined time period threshold (Donderici, Annotated Fig. 6 – wherein the neural network 600 selects between “the compute path 620 and the alternate path with the early exit 622” based on “a desired balance between at least two of a processing latency, a compute cost, a safety metric, and/or an output accuracy”):
feeding the first prediction along with the sensed data to the second ML model, thereby causing the second ML model to generate a second prediction of (i) the object class of the at least one object in the surroundings of the SDC and (ii) the location of the at least one object (Donderici, Fig. 5 and Para. 0068, 0118, 0126 – for example, in Fig. 5, the first ML model, or “neural network model 502”, generates an “output” which is then fed into neural network model 510 to “process data (e.g., an output(s)) from the neural network model 502” to generate an “output 512”; where the output of the neural networks is a “a classification and/or localization of one or more objects in an input image” from the neural network); and
planning the motion of the SDC based on the second prediction (Donderici, Para. 0051 and 0068 – wherein a “planning stack” of the AV determines “sets of one or more mechanical operations that the AV 102 can perform” based on outputs such as “a classification and/or localization of one or more objects in an input image” from the neural network).
[Image: Donderici, Fig. 5]
[Image: Donderici, Annotated Fig. 6]
While Donderici teaches a first time for generating the first prediction by the first ML model being higher than a first predetermined time period threshold… planning the motion of the SDC based on the first prediction, Donderici does not explicitly teach in response to the first time for generating the first prediction by the first ML model being equal to or lower than the first predetermined time period threshold, feeding the first prediction along with the sensed data to the second ML model.
However, Shelhamer teaches in response to the first time for generating the first prediction by the first ML model being equal to or lower than the first predetermined time period threshold, feeding the first prediction along with the sensed data to the second ML model (Shelhamer, Fig. 3 and Para. 0029, 0032, 0045, 0051-0052, 0073, 0088 – employing a “time threshold” for “predictions of objects portrayed in the digital images” based on “computation budget” wherein if given “enough time” without “interruption” requiring earlier prediction, such that the “time threshold” does not expire, the model proceeds and outputs a second or final prediction generated by a second set 316 and final set of layers 328, or second and final sub-models, as shown on Fig. 3).
[Image: Shelhamer, Fig. 3]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Donderici to include in response to the first time for generating the first prediction by the first ML model being equal to or lower than the first predetermined time period threshold, feeding the first prediction along with the sensed data to the second ML model, as taught by Shelhamer, in order to continue computations within the time period threshold and improve the accuracy of output object classification and localization.
In regards to Claim 14, Donderici in view of Shelhamer teaches the system of Claim 13, and Donderici and Shelhamer further teach wherein the first time for generating the first prediction is indicative of current availability of computational resources of the processor of the SDC (Donderici, Para. 0027, 0035, 0040, 0133, 0137-0138 – wherein the machine learning model may select a compute path based on the “desired reaction time” to “achieve a more predictable/stable processing time and/or latency/delay” and to achieve a “reduction in compute” and thus “consume less compute”, or resources; Shelhamer, Para. 0029, 0032, 0045, 0051-0052, 0073, 0088 – employing a “time threshold” for “predictions of objects portrayed in the digital images” based on “computation budget”).
In regards to Claim 15, Donderici in view of Shelhamer teaches the system of Claim 13, and Donderici further teaches wherein the at least one processor (Donderici, Para. 0047 – “one or more processors”) further causes the system to train each one of the plurality of ML models (Donderici, Para. 0055 and 0059 – “AI/ML platform 154 can provide the infrastructure for training and evaluating machine learning algorithms for operating the AV 102”), by:
generating a training set of data comprising a plurality of training digital objects, a given one of which includes: (i) training sensed data representative of training surroundings of the SDC; and (ii) a respective label representative of the object class and location of at least one training object in the training surrounding of the SDC (Donderici, Para. 0071-0072 – “training data that includes images and/or labels”, for example “each training image having a label indicating the classes of the one or more objects or features in each image (e.g., indicating to the network what the objects are and what features they have)”); and
feeding the training data to the plurality of ML models (Donderici, Para. 0071-0072 – “the neural network can be trained using training data”), the feeding comprising:
to the first ML model of the plurality of ML models, feeding the training data (Donderici, Fig. 5 and Para. 0066-0072 – “the neural network 210 can be trained using training data”, where neural network 210 is an exemplary neural network; for example “neural network 502” can be trained using training data);
to the given sequential ML model following the first ML model, feeding, by the processor: (i) the training data and (ii) a respective training prediction generated by an ML model preceding the given sequential one (Donderici, Fig. 5 and Para. 0071-0072, 0119, 0134 – neural network 510, which is placed successively after neural network 502 as shown in Fig. 5, “can be trained using a cluster of scene features determined and/or processed by the neural network model 502”, i.e. “an output of the neural network model 502” in addition to “a training dataset” cited above); and
optimizing, for each one of the plurality of ML models, a respective difference between the respective training prediction thereof and the respective label, thereby training each one of the plurality of ML models to detect the objects in the surroundings of the SDC (Donderici, Para. 0071-0077 – wherein the “goal of training is to minimize the amount of loss so that the predicted output is the same as the training label” in order to accurately determine “objects or features in each image”).
[Image: Donderici, Fig. 5]
In regards to Claim 4, Donderici in view of Shelhamer teaches the system of Claim 1, and Donderici in view of Shelhamer further teaches wherein the at least one processor (Donderici, Para. 0047 – “one or more processors”) further causes the system to:
in response to the second time for generating the second prediction by the second ML model being equal to or lower than a second predetermined time period threshold assigned to the second ML model for generating the second prediction (Shelhamer, Annotated Fig. 3 and Para. 0029, 0032, 0045, 0051-0052, 0073, 0088 – employing a “time threshold” for “predictions of objects portrayed in the digital images” based on “computation budget” wherein if given “enough time” without “interruption” requiring earlier prediction, such that the “time threshold” does not expire, the model proceeds and outputs a second prediction generated by a second set 316, or second sub-models, as shown on annotated Fig. 3):
feed the second prediction along with the sensed data to a third ML model of the plurality of ML models, following the second ML model, thereby causing the third ML model to generate a third prediction of (i) the object class of the at least one object in the surroundings of the SDC and (ii) the location of the at least one object (Shelhamer, Annotated Fig. 3 and Para. 0029, 0032, 0045, 0051-0052, 0073, 0088, 0119 – if given “enough time” without “interruption” requiring earlier prediction, such that the “time threshold” does not expire, the model proceeds and inputs the “second set of features 318” from the second set of layers 316, or second sub-model, into the final set of layers 328, or final sub-model, to generate a “final pixel-wise classification prediction 332”, where features are extracted “in a spatially aware sense” and correspond to “particular spatial locations”; where the final set of layers 328 is the third set of layers illustrated in Fig. 3); and
plan the motion of the SDC based on the third prediction (Donderici, Para. 0051 and 0068 – wherein a “planning stack” of the AV determines “sets of one or more mechanical operations that the AV 102 can perform” based on outputs such as “a classification and/or localization of one or more objects in an input image” from the neural network; Shelhamer, Fig. 3 and Para. 0056, 0073, 0088 – generating “the final pixel-wise classification prediction 332” from the “final set of features 330” outputted by the final set of layers, or third sub-model, for use in “time-critical scenarios” such as “autonomous driving”).
[Image: Shelhamer, Fig. 3]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system including the above limitations of Donderici in view of Shelhamer to further include wherein the at least one processor further causes the system to: in response to the second time for generating the second prediction by the second ML model being equal to or lower than a second predetermined time period threshold assigned to the second ML model for generating the second prediction: feed the second prediction along with the sensed data to a third ML model of the plurality of ML models, following the second ML model, thereby causing the third ML model to generate a third prediction of (i) the object class of the at least one object in the surroundings of the SDC and (ii) the location of the at least one object; and plan the motion of the SDC based on the third prediction, as taught by Shelhamer, in order to continue computations within the time period threshold and provide the most accurate output object classification and localization allowable by the computation resources.
In regards to Claim 17, Donderici in view of Shelhamer teaches the system of Claim 13, and Donderici in view of Shelhamer further teaches wherein the at least one processor (Donderici, Para. 0047 – “one or more processors”) further causes the system to:
in response to a respective time for generating a respective prediction by the given sequential ML model being equal to or lower than a respective predetermined time period threshold assigned to the given ML model for generating the respective prediction: feeding the respective prediction along with the sensed data to a next sequential ML model of the plurality of ML models, following the given sequential ML model, thereby causing the next sequential ML model to generate another respective prediction of (i) the object class of the at least one object in the surroundings of the SDC and (ii) the location of the at least one object (Shelhamer, Fig. 3 and Para. 0029, 0032, 0045, 0051-0052, 0073, 0088, 0119 – employing a “time threshold” for “predictions of objects portrayed in the digital images” based on “computation budget” wherein if given “enough time” without “interruption” requiring earlier prediction, such that the “time threshold” does not expire, the model proceeds and outputs a second or final “classification prediction” generated by a second set 316/final set of layers 328, or second/final sub-models, as shown on Fig. 3, where features are extracted “in a spatially aware sense” and correspond to “particular spatial locations”; where the process of Shelhamer may proceed to an Nth set of layers, or sub-model, as long as “the computation budget allows for it”); and
plan the motion of the SDC based on the respective prediction (Donderici, Para. 0051 and 0068 – wherein a “planning stack” of the AV determines “sets of one or more mechanical operations that the AV 102 can perform” based on outputs such as “a classification and/or localization of one or more objects in an input image” from the neural network) generated by the next sequential ML model (Shelhamer, Fig. 3 and Para. 0056, 0073, 0088 – generating “the Nth pixel-wise classification prediction” from the “Nth set of features 330” outputted by the Nth set of layers, or Nth sub-model, for use in “time-critical scenarios” such as “autonomous driving”); and
in response to the respective time for generating the respective prediction by the given sequential ML model being higher than the respective predetermined time period threshold assigned to the given ML model for generating the respective prediction: plan the motion of the SDC (Donderici, Para. 0051 and 0068 – wherein a “planning stack” of the AV determines “sets of one or more mechanical operations that the AV 102 can perform” based on outputs such as “a classification and/or localization of one or more objects in an input image” from the neural network) based on the respective prediction generated by an ML model, immediately preceding the given sequential ML model (Shelhamer, Para. 0052-0056 – “based on timing of a request for a prediction”, e.g. “expiration of a time threshold”, “the multi-exit pixel-level prediction neural network 108 generates and provides the first pixel-wise classification prediction 204, the second pixel-wise classification prediction 206, or some other pixel-wise classification prediction”, where if the request for a prediction occurs “before completion of the entire analysis”, an earlier “pixel-wise classification prediction” generated by an earlier exit head, corresponding to an earlier set of layers, or sub-model, is outputted, for use in “time-critical scenarios” such as “autonomous driving”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system including the above limitations of Donderici in view of Shelhamer to further include wherein the at least one processor further causes the system to: in response to a respective time for generating a respective prediction by the given sequential ML model being equal to or lower than a respective predetermined time period threshold assigned to the given ML model for generating the respective prediction: feed the respective prediction along with the sensed data to a next sequential ML model of the plurality of ML models, following the given sequential ML model, thereby causing the next sequential ML model to generate another respective prediction of (i) the object class of the at least one object in the surroundings of the SDC and (ii) the location of the at least one object; and plan the motion of the SDC based on the respective prediction generated by the next sequential ML model; and in response to the respective time for generating the respective prediction by the given sequential ML model being higher than the respective predetermined time period threshold assigned to the given ML model for generating the respective prediction: plan the motion of the SDC based on the respective prediction generated by an ML model, immediately preceding the given sequential ML model, as taught by Shelhamer, in order to continue computations within the time period threshold and provide the most accurate output object classification and localization allowable by the computation resources.
In regards to Claim 18, Donderici in view of Shelhamer teaches the system of Claim 13 (claim 18 being interpreted as dependent on claim 13, as set forth in the rejection under 35 U.S.C. 112(b) above), and Donderici further teaches wherein a given value of the object detection precision metric is indicative of a number of features of a given object determined by a respective ML model (Donderici, Para. 0048, 0080-0089 – identifying a number of pixels associated with “specific features” of a “perceived object”, which can be visualized by “a bounding area”, by the neural network).
In regards to Claim 19, Donderici in view of Shelhamer teaches the system of Claim 18, and Donderici further teaches wherein the given sequential model of the plurality of ML models is configured to determine a larger number of features of the at least one object in the surroundings of the SDC than any ML model preceding the given sequential ML model (Donderici, Annotated Fig. 6 and Para. 0136-0139 – when exiting neural network 600 early through the “early exit 622”, such that scene data has only been processed by a first group of layers, or the first sub-model, there will be a “reduction in the accuracy and/or safety of the output” compared to compute path 620, which proceeds through a second group of layers, or second sub-model).
[Image: Donderici, Annotated Fig. 6]
In regards to Claim 20, Donderici in view of Shelhamer teaches the system of Claim 19, and Donderici further teaches wherein the given ML model of the plurality of ML models has more layers than any ML model preceding the given ML model (Donderici, Annotated Fig. 6 and Para. 0136-0138 – wherein when using the compute path 620, or sub-model, to process the scene data, the resulting output is processed through more layers than the compute path 622, which exits before the remaining layers/sub-model).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Kouris et al. (U.S. Patent Application Pub. No. 2023/0128637) teaches a method for training a machine learning (ML) model to perform semantic image segmentation, and a computer-implemented method and apparatus for performing semantic image segmentation using a trained ML model.
Kursar (U.S. Patent Application Pub. No. 2020/0073969) teaches systems and methods for using vehicles as mobile observation platforms and improving the querying of visual data within the vehicles by leveraging edge computing resources of the vehicles in a distributed network.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HELEN LI whose telephone number is (703)756-4719. The examiner can normally be reached Monday through Friday, from 9am to 5pm eastern.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hunter Lonsberry can be reached at (571) 272-7298. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/H.L./Examiner, Art Unit 3665
/HUNTER B LONSBERRY/Supervisory Patent Examiner, Art Unit 3665