DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings were received on 09/02/2021. These drawings are acceptable.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted 01/24/2023 and 09/02/2021 have been considered by the examiner.
Response to Arguments
Applicant's arguments filed 10/03/2025 have been fully considered.
Applicant’s arguments with respect to claim(s) 1-4, 9-14, and 18-22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
The rejection under 35 U.S.C. § 101 has been withdrawn.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-4, 9-14, and 18-22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding independent claims 1, 10, and 11, the phrases “a derivative vector of a loss function” and “the derivative vectors of the loss functions” render the claims indefinite because “derivative vector,” as used in the amended limitations, does not appear to be a term of art. The amended claim limitation, highlighted from claim 1, recites: “determining, by the computing device, for each training example of the training dataset, a derivative vector of a loss function and an importance of each training example relative to other training examples in the collection of training examples; defining, by the computing device, a plurality of training subsets from the collection of training examples based on at least one of the derivative vectors of the loss functions or the importance of each training example relative to other training examples, wherein total number of training examples in at least one training subset is different from that of another training subset, each training subset includes multiple training examples;” The claim limitation appears to indicate that the derivative vectors are implemented to define a plurality of training subsets such that each subset differs from another training subset. Typically, however, a derivative vector of a loss function is used to define the direction of a gradient for minimizing the loss function (e.g., in backpropagation) while processing a dataset, and does not involve comparing one set of data to another to determine a plurality of training subsets from the collection of training examples. In backpropagation, gradients (i.e., derivative vectors) of a loss function are computed with respect to each weight and bias in a neural network to learn the model parameters of the network. It is therefore unclear what one of ordinary skill in the art would understand the intended scope of the phrase to be.
In addition, the specification states in paragraph [0028]:
[0028] Next, a derivative vector θ of a loss function lθ(x) is determined for each training example (xi, yi). Techniques for determining a derivative vector of a loss function are well known. In some scenarios, the loss function lθ(x) generally involves comparing a true value yi with a predicted value yr to obtain an output representing a distance D (e.g., a Euclidean distance) between the true value yi and the predicted value yr. The distance D output from performance of the loss function lθ(x) should be small. The derivative vector θ is randomly initialized for use in a first epoch e0 in which a first subset of training examples are analyzed, and changed over all epochs e1, . . ., ew for the other subsets of training examples to iteratively improve the output of the loss function. The derivative vector θ may be changed in accordance with a known backpropagation algorithm in the direction of the derivative for the loss function (i.e., the direction of the steepest descent of the loss function). The backpropagation algorithm generally computes the gradient of the loss function with respect to the weights of the neural network for a set of input-output training examples, often called a training batch.
This passage highlights the examiner's concerns with the amended limitations, as the specification describes the derivative vector as a means of improving the output of the loss function: per the applicant's specification, “the loss function lθ(x) generally involves comparing a true value yi with a predicted value yr to obtain an output representing a distance D (e.g., a Euclidean distance) between the true value yi and the predicted value yr.”
How, then, are the loss function and the derivative vector computed for each example in the training data? How would the claimed derivative vector element and the claimed feature selection process be ascertained by one of ordinary skill in the art?
The examiner finds that the claimed “derivative vector” appears to deviate from its conventional and known definition as an element used to optimize functions/algorithms via gradient descent, such as in backpropagation. The applicant's specification even notes that a derivative vector is computed for a training subset to iteratively improve the output of the loss function. The intended scope of the claimed amendment is not clear from either the limitation or the applicant's specification. For purposes of examination, the examiner interprets any system that uses a gradient algorithm as being within the scope of the claim limitations.
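For clarity of the record, the conventional usage of a per-example derivative vector described above can be illustrated with a minimal numerical sketch (the examiner's illustration only; the linear model, loss, data, and learning rate below are assumptions, not the applicant's claimed method):

```python
import numpy as np

def per_example_gradient(w, x, y):
    """Derivative vector in the conventional sense: the gradient of the
    squared-error loss l(w) = (w.x - y)^2 with respect to the model
    parameters w, pointing in the direction of steepest ascent."""
    return 2.0 * (w @ x - y) * x

w = np.array([0.5, -0.25])                            # model parameters
X = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])   # three training examples
y = np.array([0.0, 1.0, 0.5])                         # true values

# One derivative vector per training example, as the claim language recites.
grads = np.array([per_example_gradient(w, xi, yi) for xi, yi in zip(X, y)])

# Conventional use (e.g., backpropagation/gradient descent): step the
# parameters opposite the averaged gradient to reduce the loss.
lr = 0.1
w_next = w - lr * grads.mean(axis=0)
```

Note that nothing in this conventional computation compares one training example's data to another in order to partition the dataset into subsets, which is the source of the indefiniteness concern discussed above.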
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5, 10-12, 14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US 20250217446, hereinafter ‘Xu’) in view of Song et al. (US 20210284184, hereinafter ‘Song’) in further view of Berry (US 20200364507).
Regarding independent claim 1, Xu teaches a method for training a machine learning model, comprising: obtaining, by a computing device, a training dataset comprising a collection of training examples, each training example comprising at least one data point; (in Fig.1 and Fig. 11D; And in [0059] In at least one embodiment, parameters used to train neural networks can be efficiently determined by first constructing a proxy dataset, which is representative of a full training dataset yet relatively much smaller in size. Proxy dataset may also be referred to herein as proxy data, a portion of training data, or a subset of training data [obtaining, by a computing device, a training dataset comprising a collection of training examples, each training example comprising at least one data point]. In at least one embodiment, parameters used to train neural networks can be estimated using proxy dataset and one or more proxy models, which are essentially smaller networks, yet representative of a larger/complete network structure. In at least one embodiment, utilizing both proxy dataset and proxy networks, computational burden of AutoML can be drastically reduced (e.g., from a few days to a few hours) and yet be able to estimate parameters that can lead to improved performance…)
determining, by the computing device, for each training example of the training dataset, a derivative vector of a loss function and an importance of each training example relative to other training examples in the collection of training examples; (in [0066] FIG. 2 illustrates an example framework 200 for selecting a subset of training data to train a neural network based on uniqueness of training data [determining, by the computing device, for each training example of the training dataset, a derivative vector of a loss function and an importance of each training example relative to other training examples in the collection of training examples], according to at least one embodiment. Top row of framework 200 indicates methods (e.g., using all data or a random subset of data) that can be used to perform parameter estimation. In at least one embodiment, a processor comprising one or more circuits performs bottom row of framework 200 to train one or more neural networks based on uniqueness of data. In at least one embodiment, a processor comprising one or more circuits performs bottom row of framework 200 to estimate parameters to train one or more neural networks based on uniqueness of data (e.g., similarities among input training data)…. And in [0067] In at least one embodiment, a proxy data selection strategy is performed. In at least one embodiment, to select proxy data, dataset D containing a set of data points {x.sub.1, x.sub.2, . . . x.sub.n} is determined, where x.sub.i is a single data sample [determining, by the computing device, for each training example of the training dataset, a derivative vector of a loss function and an importance of each training example relative to other training examples in the collection of training examples]. Data points may also be referred to herein as a region of interest in a training image. 
In at least one embodiment, to estimate an importance of a single data point x.sub.i, each data point's utility is estimated in relation to other data points x.sub.j, resulting in a set of paired measures. In at least one embodiment, pairs for x.sub.1 would be {(x.sub.1, x.sub.1), (x.sub.1, x.sub.2), (x.sub.1, x.sub.3) . . . (x.sub.1, x.sub.n)}. In at least one embodiment, a mean of said measure is utilized as an indicator of an importance of each data point (e.g., an indicator representing similarity between each data point among other data points). In at least one embodiment, mutual information (MI), which is shown in Eq. 1 below, is measured on flattened vectors of 3D images and normalized local cross-correlation (NCC), which is shown as Eq. 2 below, in local window size of (9, 9, 9) for each pair of data (x.sub.i, x.sub.j) as different variants…. [0070] In at least one embodiment, comparing training images to determine similarities among said images focuses on a task-specific region of interest. In at least one embodiment, acquisition parameters (e.g., number of slices, resolution, etc.) for different 3D volume scans can vary. In at least one embodiment, when considering a pair (x.sub.i, x.sub.j), even if x.sub.i is re-sampled to x.sub.j image size, there is misalignment for a region of interest (ROI) (e.g., an organ to be annotated by a model). In at least one embodiment, a task-specific ROI is utilized by analyzing information from an existing label. In at least one embodiment, a selected volume is cropped using a ROI and re-sampled to a cubic patch size. In at least one embodiment, data points are ranked by importance and data points containing lowest mutual information or lowest correlation [determining, by the computing device, for each training example of the training dataset, a derivative vector of a loss function…] are selected within a given budget B (e.g., amount of computing resources available).)
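The paired-measure importance scheme quoted above (each point's utility estimated against every other point, with the mean used as its importance, and the least-redundant points selected within a budget) can be sketched as follows. This is an illustration only: a negative Euclidean distance stands in for Xu's mutual-information and normalized cross-correlation measures, and the data and budget are assumed.

```python
import numpy as np

def importance_scores(X, measure):
    """Mean of a pairwise measure for each point, following the paired
    scheme: pairs for x1 are {(x1, x1), (x1, x2), ..., (x1, xn)}."""
    n = len(X)
    return np.array([np.mean([measure(X[i], X[j]) for j in range(n)])
                     for i in range(n)])

# Stand-in similarity measure (not Xu's MI/NCC): negative Euclidean
# distance, so redundant points score high and unique points score low.
def neg_dist(a, b):
    return -np.linalg.norm(a - b)

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])  # two near-duplicates, one outlier
scores = importance_scores(X, neg_dist)

# As in the quoted passage, the least-redundant points (lowest similarity)
# are selected within a given budget B.
B = 1
selected = np.argsort(scores)[:B]
```

With this toy data the unique third point is selected, mirroring the passage's preference for dissimilar, non-redundant data.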
defining, by the computing device, a plurality of training subsets from the collection of training examples based on at least one of the derivative vectors of the loss functions or the importance of each training example relative to other training examples, wherein total number of training examples in at least one training subset is different from that of another training subset, each training subset includes multiple training examples; (in [0079] In at least one embodiment, a system performing at least a part of process 700 includes executable code to identify 702 an object from one or more training images used to train one or more neural networks. In at least one embodiment, hyper-parameters can be estimated for training a neural network by using a subset [wherein total number of training examples in at least one training subset is different from that of another training subset, each training subset includes multiple training examples] of the training images that more accurately reflects a full dataset of training images [defining, by the computing device, a plurality of training subsets from the collection of training examples based on at least one of the derivative vectors of the loss functions or the importance of each training example relative to other training examples]. In at least one embodiment, a system performing at least a part of process 700 includes executable code to first find a region of interest in each training image, such as a bounding box around a particular object to be recognized by a neural network being trained. And in [0064] In at least one embodiment, GPU 106 processes training data 102 (comprising training images) to estimate parameters (e.g., hyper-parameters, values) that are used to control how a neural network is to be trained. 
In at least one embodiment, GPU processes training data 102 to estimate parameters prior to a neural network to be trained 116 uses training data for training… In at least one embodiment, GPU 106 determines a region of interest in each training image and calculates a score 108. In at least one embodiment, a region of interest in a training image is determined using a bounding box around a particular object that is to be recognized by a neural network 116 being trained…. In at least one embodiment, GPU 106 calculates a score (e.g., similarity score, importance score, indicator), such as a value of 1 (out of 10), and assigns said score to a region of interest in a first image based on said comparison. Assigning scores on a scale of 0 to 10 is just one example embodiment; however, scoring methods may vary (e.g., assigning scores between 1 to 100) and are not limited to a score based on a scale of 0 to 10. In at least one embodiment, said score is stored and added to a list (e.g., table or index)… In at least one embodiment, said score and training image would then be stored with a lower ranking than other scores in said list. In at least one embodiment, if a region of interest is more dissimilar to other regions of interest of in corresponding images [wherein total number of training examples in at least one training subset is different from that of another training subset, each training subset includes multiple training examples based on examples in corresponding region of interests], a higher importance score will be assigned because dissimilar images are not redundant to other images. 
Storing scores in a list based on comparisons between regions of interests and ranking scores higher based on dissimilarity between corresponding regions of interests is one example embodiment; however, other ways to store and rank scores (e.g., ranking scores that have greater similarity higher instead of lower) can also be utilized… [0067] In at least one embodiment, a proxy data selection strategy is performed. In at least one embodiment, to select proxy data, dataset D containing a set of data points {x.sub.1, x.sub.2, . . . x.sub.n} is determined, where x.sub.i is a single data sample. Data points may also be referred to herein as a region of interest in a training image…. )
and executing, the computing device, at least two epoch of training of the machine learning model; using, for each epoch of training, one training subset among the plurality of training subsets, … generating, by the machine learning model, a plurality of feature embeddings using the one training subset, and reducing, by the computing device, distance between feature embeddings to train the machine learning model; ([0079] In at least one embodiment, a system performing at least a part of process 700 includes executable code to identify 702 an object from one or more training images used to train one or more neural networks [and executing, the computing device, at least two epoch of training of the machine learning model; using, for each epoch of training, one training subset among the plurality of training subsets, … ]. In at least one embodiment, hyper-parameters can be estimated for training a neural network by using a subset of the training images that more accurately reflects a full dataset of training images [generating, by the machine learning model, a plurality of feature embeddings using the one training subset,]. In at least one embodiment, a system performing at least a part of process 700 includes executable code to first find a region of interest in each training image, such as a bounding box around a particular object to be recognized by a neural network being trained. In at least one embodiment, a system performing at least a part of process 700 includes executable code to compare 704, for each image, a region of interest with a corresponding region of interest of every other interest. In at least one embodiment, object in region of interest is compared with objects in corresponding regions of interest among said one or more training images; And in [0127] In at least one embodiment, one or more systems depicted in FIG. 11A are utilized to implement a framework for training one or more neural networks based on uniqueness of training data. 
In at least one embodiment, one or more systems depicted in FIG. 11A are utilized to perform various processes such as those described in connection with FIGS. 1-7. In at least one embodiment, one or more systems depicted in FIG. 11A are utilized to use a subset of training data, such as training images, based on uniqueness of training data to estimate parameters as part of one or more training process of a neural network model [generating, by the machine learning model, a plurality of feature embeddings using the one training subset, and reducing, by the computing device, distance between feature embeddings to train the machine learning model]. )
and outputting, by the computing device, the machine learning model as a trained model to be employed in a vehicle. (in [0220] In at least one embodiment, server(s) 1178 may be used to train machine learning models (e.g., neural networks) based at least in part on training data. In at least one embodiment, training data may be generated by vehicles, and/or may be generated in a simulation (e.g., using a game engine). In at least one embodiment, any amount of training data is tagged (e.g., where associated neural network benefits from supervised learning) and/or undergoes other pre-processing. In at least one embodiment, any amount of training data is not tagged and/or pre-processed (e.g., where associated neural network does not require supervised learning). In at least one embodiment, once machine learning models are trained, machine learning models may be used by vehicles [and outputting, by the computing device, the machine learning model as a trained model to be employed in a vehicle.] (e.g., transmitted to vehicles over network(s) 1190), and/or machine learning models may be used by server(s) 1178 to remotely monitor vehicles.)
While Xu teaches training a neural network on a subset of data based on a feature selection process, Xu does not expressly teach the iterative training process claimed in the limitation: “using, for each epoch of training, one training subset among the plurality of training subsets, the training subset being employed for each epoch is a different training subset, and for each epoch of training:”
Song does expressly teach the iterative training process claimed in the limitation using, for each epoch of training, one training subset among the plurality of training subsets, the training subset being employed for each epoch is a different training subset, and for each epoch of training: (in [0120] In one example, the machine learning model is a neural network model and the system trains the neural network model over multiple training iterations. At each training iteration [using, for each epoch of training, one training subset among the plurality of training subsets, the training subset being employed for each epoch is a different training subset, and for each epoch of training], the system selects a current mini-batch of one or more training examples from the candidate training data [one training subset among the plurality of training subsets], and then determines an “augmented” mini-batch of training examples by transforming the training inputs in the current mini-batch of training examples using the current point cloud augmentation policy. Optionally, the system may adjust the target outputs in the current mini-batch of training examples to account for the transformations applied to the training inputs (as described earlier). The system processes the transformed training inputs in accordance with the current parameter values of the machine learning model to generate corresponding outputs. The system then determines gradients of an objective function that measures a similarity between: (i) the outputs generated by the machine learning model, and (ii) the target outputs specified by the training examples, and uses the gradients to adjust the current values of the machine learning model parameters [generating, by the machine learning model, a plurality of feature embeddings using the one training subset, and reducing, by the computing device, distance between feature embeddings to train the machine learning model]. 
The system may determine the gradients using, e.g., a backpropagation procedure, and the system may use the gradients to adjust the current values of the machine learning model parameters using any appropriate gradient descent optimization procedure, e.g., an RMSprop or Adam procedure…)
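The per-iteration scheme Song describes (select a batch of training examples, compute gradients of an objective, and apply a gradient-based parameter update), combined with the claimed use of a different training subset for each epoch, can be sketched as follows. This is the examiner's illustration under assumed toy data, a plain least-squares objective, and plain gradient descent rather than Song's RMSprop or Adam procedures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy noiseless linear-regression data standing in for training examples.
X = rng.normal(size=(12, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

# A plurality of training subsets with differing numbers of examples; a
# different subset is employed for each epoch, per the claim language.
subsets = [np.arange(0, 4), np.arange(4, 9), np.arange(9, 12)]

w = np.zeros(3)  # model parameters to be learned
lr = 0.05
for idx in subsets:                  # one training subset per epoch
    Xb, yb = X[idx], y[idx]
    for _ in range(50):              # iterations within the epoch
        # Gradient of the mean-squared-error objective on this subset.
        grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
        w -= lr * grad               # gradient-descent parameter update
```

Each epoch reduces the objective on its own subset; because the toy data are noiseless, the parameters are driven toward true_w across epochs.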
Song and Xu are analogous art because both involve developing information retrieval and modeling techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing and retrieving information for training a machine learning model having a plurality of model parameters to perform a particular neural network task using training data subsets, as disclosed by Song, with the method of developing information retrieval and modeling techniques using machine learning systems and algorithms as disclosed by Xu.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Song and Xu for the benefit of developing and implementing progressive population based training that allows the system to make more efficient use of computational resources, e.g., memory, wall clock time, or both, during training (Song, [0073]).
While Xu and Song teach the process for training a neural network using backpropagation, neither expressly teaches that the process of training and adjusting the parameters of a neural network involves measuring and reducing a distance among similar feature embeddings.
Berry does expressly teach that training and adjusting the parameters of a neural network during training involves measuring and reducing a distance among similar feature embeddings (in [0044] In one embodiment, the training module 203 incrementally adjusts the model parameters until the model maximizes the distances between dissimilar objects and/or minimizes distances between similar objects [generating, by the machine learning model, a plurality of feature embeddings …, and reducing, by the computing device, distance between feature embeddings to train the machine learning model] using the loss function (e.g., achieves a target maximum and/or minimum distance separation for objects). In other words, a “trained” embedding neural network 109 is a machine learning model with parameters (e.g., coefficients, weights, etc.) adjusted to make predictions of the embedding layer 107 with maximum distances between dissimilar objects and/or minimum distances between similar objects with respect to the ground truth data. In step 305, the layer module 205 then uses the trained embedding neural network 109 to predict the embedding layer 107 based on specified parameters such as but not limited to embedding layer size (e.g., number of dimensions), loss function, metric for loss function, and/or the like. The resulting embedding layer 107 would then represent the semantic relationships or distances among the map features of the geographic database 103.)
Berry, Song, and Xu are analogous art because all involve developing information retrieval and modeling techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing and retrieving information for training a machine learning model based on processing extracted data features, as disclosed by Berry, with the method of developing information retrieval and modeling techniques using machine learning systems and algorithms as collectively disclosed by Song and Xu.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Berry, Song, and Xu for the benefit of providing map embedding analytics for neural networks to improve, for instance, map feature classification or determining semantic relationships between the map features (e.g., the path by which a user can navigate from a parking lot to a store front), and/or the like (Berry, [0002] & [0031]-[0032]).
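The distance-based training Berry describes (minimizing distances between similar objects' embeddings while maximizing distances between dissimilar ones) is commonly implemented with a contrastive-style loss. The sketch below is the examiner's illustration of that general technique, with assumed embedding values and margin, not Berry's implementation:

```python
import numpy as np

def contrastive_loss(za, zb, similar, margin=1.0):
    """Distance-based objective: pull similar embeddings together and
    push dissimilar embeddings at least `margin` apart."""
    d = np.linalg.norm(za - zb)
    if similar:
        return d ** 2                      # minimized by reducing distance
    return max(0.0, margin - d) ** 2       # penalizes dissimilar pairs closer than margin

za = np.array([0.2, 0.9])    # embedding of an object
zb = np.array([0.25, 0.8])   # embedding of a similar object
zc = np.array([0.95, 0.1])   # embedding of a dissimilar object

loss_similar = contrastive_loss(za, zb, similar=True)      # small but nonzero
loss_dissimilar = contrastive_loss(za, zc, similar=False)  # zero: already beyond margin
```

A training loop that sums this loss over labeled pairs and follows its gradient adjusts the model parameters in the manner paragraph [0044] of Berry describes.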
Regarding claims 10-12 and 14, the limitations are similar to those in claims 1-3 and 5 respectively and are thus rejected under the same rationale.
Regarding claims 19 and 20, the limitations are similar to those in claims 1 and 2 respectively and are rejected under the same rationale.
Regarding claim 2, the rejection of claim 1 is incorporated and Xu in combination with Song and Berry teaches the method according to claim 1, further comprising using the trained model to control operations of a mobile platform, as the vehicle. (in [0220] In at least one embodiment, server(s) 1178 may be used to train machine learning models (e.g., neural networks) based at least in part on training data. In at least one embodiment, training data may be generated by vehicles, and/or may be generated in a simulation (e.g., using a game engine). In at least one embodiment, any amount of training data is tagged (e.g., where associated neural network benefits from supervised learning) and/or undergoes other pre-processing. In at least one embodiment, any amount of training data is not tagged and/or pre-processed (e.g., where associated neural network does not require supervised learning). In at least one embodiment, once machine learning models are trained, machine learning models may be used by vehicles [further comprising using the trained model to control operations of a mobile platform, as the vehicle] (e.g., transmitted to vehicles over network(s) 1190), and/or machine learning models may be used by server(s) 1178 to remotely monitor vehicles.; And in [0455] In at least one embodiment, one or more PPUs 3300 are configured to accelerate High Performance Computing (“HPC”), data center, and machine learning applications. In at least one embodiment, PPU 3300 is configured to accelerate deep learning systems and applications including following non-limiting examples: autonomous vehicle platforms,...)
Regarding claim 3, the rejection of claim 1 is incorporated and Xu in combination with Song and Berry teaches the method according to claim 1, wherein the at least one data point is obtained from an image generated by a camera. (in [0198] In at least one embodiment, vehicle 1100 may further include any number of camera types, including stereo camera(s) 1168, wide-view camera(s) 1170, infrared camera(s) 1172, surround camera(s) 1174, long-range camera(s) 1198, mid-range camera(s) 1176, and/or other camera types. In at least one embodiment, cameras may be used to capture image data around an entire periphery of vehicle 1100…)
Regarding claim 5, the rejection of claim 1 is incorporated and Xu in combination with Song and Berry teaches the method according to claim 1, wherein each said training example further comprises a true value for a property to be predicted by the machine learning model. (in [0075] FIG. 6 illustrates an example of a process 600 for training a neural network based on uniqueness of the training data, according to at least one embodiment…. In at least one embodiment, process 600 is performed by one or more circuits to identify one or more images used to train one or more neural networks based, at least in part, on one or more labels of one or more objects within said one or more images [wherein each said training example further comprises a true value for a property to be predicted by the machine learning model].… )
Additionally, Song teaches, in [0021]: By training the machine learning models in a manner that optimizes model parameters and data augmentation policy parameters jointly, a system disclosed in this specification can train the machine learning model to generate outputs, e.g., perception outputs such as object detection or classification outputs [wherein each said training example further comprises a true value for a property to be predicted by the machine learning model], that are more accurate than those generated by models trained using conventional techniques, e.g., using manually designed data augmentation policies… Compared with other conventional approaches, the system can thus make more efficient use of computational resources, e.g., memory, wall clock time, or both during training. The system can also train the machine learning model using orders of magnitude smaller amount of labeled data [wherein each said training example further comprises a true value for a property to be predicted by the machine learning model]…
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Song and Xu for the same reasons disclosed above.
Claims 1-3, 5, 10-12, 14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US 20250217446, hereinafter ‘Xu’) in view of Song et al. (US 20210284184, hereinafter ‘Song’) in further view of Appalaraju et al. (US 10467526, hereinafter ‘Appa’).
Regarding independent claim 1, Xu teaches a method for training a machine learning model, comprising: obtaining, by a computing device, a training dataset comprising a collection of training examples, each training example comprising at least one data point; (in Fig.1 and Fig. 11D; And in [0059] In at least one embodiment, parameters used to train neural networks can be efficiently determined by first constructing a proxy dataset, which is representative of a full training dataset yet relatively much smaller in size. Proxy dataset may also be referred to herein as proxy data, a portion of training data, or a subset of training data [obtaining, by a computing device, a training dataset comprising a collection of training examples, each training example comprising at least one data point]. In at least one embodiment, parameters used to train neural networks can be estimated using proxy dataset and one or more proxy models, which are essentially smaller networks, yet representative of a larger/complete network structure. In at least one embodiment, utilizing both proxy dataset and proxy networks, computational burden of AutoML can be drastically reduced (e.g., from a few days to a few hours) and yet be able to estimate parameters that can lead to improved performance…)
determining, by the computing device, for each training example of the training dataset, a derivative vector of a loss function and an importance of each training example relative to other training examples in the collection of training examples; (in [0066] FIG. 2 illustrates an example framework 200 for selecting a subset of training data to train a neural network based on uniqueness of training data [determining, by the computing device, for each training example of the training dataset, a derivative vector of a loss function and an importance of each training example relative to other training examples in the collection of training examples], according to at least one embodiment. Top row of framework 200 indicates methods (e.g., using all data or a random subset of data) that can be used to perform parameter estimation. In at least one embodiment, a processor comprising one or more circuits performs bottom row of framework 200 to train one or more neural networks based on uniqueness of data. In at least one embodiment, a processor comprising one or more circuits performs bottom row of framework 200 to estimate parameters to train one or more neural networks based on uniqueness of data (e.g., similarities among input training data) [an importance of each training example relative to other training examples in the collection of training examples]... And in [0067] In at least one embodiment, a proxy data selection strategy is performed. In at least one embodiment, to select proxy data, dataset D containing a set of data points {x.sub.1, x.sub.2, . . . x.sub.n} is determined, where x.sub.i is a single data sample [determining, by the computing device, for each training example of the training dataset, a derivative vector of a loss function and an importance of each training example relative to other training examples in the collection of training examples]. Data points may also be referred to herein as a region of interest in a training image. 
In at least one embodiment, to estimate an importance of a single data point x.sub.i, each data point's utility is estimated in relation to other data points x.sub.j, resulting in a set of paired measures. In at least one embodiment, pairs for x.sub.1 would be {(x.sub.1, x.sub.1), (x.sub.1, x.sub.2), (x.sub.1, x.sub.3) . . . (x.sub.1, x.sub.n)}. In at least one embodiment, a mean of said measure is utilized as an indicator of an importance of each data point (e.g., an indicator representing similarity between each data point among other data points). In at least one embodiment, mutual information (MI), which is shown in Eq. 1 below, is measured on flattened vectors of 3D images and normalized local cross-correlation (NCC), which is shown as Eq. 2 below, in local window size of (9, 9, 9) for each pair of data (x.sub.i, x.sub.j) as different variants…. [0070] In at least one embodiment, comparing training images to determine similarities among said images focuses on a task-specific region of interest. In at least one embodiment, acquisition parameters (e.g., number of slices, resolution, etc.) for different 3D volume scans can vary. In at least one embodiment, when considering a pair (x.sub.i, x.sub.j), even if x.sub.i is re-sampled to x.sub.j image size, there is misalignment for a region of interest (ROI) (e.g., an organ to be annotated by a model). In at least one embodiment, a task-specific ROI is utilized by analyzing information from an existing label. In at least one embodiment, a selected volume is cropped using a ROI and re-sampled to a cubic patch size. In at least one embodiment, data points are ranked by importance and data points containing lowest mutual information or lowest correlation [determining, by the computing device, for each training example of the training dataset, a derivative vector of a loss function…] are selected within a given budget B (e.g., amount of computing resources available).)
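Examiner's note (illustrative only, not relied upon for the rejection): the pairwise-importance scheme quoted above — scoring each data point against every other point and keeping the least redundant points within a budget B — can be sketched as follows. The helper names are hypothetical, and a simple global correlation stands in for the reference's Eq. 1 (mutual information) and Eq. 2 (local NCC).

```python
import math

def ncc(x, y):
    # global normalized cross-correlation between two flattened vectors
    # (a simplified stand-in for the reference's Eq. 1 / Eq. 2 measures)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def select_proxy(dataset, budget):
    # score each point x_i by its mean similarity over the pairs
    # {(x_i, x_1), ..., (x_i, x_n)} of the quoted passage, then keep
    # the least similar -- i.e., least redundant -- points within budget B
    means = [(sum(ncc(xi, xj) for xj in dataset) / len(dataset), i)
             for i, xi in enumerate(dataset)]
    means.sort()  # lowest mean similarity ranked first
    return [i for _, i in means[:budget]]
```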
defining, by the computing device, a plurality of training subsets from the collection of training examples based on at least one of the derivative vectors of the loss functions or the importance of each training example relative to other training examples, wherein total number of training examples in at least one training subset is different from that of another training subset, each training subset includes multiple training examples; (in [0079] In at least one embodiment, a system performing at least a part of process 700 includes executable code to identify 702 an object from one or more training images used to train one or more neural networks. In at least one embodiment, hyper-parameters can be estimated for training a neural network by using a subset [wherein total number of training examples in at least one training subset is different from that of another training subset, each training subset includes multiple training examples] of the training images that more accurately reflects a full dataset of training images [defining, by the computing device, a plurality of training subsets from the collection of training examples based on at least one of the derivative vectors of the loss functions or the importance of each training example relative to other training examples]. In at least one embodiment, a system performing at least a part of process 700 includes executable code to first find a region of interest in each training image, such as a bounding box around a particular object to be recognized by a neural network being trained. And in [0064] In at least one embodiment, GPU 106 processes training data 102 (comprising training images) to estimate parameters (e.g., hyper-parameters, values) that are used to control how a neural network is to be trained. 
In at least one embodiment, GPU processes training data 102 to estimate parameters prior to a neural network to be trained 116 uses training data for training… In at least one embodiment, GPU 106 determines a region of interest in each training image and calculates a score 108. In at least one embodiment, a region of interest in a training image is determined using a bounding box around a particular object that is to be recognized by a neural network 116 being trained…. In at least one embodiment, GPU 106 calculates a score (e.g., similarity score, importance score, indicator), such as a value of 1 (out of 10), and assigns said score to a region of interest in a first image based on said comparison. Assigning scores on a scale of 0 to 10 is just one example embodiment; however, scoring methods may vary (e.g., assigning scores between 1 to 100) and are not limited to a score based on a scale of 0 to 10. In at least one embodiment, said score is stored and added to a list (e.g., table or index)… In at least one embodiment, said score and training image would then be stored with a lower ranking than other scores in said list. In at least one embodiment, if a region of interest is more dissimilar to other regions of interest of in corresponding images [wherein total number of training examples in at least one training subset is different from that of another training subset, each training subset includes multiple training examples based on examples in corresponding region of interests], a higher importance score will be assigned because dissimilar images are not redundant to other images. 
Storing scores in a list based on comparisons between regions of interests and ranking scores higher based on dissimilarity between corresponding regions of interests is one example embodiment; however, other ways to store and rank scores (e.g., ranking scores that have greater similarity higher instead of lower) can also be utilized… [0067] In at least one embodiment, a proxy data selection strategy is performed. In at least one embodiment, to select proxy data, dataset D containing a set of data points {x.sub.1, x.sub.2, . . . x.sub.n} is determined, where x.sub.i is a single data sample. Data points may also be referred to herein as a region of interest in a training image…. )
and executing, the computing device, at least two epoch of training of the machine learning model; using, for each epoch of training, one training subset among the plurality of training subsets, … generating, by the machine learning model, a plurality of feature embeddings using the one training subset, and reducing, by the computing device, distance between feature embeddings to train the machine learning model; ([0079] In at least one embodiment, a system performing at least a part of process 700 includes executable code to identify 702 an object from one or more training images used to train one or more neural networks [and executing, the computing device, at least two epoch of training of the machine learning model; using, for each epoch of training, one training subset among the plurality of training subsets, … ]. In at least one embodiment, hyper-parameters can be estimated for training a neural network by using a subset of the training images that more accurately reflects a full dataset of training images [generating, by the machine learning model, a plurality of feature embeddings using the one training subset,]. In at least one embodiment, a system performing at least a part of process 700 includes executable code to first find a region of interest in each training image, such as a bounding box around a particular object to be recognized by a neural network being trained. In at least one embodiment, a system performing at least a part of process 700 includes executable code to compare 704, for each image, a region of interest with a corresponding region of interest of every other interest. In at least one embodiment, object in region of interest is compared with objects in corresponding regions of interest among said one or more training images; And in [0127] In at least one embodiment, one or more systems depicted in FIG. 11A are utilized to implement a framework for training one or more neural networks based on uniqueness of training data. 
In at least one embodiment, one or more systems depicted in FIG. 11A are utilized to perform various processes such as those described in connection with FIGS. 1-7. In at least one embodiment, one or more systems depicted in FIG. 11A are utilized to use a subset of training data, such as training images, based on uniqueness of training data to estimate parameters as part of one or more training process of a neural network model [generating, by the machine learning model, a plurality of feature embeddings using the one training subset, and reducing, by the computing device, distance between feature embeddings to train the machine learning model]. )
and outputting, by the computing device, the machine learning model as a trained model to be employed in a vehicle. (in [0220] In at least one embodiment, server(s) 1178 may be used to train machine learning models (e.g., neural networks) based at least in part on training data. In at least one embodiment, training data may be generated by vehicles, and/or may be generated in a simulation (e.g., using a game engine). In at least one embodiment, any amount of training data is tagged (e.g., where associated neural network benefits from supervised learning) and/or undergoes other pre-processing. In at least one embodiment, any amount of training data is not tagged and/or pre-processed (e.g., where associated neural network does not require supervised learning). In at least one embodiment, once machine learning models are trained, machine learning models may be used by vehicles [and outputting, by the computing device, the machine learning model as a trained model to be employed in a vehicle.] (e.g., transmitted to vehicles over network(s) 1190), and/or machine learning models may be used by server(s) 1178 to remotely monitor vehicles.)
While Xu teaches training a neural network on a subset of data based on a feature selection process, Xu does not expressly teach the iterative training process claimed in the limitation using, for each epoch of training, one training subset among the plurality of training subsets, the training subset being employed for each epoch is a different training subset, and for each epoch of training:
Song does expressly teach the iterative training process claimed in the limitation using, for each epoch of training, one training subset among the plurality of training subsets, the training subset being employed for each epoch is a different training subset, and for each epoch of training: (in [0120] In one example, the machine learning model is a neural network model and the system trains the neural network model over multiple training iterations. At each training iteration [using, for each epoch of training, one training subset among the plurality of training subsets, the training subset being employed for each epoch is a different training subset, and for each epoch of training], the system selects a current mini-batch of one or more training examples from the candidate training data [one training subset among the plurality of training subsets], and then determines an “augmented” mini-batch of training examples by transforming the training inputs in the current mini-batch of training examples using the current point cloud augmentation policy. Optionally, the system may adjust the target outputs in the current mini-batch of training examples to account for the transformations applied to the training inputs (as described earlier). The system processes the transformed training inputs in accordance with the current parameter values of the machine learning model to generate corresponding outputs. The system then determines gradients of an objective function that measures a similarity between: (i) the outputs generated by the machine learning model, and (ii) the target outputs specified by the training examples, and uses the gradients to adjust the current values of the machine learning model parameters [generating, by the machine learning model, a plurality of feature embeddings using the one training subset, and reducing, by the computing device, distance between feature embeddings to train the machine learning model]. 
The system may determine the gradients using, e.g., a backpropagation procedure, and the system may use the gradients to adjust the current values of the machine learning model parameters using any appropriate gradient descent optimization procedure, e.g., an RMSprop or Adam procedure…)
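Examiner's note (illustrative only): Song's per-iteration loop — select a subset, compute gradients of an objective, and adjust parameters by gradient descent — reduces, for a hypothetical one-weight model y = w·x with squared loss, to:

```python
def train_epochs(subsets, lr=0.05, epochs=20):
    # fit y = w * x by plain gradient descent on a squared loss,
    # drawing a different training subset on each epoch (Song's
    # per-iteration mini-batch selection, simplified to one weight)
    w = 0.0
    for epoch in range(epochs):
        subset = subsets[epoch % len(subsets)]  # rotate through subsets
        for x, y in subset:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)**2
            w -= lr * grad              # gradient-descent update
    return w
```

A different subset is drawn on each pass, mirroring the claimed use of a different training subset per epoch.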
Song and Xu are analogous art because both involve developing information retrieval and modeling techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing and retrieving information for training a machine learning model having a plurality of model parameters to perform a particular neural network task using training data subsets, as disclosed by Song with the method of developing information retrieval and modeling techniques using machine learning systems and algorithms as disclosed by Xu.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Song and Xu for the benefit of developing and implementing progressive population based training that allows the system to make more efficient use of computational resources, e.g., memory, wall clock time, or both during training (Song, 0073).
While Xu and Song teach the process for training a neural network using backpropagation, neither expressly teaches that the process of training and adjusting the parameters of a neural network during training involves measuring and reducing a distance among similar feature embeddings.
Appa does expressly teach that training and adjusting the parameters of a neural network during training involves measuring and reducing a distance among similar feature embeddings, in 11:18-31: In equation (1), Y is the image pair label, set to 0 for positive or similar pairs, and set to 1 for negative or dissimilar pairs (note that the opposite labels, 1 for positive and 0 for negative, may be employed in some embodiments, and the corresponding summation terms of equation (1) may be modified accordingly). D(O.sub.i, O.sub.j) represents a distance metric (e.g., Euclidean, Manhattan or cosine distance) between two output vectors O.sub.i and O.sub.j, and m is a margin hyperparameter of the model [generating, by the machine learning model, a plurality of feature embeddings using the one training subset, and reducing, by the computing device, distance between feature embeddings to train the machine learning model]. When a positive input pair is processed, Y is zero, so the second additive term involving the maximum is also zero, and the loss becomes the distance between the embeddings of two similar images. As such, the model learns to reduce the distance between similar images [reducing, by the computing device, distance between feature embeddings to train the machine learning model], which is a desirable result…; And in 3:42-4:3: … Accordingly, in various embodiments, a number of techniques may be used to optimize the selection and ordering of training input examples to be provided to the model. Such techniques may be based at least in part on the concept of curriculum learning, according to which (at a high level) the degree of difficulty with respect to the prediction or result being generated for a given training example is gradually increased during the course of training. 
As discussed below in further detail, curriculum learning-inspired techniques may be implemented at several different levels of granularity in some embodiments—e.g., at both intra-epoch granularity (when selecting and ordering training examples of a mini-batch) as well as inter-epoch granularity (using parameters which depend on the number of epochs that have been completed to perform weighted sampling of image pairs) [generating, by the machine learning model, a plurality of feature embeddings using the one training subset,…]…
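Examiner's note (illustrative only): a minimal sketch of a contrastive loss of the general form of Appa's equation (1), using the quoted label convention (Y = 0 for similar pairs, 1 for dissimilar pairs); the exact form of Appa's equation may differ, and the function names are hypothetical.

```python
def distance(o_i, o_j):
    # Euclidean distance D(O_i, O_j) between two embedding vectors
    return sum((a - b) ** 2 for a, b in zip(o_i, o_j)) ** 0.5

def contrastive_loss(o_i, o_j, y, m=1.0):
    # y = 0 for similar pairs, 1 for dissimilar pairs, per the quoted
    # label convention; m is the margin hyperparameter of the model
    d = distance(o_i, o_j)
    return (1 - y) * d + y * max(0.0, m - d)
```

With y = 0 the margin term vanishes and the loss equals the embedding distance, which training then drives down — the behavior described in the quoted passage for positive pairs.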
Additionally, Appa teaches in 20:50-21:16: A subsequent version of the model may then be trained using the optimized training image pairs (element 916). In at least some embodiments, after each iteration of training or after each mini-batch, the model may be checkpointed or saved, and such saved versions may be used to select image pairs for further iterations [executing, the computing device, at least two epoch of training of the machine learning model; using, for each epoch of training, one training subset among the plurality of training subsets, … generating, by the machine learning model, a plurality of feature embeddings using the one training subset,..]. Note that model versions need not necessarily be saved after each min-batch or after each epoch in some embodiments, instead, for example, model versions may be saved after some number of mini-batches or epochs. In such embodiments, several sets of optimized training image pairs may be generated using the same saved version of the model, and then used for training respective newer versions of the model. If, after a given iteration or mini-batch, the model has converged or reached a desired quality of results, as determined in operations corresponding to element 919, the fully-trained version of the model may be saved. The trained model may then be deployed to a run-time execution environment (element 922)… If the model has not converged (as also detected in operations corresponding to element 919), the operations corresponding to elements 910, 913, 916 and 919 for the next round of training image pair selection followed by model training may be performed in the depicted embodiment [executing, the computing device, at least two epoch of training of the machine learning model; using, for each epoch of training, one training subset among the plurality of training subsets, … generating, by the machine learning model, a plurality of feature embeddings using the one training subset,..]…
Appa, Song and Xu are analogous art because all involve developing information retrieval and modeling techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing and retrieving information for training an artificial intelligence system, as disclosed by Appa with the method of developing information retrieval and modeling techniques using machine learning systems and algorithms as collectively disclosed by Song and Xu.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Appa, Song and Xu for the benefit of improving the quality of the selected training image subset for iterative training of machine learning models (Appa, Abstract and 14:4-30).
Regarding claim 2, the rejection of claim 1 is incorporated and Xu in combination with Song and Appa teaches the method according to claim 1, further comprising using the trained model to control operations of a mobile platform, as the vehicle. (in [0220] In at least one embodiment, server(s) 1178 may be used to train machine learning models (e.g., neural networks) based at least in part on training data. In at least one embodiment, training data may be generated by vehicles, and/or may be generated in a simulation (e.g., using a game engine). In at least one embodiment, any amount of training data is tagged (e.g., where associated neural network benefits from supervised learning) and/or undergoes other pre-processing. In at least one embodiment, any amount of training data is not tagged and/or pre-processed (e.g., where associated neural network does not require supervised learning). In at least one embodiment, once machine learning models are trained, machine learning models may be used by vehicles [further comprising using the trained model to control operations of a mobile platform, as the vehicle] (e.g., transmitted to vehicles over network(s) 1190), and/or machine learning models may be used by server(s) 1178 to remotely monitor vehicles.; And in [0455] In at least one embodiment, one or more PPUs 3300 are configured to accelerate High Performance Computing (“HPC”), data center, and machine learning applications. In at least one embodiment, PPU 3300 is configured to accelerate deep learning systems and applications including following non-limiting examples: autonomous vehicle platforms,...)
Regarding claim 3, the rejection of claim 1 is incorporated and Xu in combination with Song and Appa teaches the method according to claim 1, wherein the at least one data point is obtained from an image generated by a camera. (in [0198] In at least one embodiment, vehicle 1100 may further include any number of camera types, including stereo camera(s) 1168, wide-view camera(s) 1170, infrared camera(s) 1172, surround camera(s) 1174, long-range camera(s) 1198, mid-range camera(s) 1176, and/or other camera types. In at least one embodiment, cameras may be used to capture image data around an entire periphery of vehicle 1100…)
Regarding claim 5, the rejection of claim 1 is incorporated and Xu in combination with Song and Appa teaches the method according to claim 1, wherein each said training example further comprises a true value for a property to be predicted by the machine learning model. (in [0075] FIG. 6 illustrates an example of a process 600 for training a neural network based on uniqueness of the training data, according to at least one embodiment…. In at least one embodiment, process 600 is performed by one or more circuits to identify one or more images used to train one or more neural networks based, at least in part, on one or more labels of one or more objects within said one or more images [wherein each said training example further comprises a true value for a property to be predicted by the machine learning model].… )
Additionally, Song teaches in [0021]: By training the machine learning models in a manner that optimizes model parameters and data augmentation policy parameters jointly, a system disclosed in this specification can train the machine learning model to generate outputs, e.g., perception outputs such as object detection or classification outputs [wherein each said training example further comprises a true value for a property to be predicted by the machine learning model], that are more accurate than those generated by models trained using conventional techniques, e.g., using manually designed data augmentation policies… Compared with other conventional approaches, the system can thus make more efficient use of computational resources, e.g., memory, wall clock time, or both during training. The system can also train the machine learning model using orders of magnitude smaller amount of labeled data [wherein each said training example further comprises a true value for a property to be predicted by the machine learning model]…
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Song and Xu for the same reasons disclosed above.
Regarding claims 10-12 and 14, the limitations are similar to those in claims 1-3 and 5 respectively and are thus rejected under the same rationale.
Regarding claims 19 and 20, the limitations are similar to those in claims 1 and 2 respectively and are rejected under the same rationale.
Claims 4, 13, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US 20250217446, hereinafter ‘Xu’) in view of Song et al. (US 20210284184, hereinafter ‘Song’) in further view of Berry (US 20200364507) and Gu (US 20210056715).
Regarding claim 4, the rejection of claim 1 is incorporated and Xu in combination with Song and Berry teaches the method according to claim 1, further comprising determining norms of the derivative vectors, wherein the plurality of training subsets are defined based on the norms of the derivative vectors. (in [0086] In at least one embodiment, inference and/or training logic 815 may include, without limitation, a code and/or data storage 805 to store backward and/or output weight and/or input/output data corresponding to neurons or layers of a neural network trained and/or used for inferencing in aspects of one or more embodiments. In at least one embodiment, code and/or data storage 805 stores weight parameters and/or input/output data of each layer of a neural network trained or used in conjunction with one or more embodiments during backward propagation [further comprising determining] of input/output data and/or weight parameters during training and/or inferencing using aspects of one or more embodiments….)
While Xu teaches defining the training data set as training parameters for training the neural network using backpropagation, Xu, Song, and Berry do not expressly disclose the use of normalization processing for processing training parameters.
Gu does expressly teach the use of normalization processing for processing training parameters, in [0053]: The batch norm [further comprising determining norms of the derivative vectors, wherein the plurality of training subsets are defined based on the norms of the derivative vectors] calculation helps to reduce difference in value ranges between different samples, so that most of data is in an unsaturated region, thereby ensuring better back-propagation of gradient [further comprising determining norms of the derivative vectors, wherein the plurality of training subsets are defined based on the norms of the derivative vectors], so as to accelerate the convergence of the network. Next, the fully connected layers reassemble local features extracted during the batch norm calculation into a complete feature through the weight matrix.
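Examiner's note (illustrative only): the batch-norm calculation Gu describes in [0053] — normalizing a batch so that value ranges agree across samples — can be sketched for scalar activations as follows (the learned scale and shift parameters of full batch norm are omitted for brevity, and the function name is hypothetical).

```python
def batch_norm(batch, eps=1e-5):
    # normalize a batch of scalar activations to zero mean and
    # (near-)unit variance, shrinking value-range differences
    # between samples as Gu's [0053] describes
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]
```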
Gu, Berry, Song and Xu are analogous art because all involve developing information retrieval and modeling techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for processing and retrieving information for training a machine learning model using batch norm calculation for processing exacted data features, as disclosed by Gu with the method of developing information retrieval and modeling techniques using machine learning systems and algorithms as collectively disclosed by Berry, Song and Xu.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Gu, Berry, Song and Xu to accelerate the convergence of the neural network during training using back-propagation of gradient (Gu, [0053]).
Regarding claims 13 and 21, the limitations are similar to those in claim 4 and are thus rejected under the same rationale.
Claims 9, 18, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US 20250217446, hereinafter ‘Xu’) in view of Song et al. (US 20210284184, hereinafter ‘Song’) in further view of Berry (US 20200364507), Gu (US 20210056715), and Xie et al. (US 20210158073, hereinafter ‘Xie’).
Regarding claim 9, the rejection of claim 4 is incorporated and Xu in combination with Song and Berry teaches the method according to claim 4, further comprising ranking the plurality of training examples in accordance with the norms of the derivative vectors of the loss function, wherein the plurality of training subsets are defined based on the ranking of the norms of the derivative vectors. (in [0086] In at least one embodiment, inference and/or training logic 815 may include, without limitation, a code and/or data storage 805 to store backward and/or output weight and/or input/output data corresponding to neurons or layers of a neural network trained and/or used for inferencing in aspects of one or more embodiments. In at least one embodiment, code and/or data storage 805 stores weight parameters and/or input/output data of each layer of a neural network trained or used in conjunction with one or more embodiments during backward propagation [ranking the plurality of training examples in accordance with the norms of the derivative vectors of the loss function, wherein the plurality of training subsets are defined based on the ranking of the norms of the derivative vectors] of input/output data and/or weight parameters during training and/or inferencing using aspects of one or more embodiments….; And in [0077] In at least one embodiment, a system performing at least a part of process 600 includes executable code to select 604 a portion of training data to train one or more neural networks based on uniqueness of training data… In at least one embodiment, as described above, a similarity score uses a mutual information equation (shown as Eq. 1 above) that calculates scores for how similar one object in an image is to said object in another image. In at least one embodiment, similarity scores are stored in a list. 
In at least one embodiment, portion of training data is selected based on said list [wherein the plurality of training subsets are defined based on the ranking of the norms of the derivative vectors]…)
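Examiner's note (illustrative only, using a hypothetical one-weight model): ranking examples by the norms of per-example loss derivatives and carving the ranking into unequally sized subsets — as the claim 9 limitation is understood — can be sketched as:

```python
def rank_by_gradient_norm(examples, w=1.0):
    # per-example derivative of a squared loss for the toy model
    # y_hat = w * x; in the scalar case the norm of the derivative
    # vector is simply the absolute value of the derivative
    def norm(ex):
        x, y = ex
        return abs(2 * (w * x - y) * x)
    return sorted(examples, key=norm, reverse=True)

def split_into_subsets(ranked, sizes):
    # carve the ranked examples into subsets of differing sizes
    out, start = [], 0
    for s in sizes:
        out.append(ranked[start:start + s])
        start += s
    return out
```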
While Xu teaches defining the training data set as training parameters for training the neural network using backpropagation, Xu, Song, and Berry do not expressly disclose the use of normalization processing for processing training parameters.
Gu does expressly teach the use of normalization processing for processing training parameters, in [0053]: The batch norm [further comprising determining norms of the derivative vectors, wherein the plurality of training subsets are defined based on the norms of the derivative vectors] calculation helps to reduce difference in value ranges between different samples, so that most of data is in an unsaturated region, thereby ensuring better back-propagation of gradient [wherein the plurality of training subsets are defined based on the norms of the derivative vectors], so as to accelerate the convergence of the network. Next, the fully connected layers reassemble local features extracted during the batch norm calculation into a complete feature through the weight matrix.
Gu does not expressly disclose ranking features for feature selection activity.
Xie does expressly disclose ranking features for feature selection activity, in [0025] After the features and other data 212 have been linked, these can be batched and transmitted to the training system 216 which can use this in conjunction with external (or internal) labels and models to determine which of the features input are the optimal…. To identify the optimal features to use a feature selection score may be determined and used in the identification. Note that other techniques may be used, and a feature selection score is but an exemplary technique presented. Further to the feature selection score, a ranking strategy or other method may be applied to select those features with the higher or targeted scores. Additionally or alternatively, the feature orchestrator 206 and/or the request can include the ranking strategy [ranking the plurality of training examples in accordance with the norms of the derivative vectors of the loss function, wherein the plurality of training subsets are defined based on the ranking of the norms of the derivative vectors] (e.g. sorting, normalization and weighting, Euclidean distance) to be used and/or features from which to select, which in turn provide an output indicating the best feature processes to use. As the features and corresponding feature processes 204 are identified, the end-to-end auto-determining feature system may be re-processed (or as another request comes in of a same type) and features re-analyzed until the model performance converges.
Xie, Gu, Berry, Song, and Xu are analogous art because all involve developing information retrieval and modeling techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Xie, in which characteristics or features associated with the data are identified and used for pattern recognition and classification in learning and training machine learning models, with the method of developing information retrieval and modeling techniques using machine learning systems and algorithms as collectively disclosed by Gu, Berry, Song, and Xu.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Xie, Gu, Berry, Song, and Xu to allow for implementing feature selection and modeling techniques for identifying features in a more reliable and dynamic manner (Xie, [0002]).
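For illustration only (this sketch is not part of the prosecution record or any cited reference; all function names and numeric values are hypothetical), the claimed operation of ranking training examples by the norms of their per-example loss-gradient ("derivative") vectors and defining training subsets of differing sizes from that ranking might be sketched as:

```python
# Hypothetical sketch: rank training examples by the L2 norms of their
# per-example derivative vectors of a loss function, then define training
# subsets of different sizes from that ranking.
import numpy as np

def rank_by_gradient_norm(gradients):
    """Return example indices sorted by descending L2 norm of each
    per-example derivative vector of the loss function."""
    norms = np.linalg.norm(gradients, axis=1)
    return np.argsort(-norms), norms

def define_subsets(ranked_indices, sizes=(4, 2)):
    """Split the ranked examples into subsets whose total numbers of
    training examples differ (here, 4 examples versus 2)."""
    subsets, start = [], 0
    for size in sizes:
        subsets.append([int(i) for i in ranked_indices[start:start + size]])
        start += size
    return subsets

# Six hypothetical per-example gradient vectors (distinct norms).
grads = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0],
                  [6.0, 8.0], [0.5, 0.5], [1.5, 0.0]])
order, norms = rank_by_gradient_norm(grads)
subsets = define_subsets(order)   # e.g. [[3, 0, 2, 5], [1, 4]]
```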
Regarding claims 18 and 22, the claims are similar to claim 1 and rejected under the same rationale.
Claims 4, 9, 13, 18, and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (US 20250217446, hereinafter ‘Xu’) in view of Song et al. (US 20210284184, hereinafter ‘Song’), in further view of Berry, and in further view of Gallafent et al. (US 20070031039, hereinafter ‘Gal’).
Regarding claim 4, the rejection of claim 1 is incorporated and Xu in combination with Song and Berry teaches the method according to claim 1, further comprising determining norms of the derivative vectors, wherein the plurality of training subsets are defined based on the norms of the derivative vectors. (in [0086] In at least one embodiment, inference and/or training logic 815 may include, without limitation, a code and/or data storage 805 to store backward and/or output weight and/or input/output data corresponding to neurons or layers of a neural network trained and/or used for inferencing in aspects of one or more embodiments. In at least one embodiment, code and/or data storage 805 stores weight parameters and/or input/output data of each layer of a neural network trained or used in conjunction with one or more embodiments during backward propagation [further comprising determining norms of the derivative vectors] of input/output data and/or weight parameters during training and/or inferencing using aspects of one or more embodiments….)
While Xu teaches defining the training data set as training parameters for training the neural network using backpropagation, Xu, Song, and Berry do not expressly disclose the use of normalization processing for processing training parameters.
Gal does expressly teach the use of normalization processing for processing training parameters, in [0019] The gradient data used in the method is calculated using the values of the parameters for each pixel. First, one or more gradient values are calculated for each pixel. In one embodiment, for each pixel (in a two-dimensional pixel array) a value representing the derivative of one of the parameters in each of the horizontal and vertical directions is calculated. These two derivative values may be thought of as the components of a two-dimensional vector, which may be referred to as a derivative vector. Then, the magnitude of the derivative vector is calculated [further comprising determining norms of the derivative vectors, wherein the plurality of training subsets are defined based on the norms of the derivative vectors]. For example, if the components of the derivative vector have values x and y, then the magnitude may be calculated to be m1 = (x^2 + y^2)^(1/2). Similar magnitude values m2, m3, . . . etc. may be calculated for each pixel based on the derivative of the other parameters. Next, a single derivative parameter D1 for each pixel may be calculated based on the individual derivative values m1, m2, . . . For example, the value of D1 may be the average value (such as the mean or median value) of the values m1, m2, . . . Alternatively, the value of D1 may be determined to be the magnitude of the vector whose components are the values m1, m2, . . ., so that D1 = (m1^2 + m2^2 + . . .)^(1/2). In this way, an additional parameter D1 is defined representing the variation of the visual characteristics of pixels at a particular pixel. A value for the parameter D1 may be calculated for each pixel.
Gal, Berry, Song, and Xu are analogous art because all involve developing information retrieval and modeling techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gal, in which information is processed and retrieved for training a machine learning image segmentation algorithm, with the method of developing information retrieval and modeling techniques using machine learning systems and algorithms as collectively disclosed by Berry, Song, and Xu.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Gal, Berry, Song, and Xu in order to segment an image into regions so as to generate an image mask or perform other image processing (Gal, [0005]).
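For clarity only (the notation below is the examiner's restatement, not Gal's), the magnitude computation Gal describes in [0019] is the Euclidean norm of the per-pixel derivative vector, with the per-parameter magnitudes then combined into a single scalar:

```latex
m_1 = \left(x^2 + y^2\right)^{1/2}, \qquad
D_1 = \left(m_1^2 + m_2^2 + \cdots\right)^{1/2}
```

That is, D1 aggregates the gradient magnitudes computed for each parameter into one value per pixel.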
Regarding claim 9, the rejection of claim 4 is incorporated and Xu in combination with Song, Berry, and Gal teaches the method according to claim 4, further comprising ranking the plurality of training examples in accordance with the norms of the derivative vectors of the loss function, wherein the plurality of training subsets are defined based on the ranking of the norms of the derivative vectors. (in [0086] In at least one embodiment, inference and/or training logic 815 may include, without limitation, a code and/or data storage 805 to store backward and/or output weight and/or input/output data corresponding to neurons or layers of a neural network trained and/or used for inferencing in aspects of one or more embodiments. In at least one embodiment, code and/or data storage 805 stores weight parameters and/or input/output data of each layer of a neural network trained or used in conjunction with one or more embodiments during backward propagation [ranking the plurality of training examples in accordance with the norms of the derivative vectors of the loss function, wherein the plurality of training subsets are defined based on the ranking of the norms of the derivative vectors] of input/output data and/or weight parameters during training and/or inferencing using aspects of one or more embodiments….; And in [0077] In at least one embodiment, a system performing at least a part of process 600 includes executable code to select 604 a portion of training data to train one or more neural networks based on uniqueness of training data… In at least one embodiment, as described above, a similarity score uses a mutual information equation (shown as Eq. 1 above) that calculates scores for how similar one object in an image is to said object in another image. In at least one embodiment, similarity scores are stored in a list. 
In at least one embodiment, portion of training data is selected based on said list [wherein the plurality of training subsets are defined based on the ranking of the norms of the derivative vectors]…)
While Xu teaches defining the training data set as training parameters for training the neural network using backpropagation, Xu, Song, and Berry do not expressly disclose the use of normalization processing for processing training parameters.
Gal does expressly teach the use of normalization processing for processing training parameters, in [0019] The gradient data used in the method is calculated using the values of the parameters for each pixel. First, one or more gradient values are calculated for each pixel. In one embodiment, for each pixel (in a two-dimensional pixel array) a value representing the derivative of one of the parameters in each of the horizontal and vertical directions is calculated. These two derivative values may be thought of as the components of a two-dimensional vector, which may be referred to as a derivative vector. Then, the magnitude of the derivative vector is calculated [further comprising determining norms of the derivative vectors, wherein the plurality of training subsets are defined based on the norms of the derivative vectors]. For example, if the components of the derivative vector have values x and y, then the magnitude may be calculated to be m1 = (x^2 + y^2)^(1/2). Similar magnitude values m2, m3, . . . etc. may be calculated for each pixel based on the derivative of the other parameters. Next, a single derivative parameter D1 for each pixel may be calculated based on the individual derivative values m1, m2, . . . For example, the value of D1 may be the average value (such as the mean or median value) of the values m1, m2, . . . Alternatively, the value of D1 may be determined to be the magnitude of the vector whose components are the values m1, m2, . . ., so that D1 = (m1^2 + m2^2 + . . .)^(1/2). In this way, an additional parameter D1 is defined representing the variation of the visual characteristics of pixels at a particular pixel. A value for the parameter D1 may be calculated for each pixel.
Gal does expressly disclose ranking features for feature selection activity, in [0070]: The multi-scale gradient technique allows a determination of the "feature size" by calculating several gradients for each pixel. First, the calculation of the first derivative/gradient. In this example the gradient does not have a direction; only the magnitude of the gradient at a point is calculated. For a greyscale image this is a scalar value, and for a multi-channel image it is, in general, a vector in colour/feature space. For a multi-channel image, the gradient magnitude may be calculated for each channel, and then these values combined to produce a single scalar value by various methods, e.g. the magnitude of the vector constructed in colour space by using the min and max in each channel as the endpoints [ranking the plurality of training examples in accordance with the norms of the derivative vectors of the loss function, wherein the plurality of training subsets are defined based on the ranking of the norms of the derivative vectors] in that dimension, or the maximum (or median, or mean) of the gradient-magnitude in each channel [wherein the plurality of training subsets are defined based on the ranking of the norms of the derivative vectors]. [0071] One example is to take a processing element (for example, a square), and to define the gradient at a pixel as (max-min)/l, where max is the highest grey level in the square, min is the lowest, and l is the side length of the square, when the square is centred on the pixel in question. And in [0017] In an exemplary system and method according to the invention, a digital image is segmented. The process uses two types of data. The first type comprises one or more selections of pixels made in the image, either by a user or automatically by the system.
The second type of data includes data derived automatically from the image by the system representing the gradient or derivative at each pixel of the values representing the visual characteristics (such of colour) of each pixel. These two kinds of data are applied to generate a segmentation of the image.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Gal, Berry, Song and Xu for the same reasons disclosed above.
Regarding claims 13 and 18, the claims are similar to claims 4 and 9 respectively and are thus rejected under the same rationale.
Regarding claims 21 and 22, the claims are similar to claims 4 and 9 respectively and are thus rejected under the same rationale.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Cheng et al. (US 20210142210) teaches in [0057] Feature embedding generally refers to translating a data set into a dimensional space of reduced dimensionality so as to increase, or maximize, distances between data points (such as individual images) which need to be distinguished in computing a task for a particular function, and decrease, or minimize, distances between data points which need to be matched, clustered, or otherwise found similar in computing a task for a particular function. For example, functions for expressing distance between two data points may be any function which expresses Euclidean distance, such as L2-norm; Manhattan distance; any function which expresses cosine distance, such as the negative of cosine similarity; or any other suitable distance function as known to persons skilled in the art. According to example embodiments of the present disclosure, a distance function evaluating two data points x and y may be written as D(x, y).
Wekel et al. (US 20220277193): teaches using Deep Neural Networks (DNNs) to perform LiDAR and camera perception. Classes of such DNNs include DNNs that perform panoptic segmentation of camera images in perspective view, and DNNs that perform top-down or “Bird's Eye View” (BEV) object detection from LiDAR point clouds for use as autonomous or semi-autonomous vehicle perception networks.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI whose telephone number is (571)272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OLUWATOSIN ALABI/ Primary Examiner, Art Unit 2129