Last updated: May 29, 2026
Application No. 17/177,362
Predicting Machine-Learned Model Performance from the Parameter Values of the Model

Final Rejection §103
Filed
Feb 17, 2021
Priority
Feb 19, 2020 — provisional 62/978,706
Examiner
VAUGHN, RYAN C
Art Unit
2125
Tech Center
2100 — Computer Architecture & Software
Assignee
Google LLC
OA Round
4 (Final)
Interview Optional

— +19.8% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 62% grant rate with +19.8% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 241 resolved cases, 2023–2026
Examiner Intelligence

VAUGHN, RYAN C View full profile →
Grants 62% of resolved cases
Career Allowance Rate
149 granted / 241 resolved
+6.8% vs TC avg
Strong +20% interview lift
Without
With
+19.8%
Interview Lift
resolved cases with interview
Typical timeline
3y 9m
Avg Prosecution
29 currently pending
Career history
283
Total Applications
across all art units
Statute-Specific Performance

§101
18.1%
-21.9% vs TC avg
§103
60.2%
+20.2% vs TC avg
§102
2.7%
-37.3% vs TC avg
§112
11.9%
-28.1% vs TC avg
Black line = Tech Center average estimate • Based on career data from 241 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Claims 1-14 and 16-20 are presented for examination.

Response to Amendment
Applicant’s amendment has obviated the specification objections and the rejections under 35 USC § 101.  Therefore, those objections and rejections are withdrawn.

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-6, 14, 16-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Yang (US 20210073995 A1) (“Yang”) in view of Nair et al. (US 20200372342 A1) (“Nair”) and further in view of Faivishevsky et al. (US 20180240010 A1) (“Faivishevsky”).
Regarding claim 1, Yang teaches “[a] computing system to predict a performance value of a machine-learned model based on its parameter values to enable early stopping of model training, the computing system comprising: 
one or more processors (said general purpose processor, see [0081]); and 
one or more non-transitory computer-readable media (one or more non-transitory computer-readable storage media, see [0471]) that collectively store: 
a machine-learned performance prediction model (second network, see [0078]) that has been trained … to predict performance values of machine-learned models based on their parameter values (trained using data that is processed in accordance with augmentation parameters in said set of hyperparameters [..] a training-quality measure is generated for said trained network, see [0078]); and
 instructions that, when executed by the one or more processors, cause the computing system to perform operations (Computer system includes memory storing executable instructions that, as a result of being executed by said general purpose processor […] cause said computer system to perform operations described herein, see [0081]), the operations comprising: 
obtaining data descriptive of a plurality of parameter values of a plurality of … parameters [of] an assessed machine-learned model (Computer system generates, using a first network, a set of hyperparameters for a second network, see [0078]. The assessed machine-learned model is mapped to the first network. The set of hyperparameters may include data augmentation parameters such as image contrast, signal filtering, and sampling rate parameters, see [0078], including weight parameters based, at least in part, on one or more hyperparameter, see [Claim 1]); 
inputting, as input to an input layer of the machine-learned performance prediction model, the data descriptive of the plurality of parameter values of the plurality of …parameters [of] the assessed machine-learned model into the machine-learned performance prediction model (hyperparameters generated by first network are applied to said second network [i.e., they are used by the entire model, including the input layer], see [0078]. The machine-learned performance prediction model is mapped to the second network); 
receiving, as output from the machine-learned performance prediction model, a predicted performance value (a training-quality measure is generated for said trained network, see [0078]) of the assessed machine-learned model based on the data descriptive of the plurality of parameter values of the plurality of … parameters [of] the assessed machine-learned model (In at least one embodiment, said second network is trained using data that is processed in accordance with augmentation parameters in said set of hyperparameters, see [0078]); […]
Yang does not explicitly disclose the further limitations of the claim.  However, Nair discloses “a plurality of weight parameters used by an assessed machine-learned model (in some neural networks, during backpropagation, each neuron computes its own gradient for a link for the neuron, the gradient to be applied to adjust the weight of the link [i.e., the weight parameters are used by the model] – Nair, paragraph 54) ….”
Nair further discloses “stopping early, by the computing system, the training of the assessed machine-learned model based at least in part on the predicted performance value of the assessed machine-learned model (Embodiments of the invention include systems and methods that may stop training of a NN if it is determined that likelihood of improvement in loss over a best loss seen [predicted performance value] is below a threshold, or the training has proceeded to the mean expected or final value of training (typically after a certain number of initial training cycles or epochs), or if training has “stalled” and does not improve (e.g. over a historic best value) over a certain number of training cycles or epochs. In some embodiments, a training module or unit may call an early stopping module or unit (e.g. as described in the examples in Tables 1-5 herein) and receive instructions to stop training or not stop training, see [0032]).”
Nair and the instant application are analogous art because they are in the same field of endeavor of neural networks.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yang to include an early stopping algorithm as disclosed in Nair, as Nair discloses that “training NN often involves a diminishing amount of improvement over time, and it is desirable to determine at what point training can be stopped if an acceptable amount of improvement is achieved, without wasting time on further training due to the law of diminishing returns”, see Nair [0013].
Neither Yang nor Nair appears to disclose explicitly the further limitations of the claim.  However, Faivishevsky discloses “train[ing] based on a training dataset correlating a plurality of respective … parameter description values to a plurality of corresponding performance values (The network trainer 202 may be further configured to train the machine learning network with multiple, parallel instances of the training algorithm. Each parallel instance of the training algorithm is configured with a different set of one or more configuration parameters [parameter description values]. The network trainer 202 may be further configured to capture a time series of partial accuracy values [performance values] for each parallel instance of the training algorithm. [0017]) ….”
Faivishevsky and the instant application both relate to machine learning and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Yang and Nair to train with a dataset of parameter description values corresponding to performance values, as disclosed by Faivishevsky, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the system to keep track of the accuracy values of the model, thereby providing an indicator of when further training is necessary.  See Faivishevsky, paragraph [0017].

Regarding claim 2, Yang in view of Nair and Faivishevsky discloses all of the elements of claim 1 as shown in the rejection above.
Yang also discloses “said operations of obtaining (training-quality measure is used as a reward, see Yang [0078]), inputting (perform reinforcement learning, see Yang [0078]), generating, and providing (produce improved hyperparameters, see Yang [0078]) are performed during training of the assessed machine-learned model (FIG. 6 illustrates an example of a process that, as a result if being performed by a computer system, trains a network using hyperparameters generated using reinforcement learning, see Yang [0078]. Generate, using a first network, a set of hyper parameters for a second network [602], train the second network using the set of hyper parameters [604], determine a reward based on a quality of the training of the second network [606], is the reward sufficient [608], training is complete [610], see Yang [FIG. 6]).”

Regarding claim 3, Yang in view of Nair and Faivishevsky teaches all of the limitations of claim 1 as shown in the rejection above. 
Yang also discloses “the assessed machine-learned model comprises a neural network (a recurrent neural network that generates hyper parameters for a machine learning network, see [0065]); and
the plurality of parameter values comprise[s] a plurality of weight values (including weight parameters based, at least in part, on one or more hyperparameter, see [Claim 1]; weight parameters, see [0109]) respectively associated with the neural network (In at least one embodiment, code, such as graph code, loads weight or other parameter information into processor ALUs based on an architecture of a neural network to which the code corresponds, see [0086]).”

Regarding claim 4, Yang in view of Nair and Faivishevsky teaches all of the limitations of claim 1 as shown in the rejection above. 
Yang also discloses “inputting, as input to the input layer of the machine-learned performance prediction model, a plurality of hyperparameter values associated with a plurality of hyperparameters of the assessed machine-learned model (generates, using a first network, a set of hyperparameters, see [0078]; see also Fig. 7 (showing that the second network’s hyperparameters are input into the first network)).”

Regarding claim 5, Yang in view of Nair and Faivishevsky teaches all of the limitations of claim 1 as shown in the rejection above. 
Yang also discloses “wherein the performance value comprises a test accuracy value (training-quality measure is generated […] a training-quality measure is used as a reward, see [0078]; a reward is sufficient when it indicates that accuracy of output, see [0079]) relative to a defined testing dataset (training data, see [0078]).”

Regarding claim 6, Yang in view of Nair and Faivishevsky teaches all of the limitations of claim 1 as shown in the rejection above. 
Yang also discloses “wherein the test accuracy value comprises a future test accuracy value that assumes training of the assessed machine-learned model is completed to convergence (In at least one embodiment, at block 708, said computer system evaluates training of said first network to generate a reward based on quality of image segmentation produced by first network. In at least one embodiment, a reward is based on a convergence rate of training of said second network. In at least one embodiment, evaluation of a reward generated at block 708 determines whether training is complete, see [0082]).”

Regarding claim 14, Yang discloses “[a] computer-implemented method for predicting a performance value of a machine-learned model based on its parameter values to enable early stopping of model training, the method comprising: 
obtaining, by a computing system comprising one or more computing devices, a training dataset and an assessed machine-learned model to be trained on the training dataset (Computer system generates, using a first network, a set of hyperparameters, for a second network […] said set of hyperparameters may include structural parameters of said second network, see Yang [0078]); 
training, by the computing system, the assessed machine-learned model on the training dataset for a plurality of training iterations (FIG. 6 illustrates an example of a process that, as a result if being performed by a computer system, trains a network using hyperparameters generated using reinforcement learning, see Yang [0078]), wherein one or more of the plurality of training iterations comprises: 
obtaining, by the computing system, data descriptive of a plurality of current parameter values of a plurality of … parameters [of] a current instance of the assessed machine-learned model (Computer system generates, using a first network, a set of hyperparameters, for a second network, see Yang [0078]. The assessed machine-learned model is the first network, the machine-learned performance model is the second network); …
inputting, by the computing system, the data descriptive of the plurality of current parameter values into a machine-learned performance prediction model …, wherein inputting comprises inputting the data as input to an input layer of the machine-learned performance prediction model (hyperparameters generated by said first network are applied to said second network, and said second network is trained using training data, see Yang [0078]; see also Fig. 6 (showing that the second network is trained using the set of hyperparameters generated by the first network, i.e., the first network inputs the hyperparameters to all layers of the second network, including the input layer)); [and]
receiving, by the computing system as output from the machine-learned performance prediction model, a predicted performance value of the assessed machine-learned model based on the data descriptive of the plurality of current parameter values (A training quality measure is generated for said trained network; training quality measure is used as a reward to perform reinforcement learning on the first network [i.e., it is an output of the network], see Yang [0078]); […]”
Yang does not explicitly disclose the further limitations of the claim.  However, Nair discloses “a plurality of weight parameters used by a  current instance of the assessed machine-learned model (in some neural networks, during backpropagation, each neuron computes its own gradient for a link for the neuron, the gradient to be applied to adjust the weight of the link [i.e., the weight parameters are used by the model] – Nair, paragraph 54) ….”
Nair further discloses “stopping early, by the computing system, the training of the assessed machine-learned model based at least in part on the predicted performance value of the assessed machine-learned model (Embodiments of the invention include systems and methods that may stop training of a NN if it is determined that likelihood of improvement in loss over a best loss seen [predicted performance value] is below a threshold, or the training has proceeded to the mean expected or final value of training (typically after a certain number of initial training cycles or epochs), or if training has “stalled” and does not improve (e.g. over a historic best value) over a certain number of training cycles or epochs. In some embodiments, a training module or unit may call an early stopping module or unit (e.g. as described in the examples in Tables 1-5 herein) and receive instructions to stop training or not stop training, see [0032]).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yang to include an early stopping algorithm as disclosed in Nair, as Nair discloses that “training NN often involves a diminishing amount of improvement over time, and it is desirable to determine at what point training can be stopped if an acceptable amount of improvement is achieved, without wasting time on further training due to the law of diminishing returns”, see Nair [0013].
Neither Yang nor Nair appears to disclose explicitly the further limitations of the claim.  However, Faivishevsky discloses a “model that has been trained based on a training dataset correlating a plurality of respective … parameter description values to a plurality of corresponding performance values (The network trainer 202 may be further configured to train the machine learning network with multiple, parallel instances of the training algorithm. Each parallel instance of the training algorithm is configured with a different set of one or more configuration parameters [parameter description values]. The network trainer 202 may be further configured to capture a time series of partial accuracy values [performance values] for each parallel instance of the training algorithm. [0017]) ….”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Yang and Nair to train with a dataset of parameter description values corresponding to performance values, as disclosed by Faivishevsky, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the system to keep track of the accuracy values of the model, thereby providing an indicator of when further training is necessary.  See Faivishevsky, paragraph [0017].

Regarding claim 16, Yang in view of Nair/Faivishevsky discloses all of the elements of claim 14 as shown in the rejection above.
Yang/Faivishevsky does not explicitly disclose the further limitations of the claim.  However, Nair teaches that “stopping early based at least in part on the predicted performance value of the assessed machine-learned model comprises: 
stopping early when the predicted performance value of the assessed machine-learned model is less than a threshold performance value (stopping of training in a “local” mode as soon as a probability of improvement in the loss of the NN is less than a threshold, see Nair [0103]).”
The same motivation to combine for claim 14 equally applies to claim 16.

Regarding claim 17, Yang in view of Nair and Faivishevsky discloses all of the elements of claim 14 as shown in the rejection above.
Yang/Faivishevsky does not explicitly disclose the further limitations of the claim.  However, Nair teaches that “stopping early based at least in part on the predicted performance value of the assessed machine-learned model comprises: 
stopping early when the predicted performance value of the assessed machine-learned model is less than an alternative performance value associated with an alternative machine-learned model (Table 2 shows an example algorithm for early stopping in a “global” mode, e.g. to determine if to stop training if the probability of loss improvement when compared with other NNs with different hyperparameters is less than a threshold, see Nair [0105]).”
The same motivation to combine for claim 14 equally applies to claim 17.

Regarding claim 19, Yang in view of Nair and Faivishevsky discloses all of the elements of claim 14 as shown in the rejection above.
Yang also discloses “the machine-learned performance prediction model comprises a recurrent neural network (An additional recurrent neural network (“RNN”) referred to as a controller is trained to generate hyperparameters, see Yang [0078]); and 
inputting, by the computing system, the data descriptive of the plurality of current parameter values into the machine-learned performance prediction model comprises inputting to the machine-learned performance prediction model a sequence of sets of parameter values (hyperparameters generated by said first network are applied to said second network, see Yang [0078])”
Yang/Nair does not explicitly disclose “of sequential instances of the assessed machine-learned model obtained at a number of training checkpoints.”
However, Faivishevsky teaches “of sequential instances of the assessed machine-learned model obtained at a number of training checkpoints (“The training algorithm 304 performs a sequence of training iterations with the machine learning network 302 and produces an associated sequence of partial accuracy values {A.sub.1, A.sub.2, A.sub.3, …, A.sub.t}. Each partial accuracy value A.sub.t corresponds to accuracy of the machine learning network 302 at a training iteration t, see Faivishevsky [0020]. As shown, the configuration parameters (H) are input to a model 306, which includes a deep neural network (DNN) 308 and a recurrent neural network (RNN) 310. […] As shown, the configuration parameters {H} are input into the DNN 308 to generate a representation. That representation is, in turn, injected into the RNN 310. A training algorithm 312 trains the model 306 against the time series of partial accuracy values {A.sub.1}. After training, the RNN 310 produces a modeled time series {A.sub.t}*, see Faivishevsky [0021]”).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yang in view of Nair to include a sequence of sets of parameter values of sequential instances of a machine learning model obtained at a number of training checkpoints as disclosed in Faivishevsky, as Faivishevsky discloses “Machine learning training can be tuned with a number of configuration parameters in order to achieve optimum accuracy of the underlying machine learning algorithm”, see Faivishevsky [0001].

Regarding claim 20, Yang discloses “[a] computing system for predicting a performance value of a machine-learned model based on its parameter values to enable early stopping of model training, the computing system comprising:
one or more non-transitory computer-readable media that collectively store instructions (one or more non-transitory computer-readable storage media having stored thereon executable instructions, see Yang [0471]) that, when executed by one or more processors, cause the one or more processors to perform operations, the operations comprising: 
obtaining a sample model performance dataset (generates, using a first network, a set of hyperparameters, see Yang [0078]) …;
using the sample model performance dataset (hyperparameters generated by said first network, see Yang [0078]) to train a machine-learned performance prediction model (are applied to said second network, and said second network is trained using training data, see Yang [0078]); [and]
receiving, from the machine-learned performance prediction model, a predicted performance value of an assessed machine-learned model based on data descriptive of a plurality of parameter values of a plurality of … parameters [of] the assessed machine-learned model (A training quality measure is generated for said trained network, see Yang [0078]); [….]”
Yang does not explicitly disclose the further limitations of the claim.  However, Faivishevsky discloses “correlating a plurality of parameter value datasets respectively associated with a plurality of sample machine-learned models with a corresponding performance value exhibited by the corresponding sample machine-learned model (The network trainer 202 may be further configured to train the machine learning network with multiple, parallel instances of the training algorithm. Each parallel instance of the training algorithm is configured with a different set of one or more configuration parameters. The network trainer 202 may be further configured to capture a time series of partial accuracy values for each parallel instance of the training algorithm. [0017] The examiner notes that the “parameter value dataset respectively associated with a plurality of models” maps to each parallel instance having different parameter values, and the “performance value” maps to each parallel instance having a partial accuracy value) ….”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yang to include a plurality of sample machine-learned model generating partial accuracy values as disclosed in Faivishevsky, as Faivishevsky discloses “The computing device trains the feed-forward neural network and the recurrent neural network against the partial accuracy values. The computing device optimizes the feed-forward neural network and the recurrent neural network to determine optimized configuration parameters. The optimized configuration parameters may minimize training time to achieve a predetermined accuracy level.”, see Faivishevsky [Abstract].
Neither Yang nor Faivishevsky explicitly discloses the further limitations of the claim.  However, Nair discloses “a plurality of weight parameters used by the assessed machine-learned model (in some neural networks, during backpropagation, each neuron computes its own gradient for a link for the neuron, the gradient to be applied to adjust the weight of the link [i.e., the weight parameters are used by the model] – Nair, paragraph 54) ….”
Nair further discloses “stopping early the training of the assessed machine-learned model based at least in part on the predicted performance of the assessed machine-learned model (Embodiments of the invention include systems and methods that may stop training of a NN if it is determined that likelihood of improvement in loss over a best loss seen [predicted performance value] is below a threshold, or the training has proceeded to the mean expected or final value of training (typically after a certain number of initial training cycles or epochs), or if training has “stalled” and does not improve (e.g. over a historic best value) over a certain number of training cycles or epochs. In some embodiments, a training module or unit may call an early stopping module or unit (e.g. as described in the examples in Tables 1-5 herein) and receive instructions to stop training or not stop training, see [0032]).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yang in view of Faivishevsky to include an early stopping algorithm as disclosed in Nair, as Nair discloses that “training NN often involves a diminishing amount of improvement over time, and it is desirable to determine at what point training can be stopped if an acceptable amount of improvement is achieved, without wasting time on further training due to the law of diminishing returns”, see Nair [0013].

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Yang in view of Nair and Faivishevsky and further in view of Deeplizard (2018, October 3), hereinafter referred as Deeplizard. “CNN Flatten Operation Visualized – Tensor Batch Processing for Deep Learning.”
Regarding claim 7, Yang/Nair/Faivishevsky discloses all of the elements of claim 1 as shown in the rejection above.
Yang/Nair/Faivishevsky does not explicitly disclose the further limitations of the claim.  However, Deeplizard teaches that “the data descriptive of the plurality of parameter values of the plurality of weight parameters used by the assessed machine-learned model comprises: 
a set of flattened parameter values of a single layer of the assessed machine-learned model (A tensor flatten operation is a common operation inside convolutional neural networks […] A flatten operation is a specific type of reshaping operation where by all of the axes are smooshed or squashed together, see Deeplizard [Flattening An Entire Tensor, Page 3 Paragraphs 1-2] The examiner notes that the parameter values of a single layer is mapped to the axes in this reference).”
Deeplizard and the instant application are analogous art because they are in the same field of endeavor of neural networks. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yang/Nair/Faivishevsky to include flattening parameter values as disclosed in Deeplizard as Deeplizard discloses “A tensor flatten operation is a common operation inside convolutional neural networks. This is because convolutional layer outputs that are passed to fully connected layers must be flatten out before the fully connected layer will accept the input”, see Deeplizard [Flattening an Entire Tensor, Page 3 Paragraphs 1-2].

Claims 8-11 are rejected under 35 U.S.C. 103 as being unpatentable over Yang in view of Nair and Faiviskevsky and further in view of Lin et al. (US 20210232927 A1).
Regarding claim 8, Yang/Nair/Faivishevsky discloses all of the elements of claim 1 as shown in the rejection above.
Yang/Nair/Faivishevsky does not explicitly disclose the further limitations of the claim.  However, Lin teaches that “the data descriptive of the plurality of weight parameter values used by the plurality of parameters of the assessed machine-learned model comprises: 
one or more statistics descriptive of the plurality of weight parameters used by the plurality of parameters of the assessed machine-learned model (At block 606 the process involves performing weight standardization for the weights of the connections associated with a node in the convolutional layer, see Lin [0048]. The examiner notes the standardization is the statistics descriptive applied to the plurality of parameter values (weights of the connections). Lin also states based on the value of the loss function, the model training application 104 can perform adjustments on the parameters of the convolutional neural network model 106, such as the weights of the connections, parameters of the convolutional layers, see [0028]).”
Lin and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Yang, Faivishevsky, and Nair to include statistical calculations among the data descriptive of the parameters, as disclosed by Lin, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would enhance the informativeness of the data by providing further information about the parameter distribution.  See Lin, paragraph 48.

Regarding claim 9, Yang/Nair/Faivishevsky in view of Lin discloses all of the elements of claim 8 as shown in the rejection above.
Regarding claim 9, Yang, as modified by Nair, Faivishevsky, and Lin, teaches that “the one or more statistics comprise one or more of: a mean, a variance, and/or one or more percentile values (Statistics of the weights of these connections are calculated, such as the mean and standard deviation, see Lin [0048]) for all of the plurality of parameter values of the plurality of weight parameters used by the assessed machine-learned model (performing weight standardization for the weights of the connections associated with a node in the convolutional layer, see Lin [0048]).”
The same motivation to combine for claim 8 equally applies to claim 9.

Regarding claim 10, Yang in view of Nair, Faivishevsky, and Lin discloses all of the elements of claim 8 as shown in the rejection above.
Yang in view of Nair, Faivishevsky, and Lin teaches that “the one or more statistics comprise one or more of: a mean, a variance, and/or one or more percentile values of the plurality of parameter values of the plurality of weight parameters, computed independently for one or more of a plurality of layers of the assessed machine-learned model (performs weight adjustments for each of the convolutional layers of the convolutional neural network model to normalize the weights of connections associated with a node of the convolutional layer, see Lin [0017]; Statistics of the weights of these connections are calculated, such as the mean and standard deviation, see Lin [0048]).”
The same motivation to combine for claim 8 equally applies to claim 10.

Regarding claim 11, Yang/Nair/Faivishevsky discloses all of the elements of claim 1 as shown in the rejection above.
Yang/Nair/Faivishevsky does not explicitly disclose the further limitations of the claim.  However, Lin teaches that “the data descriptive of the plurality of parameter values of the plurality of parameters of the assessed machine-learned model comprises: 
one or more norms of the plurality of parameter values of the plurality of weight parameters, computed independently for one or more of a plurality of layers of the assessed machine-learned model (performs weight adjustments for each of the convolutional layers of the convolutional neural network model to normalize the weights of connections associated with a node of the convolutional layer, see Lin [0017]).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Yang, Faivishevsky, and Nair to include normalized parameter values, as disclosed by Lin, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the data follow a standardized distribution, thereby increasing the system’s compatibility with other systems.  See Lin, paragraph 17.

Claims 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over Yang in view of Nair and Faivishevsky and further in view of Liu et al. (US 20200134387 A1)
Regarding claim 12, Yang/Nair/Faivishevsky discloses all of the elements of claim 1 as shown in the rejection above.
Yang also discloses “wherein the machine-learned performance prediction model comprises (In at least one embodiment, a second network used to generate improved hyperparameters is a recurrent neural network, see Yang [0083])”
Yang/Nair/Faivishevsky does not explicitly disclose the further limitations of the claim.  However, Liu teaches that “the machine-learned performance prediction model comprises a gradient boosting machine with regression trees (Gradient boosted machine models can also utilize tree-based models, see Liu [0117]; see also paragraph 113 (disclosing that the trees may be regression trees)).”
Liu and the instant application both relate to machine learning and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Yang, Faivishevsky, and Nair to employ gradient boosting with regression trees, as disclosed by Liu, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would reduce training time for individual models by allowing the system to ensemble multiple individual weak learners.  See Liu, paragraph 117.

Regarding claim 13, Yang/Nair/Faivishevsky discloses all of the elements of claim 1 as shown in the rejection above.
Yang/Nair/Faivishevsky appear not to disclose explicitly the further limitations of the claim.  However, Liu teaches that “the machine-learned performance prediction model comprises a logit-linear model or a fully-connected neural network (Modeling algorithms include, for example, algorithms that involve models such as neural networks, support vector machines, logistic regression, etc., see Liu [0017]; see also paragraphs 104-05 (discussing connections between nodes)).”
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Yang and Nair to neural networks, as disclosed by Liu, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the system to use a well-understood system that can be easily adapted through standard training processes.  See Liu, paragraph 17.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Yang in view of Nair and Faivishevsky and further in view of Eksombatchai (US 10671672 B1).
Regarding claim 18, Yang in view of Nair and Faivishevsky discloses all of the elements of claim 14 as shown in the rejection above.
Yang in view of Nair and Faivishevsky does not explicitly disclose the further limitations of the claim.  However, Eksombatchai teaches that “stopping early based at least in part on the predicted performance value of the assessed machine-learned model comprises: 
generating a performance versus training round plot based on the predicted performance value of the assessed machine-learned model (Graph 1300 illustrates a relationship between latency [performance value] and a minimum number of nodes [training] that are required to reach a stopping criterion, see Eksombatchai, col. 27, ll. 17-31); and 
stopping early based at least in part on the performance versus training round plot, wherein the performance versus training round plot is based at least in part on the predicted performance value of the assessed machine-learned model (A random walk process that uses an early stopping process may provide recommendations that are comparable in quality, see Eksombatchai, col. 27, ll. 43-53; A first plot 950 indicates a relative F1 score, and a second plot 960 indicates relative number of edges (%). As shown by the plots 950 and 960, pruning may improve the quality of recommendations because the F1 score increases with increased pruning, see Eksombatchai, col. 23, ll. 28-56. Early stopping of training can provide similar values (recommendations), and as shown in these plots, performing early stopping can be done based in part of the performance comparison; see also col. 27, ll. 17-31 (disclosing that the latency vs. minimum number of nodes plot is based on latency, i.e., the performance values of the model)).”
Eksombatchai and the instant application are analogous art because they are in the same field of endeavor of neural networks.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yang in view of Nair and Faivishevsky to include a plot comparing two values and determining whether to perform early stopping based on the plot as disclosed in Eksombatchai as Eksombatchai discloses “As indicated by plot 1330, early stopping may reduce the latency in determining a recommendation”, see Eksombatchai, col. 27, ll. 17-31.

Response to Arguments
Applicant's arguments filed November 13, 2025 (“Remarks”) have been fully considered but they are, except insofar as a ground of rejection has been withdrawn, not persuasive.  Specifically, Applicant’s remarks with respect to the eligibility rejection, Remarks at 9-11, are moot in light of the withdrawal of that ground of rejection, and Applicant’s arguments with respect to the art rejection, Remarks at 11-13, fail to comply with 37 CFR 1.111(b) because they amount to a general allegation that the claims define a patentable invention without specifically pointing out how the language of the claims patentably distinguishes them from the references.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849. The examiner can normally be reached M-R 7:00a-5:00p ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RYAN C VAUGHN/             Primary Examiner, Art Unit 2125
Read full office action
Prosecution Timeline

Show 10 earlier events
Jul 20, 2025
Response after Non-Final Action
Aug 14, 2025
Non-Final Rejection mailed — §103
Oct 14, 2025
Interview Requested
Oct 21, 2025
Examiner Interview Summary
Oct 21, 2025
Applicant Interview (Telephonic)
Nov 13, 2025
Response Filed
Jan 09, 2026
Final Rejection mailed — §103
Feb 13, 2026
Interview Requested
Precedent Cases

Applications granted by this same examiner with similar technology

17/684,254
Patent 12639610
QUANTUM-CLASSICAL HYBRID COMPUTER FOR CALCULATING ARITHMETIC FUNCTIONS USING FOURIER ANALYSIS
4y 2m to grant Granted May 26, 2026
17/453,330
Patent 12619598
DATA ALLOCATION WITH USER INTERACTION IN A MACHINE LEARNING SYSTEM
4y 6m to grant Granted May 05, 2026
17/872,118
Patent 12619860
METHOD AND DEVICE FOR CONTROLLING FIRING TIMING IN SPIKING NEURAL NETWORKS
3y 9m to grant Granted May 05, 2026
17/159,842
Patent 12608634
DISTRIBUTED QUANTUM FILE CONSOLIDATION
5y 2m to grant Granted Apr 21, 2026
17/304,163
Patent 12602448
PROGRESSIVE NEURAL ORDINARY DIFFERENTIAL EQUATIONS
4y 10m to grant Granted Apr 14, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

5-6
Expected OA Rounds
62%
Grant Probability
82%
With Interview (+19.8%)
3y 9m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 241 resolved cases by this examiner. Grant probability derived from career allowance rate.