Last updated: May 29, 2026
Application No. 17/832,415
Automated Model Selection

Final Rejection §103§112
Filed
Jun 03, 2022
Priority
Sep 30, 2021 — provisional 63/251,000
Examiner
MAIDO, MAGGIE T
Art Unit
2129
Tech Center
2100 — Computer Architecture & Software
Assignee
Etsy Inc.
OA Round
2 (Final)
This examiner grants 62% of cases after interview

— +27.6% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 39 resolved cases, 2023–2026
Examiner Intelligence

MAIDO, MAGGIE T View full profile →
Grants 62% of resolved cases
Career Allowance Rate
24 granted / 39 resolved
+6.5% vs TC avg
Strong +28% interview lift
Without
With
+27.6%
Interview Lift
resolved cases with interview
Typical timeline
4y 1m
Avg Prosecution
33 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
2.4%
-37.6% vs TC avg
§103
94.2%
+54.2% vs TC avg
§102
0.5%
-39.5% vs TC avg
§112
2.9%
-37.1% vs TC avg
Black line = Tech Center average estimate • Based on career data from 39 resolved cases
Office Action

§103 §112
DETAILED ACTION

Response to Amendment
The amendment filed on 5 November 2025 has been entered.
Claims 1-20 are pending.
Claims 5, 12, 19 are cancelled.
Claims 1, 8, 15 are amended.
Claims 1-4, 6-11, 13-18, 20 will be pending.
Applicant’s amendments to the Claims have overcome each and every rejection under 35 USC 101 previously set forth in the Non-Final Office Action mailed 5 June 2025.

 Response to Arguments
Applicant’s remarks, regarding the rejections of claims under 35 USC 103, have been fully considered.

Applicant submits combining Ghanta and Irvine, as proposed in the Office Action, fails to arrive at the claimed subject matter. Applicant submits amended Claim 1 expressly requires that “a linear regression model" be used to compute "(i) a differential value [between the first model performance value and the second model performance value] and (ii) a corresponding confidence interval," which is then used to select “the first machine learning model” to obtain predicted values for a "set of actual data items encountered in a production environment." Furthermore, amended Claim 1 expressly states that the differential value "represents a difference between the first model performance value of the first machine learning model and the second model performance value of the second machine learning model.".
Applicant’s arguments have been considered, but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 1-4, 6-11, 13-18, 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 and analogous claims 8, 15 recites the limitation "the first model performance metric and the second model performance metric" in lines 17-18. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the term "the first model performance metric and the second model performance metric" has been construed to be “the first model performance value and the second model performance value”. Claims 2-4, 6-7, 9-11, 13-14, 16-18, 20, which are dependent on claims 1, 8, 15, are similarly rejected.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 8, 15 are rejected under 35 U.S.C. 103 as being unpatentable over Ghanta et al. (U.S. Pre-Grant Publication No. 20200034665, hereinafter ‘Ghanta'), in view of Raschka (NPL: "Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning"), and further in view of Pietro et al. (U.S. Pre-Grant Publication No. 20190370218, hereinafter 'Pietro'). 

Regarding claim 1 and analogous claims 8, 15, Ghanta teaches A computer-implemented method, comprising: obtaining a plurality of training data items and a plurality of labels corresponding to the plurality of training data items, wherein each label represents a ground-truth value for a target attribute relating to the corresponding training data item ([0081] The primary training module 302, in one embodiment, trains the first machine learning algorithm/model on these features and labels obtaining a plurality of training data items using the training data set. The primary validation module 304, in some embodiments, uses a validation data set that includes the same features, but different data, to predict the labels using the first machine learning algorithm/model. The primary validation module 304 may compare the target attribute relating to the corresponding training data item predictions made by the algorithm to the each label represents a ground-truth value true a plurality of labels corresponding to the plurality of training data items label of the test data to calculate the error rate, score, weight, or other value.; [0077] In one embodiment, the primary training module 302 trains the first machine learning model for the first machine learning algorithm on a training data set. For instance, the primary training module 302 may receive, read, access, and/or the like a training data set and provide the training data set to a training pipeline 204 to train the machine learning model. In such an embodiment, the training data set includes labels that allow the first machine learning model to “learn” from the data to perform predictions on an inference data set that does not include labels.);
identifying a proper subset of training data items from among the plurality of training data items ([0077] In one embodiment, the primary training module 302 trains the first machine learning model for the first machine learning algorithm on a training data set. For instance, the primary training module 302 may receive, read, access, and/or the like a training data set and provide the training data set to a training pipeline 204 to train the machine learning model. In such an embodiment, the training data set includes labels that allow the first machine learning model to “learn” from the data to perform predictions on an inference data set that does not include labels. For example, the training data set may include various data points for dogs such as weight, height, gender, breed, etc. The primary training module 302 identifying a proper subset of training data items from among the plurality of training data items
may train the machine learning model using the dog training data set so that it can be used to predict various characteristics of the dog such as a dog's weight, gender, breed, and/or the like using an inference data set that does not include labels for the features that are being predicted.);
for each training data item in the proper subset of training data items: generating, using a first machine learning model and for the training data item, a predicted value for the target attribute ([0078] In one embodiment, the primary validation module 304 is configured to validate the first machine learning algorithm/model using a validation data set. The validation data set, in one embodiment, comprises a data set that includes labels for various features so that when the first machine learning algorithm/model analyzes the validation data set, the generating, using a first machine learning model and for the training data item, a predicted value for the target attribute
predictions that the first machine learning algorithm/model generates can be compared against the labels in the validation data set to determine the accuracy of the predictions.); and
in response to selecting the first machine learning model, obtaining, using the first machine learning model and for a set of actual data items encountered in a production environment, a corresponding set of predicted values for the target attribute ([0071] In one embodiment, the logical machine learning layer 225 includes a in response to selecting the first machine learning model model selection module 212 that is configured to receive the machine learning models that the training pipelines 204 a-b generate and determine which of the machine learning models is the best fit for the objective that is being analyzed. The best-fitting machine learning model may be the machine learning model that produced results most similar to the actual results for the training data (e.g., the most accurate machine learning model), the machine learning model that executes the fastest, the machine learning model that requires the least amount of configuration, and/or the like.; [0091]
In one embodiment, the second machine learning algorithm/model analyzes the obtaining, using the first machine learning model and for a set of actual data items encountered in a production environment, a corresponding set of predicted values for the target attribute predictive performance of the first machine learning algorithm/model after the first machine learning algorithm/model analyzes the inference data set so that the predictions that the first machine learning algorithm/model generates can be used as input into the training of the second machine learning model, along with the error data. In certain embodiments, if the second machine learning model has already been trained, the first and second machine learning algorithms/models may run substantially simultaneously based on the inference data set to determine the predictive performance of the first machine learning algorithm/model in real-time, or substantially in real-time.; [0092] The analysis module 310, in one embodiment, is configured to determine whether the first machine learning algorithm/model is a suitable algorithm/model for generating predictions for the inference data set based on the predictions that the second machine learning algorithm generates.).
Ghanta fails to teach generating, using a second machine learning model and for the training data item, a predicted value for the target attribute; generating a first model performance value for the first machine learning model and a second model performance value for the second machine learning model, wherein the first model performance value and the second model performance value represent a model performance metric, wherein the model performance metric is at least one of a precision, recall, or false positive rate; computing, using a linear regression model and based on the first model performance metric and the second model performance metric, (i) a differential value and (ii) a corresponding confidence interval, wherein: the differential value represents a difference between the first model performance value of the first machine learning model and the second model performance value of the second machine learning model, and the confidence interval indicates a probability that the differential value accurately reflects the difference between the first model performance value and the second model performance value; selecting, based on the computed confidence interval, the first machine learning model;
Raschka teaches generating, using a second machine learning model and for the training data item, a predicted value for the target attribute ([1.1 Performance Estimation: Generalization Performance vs. Model Selection, pg. 4] Let us summarize the main points why we evaluate the predictive performance of a model: 1. We want to estimate the generalization performance, the predictive performance of our model on future (unseen) data. 2. We want to increase the predictive performance by tweaking the learning algorithm and selecting the best performing model from a given hypothesis space. 3. We want to identify the machine learning algorithm that is best-suited for the problem at hand; thus, we want to compare different algorithms, generating, using a second machine learning model selecting the best-performing one as well as the best performing model from the algorithm’s hypothesis space.; [0-1 loss and prediction accuracy., pg. 5-6] In the following article, we will focus on the prediction accuracy, which is defined as the number of all correct predictions divided by the number of examples in the dataset. We compute the prediction accuracy as the number of correct predictions divided by the number of examples n. Or in more formal terms, we define the prediction accuracy ACC as ACC = 1 − ERR, (1) where the prediction error, ERR, is computed as the expected value of the 0-1 loss over and for the training data item n examples in a dataset S: ERRS = 1 n Xn i=1 L( ˆyi , yi). (2) The 0-1 loss L(·) is defined as L( ˆyi , yi) =    0 if yˆi = yi 1 if yˆi 6= yi , (3) where yi is the ith true class label and yˆi the ith a predicted value for the target attribute predicted class label, respectively. Our objective is to learn a model h that has a good generalization performance. Such a model maximizes the prediction accuracy or, vice versa, minimizes the probability, C(h), of making a wrong prediction: C(h) = Pr (x,y)∼D [h(x) 6= y]. (4) Here, D is the generating distribution the dataset has been drawn from, x is the feature vector of a training example with class label y.); 
computing, using a linear regression model and based on the first model performance metric and the second model performance metric, (i) a differential value and ii) a corresponding confidence interval, wherein: the differential value represents a difference between the first model performance value of the first machine learning model and the second model performance value of the second machine learning model, and the confidence interval indicates a probability that the differential value accurately reflects the difference between the first model performance value and the second model performance value; selecting, based on the computed confidence interval, the first machine learning model ([4.2 Testing the Difference of Proportions, pg. 34-35] There are several different statistical hypothesis testing frameworks that are being used in practice to compare the performance of classification models, including conventional methods such as difference of two proportions (here, the proportions are the estimated generalization accuracies from a test set), for which we can construct 95% confidence intervals based on the concept of the Normal Approximation to the Binomial that was covered in Section 1. Performing a computing, using a linear regression model and based on the first model performance metric and the second model performance metric, (i) a differential value and ii) a corresponding confidence interval z-score test for two population proportions is inarguably the most straight-forward way to compare to models (but certainly not the best!): In a nutshell, if the 95% confidence intervals of the accuracies of two models do not overlap, we can reject the null hypothesis that the performance of both classifiers is equal at a confidence level of α = 0.05 (or 5% probability). Violations of assumptions aside (for instance that the test set samples are not independent), as Thomas Dietterich noted based on empirical results in a simulated study [Dietterich, 1998], this test tends to have a high false positive rate (here: incorrectly detecting difference when there is none), which is among the reasons why it is not recommended in practice. Nonetheless, for the sake of completeness, and since it a commonly used method in practice, the general procedure is outlined below as follows (which also generally applies to the different hypothesis tests presented later): 1. formulate the hypothesis to be tested (for instance, the null hypothesis stating that the proportions are the same; consequently, the alternative hypothesis that the proportions are different, if we use a two-tailed test); 2. decide upon a significance threshold (for instance, if the probability of observing a difference more extreme than the one seen is more than 5%, then we plan to reject the null hypothesis); 3. analyze the data, compute the test statistic (here: z-score), and compare its associated p-value (probability) to the previously determined significance threshold; 4. based on the p-value and significance threshold, selecting, based on the computed confidence interval, the first machine learning model either accept or reject the null hypothesis at the given confidence level and interpret the results. The differential value represents a difference z-score is computed as the observed difference divided by the square root for their combined variances z = ACC p 1 − ACC2 σ 2 1 + σ 2 2 , 34 where between the first model performance value of the first machine learning model and the second model performance value of the second machine learning model ACC1 is the accuracy of one model and ACC2 is the accuracy of a second model estimated from the test set. Recall that we computed the variance of the estimated accuracy as σ 2 = ACC(1 − ACC) n in Section 1 and then computed the confidence interval (Normal Approximation Interval) as the confidence interval indicates a probability that the differential value accurately reflects the difference between the first model performance value and the second model performance value ACC ± z × σ, where z = 1.96 for a 95% confidence interval. Comparing the confidence intervals of two accuracy estimates and checking whether they overlap is then analogous to computing the z value for the difference in proportions and comparing the probability (p-value) to the chosen significance threshold. So, to compute the z-score directly for the difference of two proportions, ACC1 and ACC2, we pool these proportions (assuming that ACC1 and ACC2 are the performances of two models estimated on two indendent test sets of size n1 and n2, respectively), ACC1,2 = ACC1 × n1 + ACC2 × n2 n1 + n2 , and compute the standard deviation as σ1,2 = s ACC1,2(1 − ACC1,2) ×  1 n1 + 1 n2 , such that we can compute the z-score, z = ACC1 − ACC2 σ1,2 . Since, due to using the same test set (and violating the independence assumption) we have n1 = n2 = n, so that we can simplify the z-score computation to z = ACC1 − ACC2 √ 2σ 2 = p ACC1 − ACC2 2 · ACC1,2(1 − ACC1,2))/n . where ACC1,2 is simply (ACC1 + ACC2)/2. In the second step, based on the computed z value (this assumes the test errors are independent, which is usually violated in practice as we use the same test set) we can reject the null hypothesis that the pair of models has equal performance (here, measured in "classification accuracy") at an α = 0.05 level if |z| is higher than 1.96. Alternatively, if we want to put in the extra work, we can compute the area under the standard normal cumulative distribution at the z-score threshold. If we find this p-value is smaller than a significance level we set before conducting the test, then we can reject the null hypothesis at that given significance level.);
Ghanta and Raschka are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Ghanta, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Raschka to Ghanta before the effective filing date of the claimed invention in order to compare different algorithms to each other, in terms of predictive and computational performance to select the best-performing model from a set, ranked against each other (cf. Raschka, [1.1 Performance Estimation: Generalization Performance vs. Model Selection, pg. 4] Let us consider the obvious question, "How do we estimate the performance of a machine learning model?" A typical answer to this question might be as follows: "First, we feed the training data to our learning algorithm to learn a model. Second, we predict the labels of our test set. Third, we count the number of wrong predictions on the test dataset to compute the model’s prediction accuracy." Depending on our goal, however, estimating the performance of a model is not that trivial, unfortunately. Maybe we should address the previous question from a different angle: "Why do we care about performance estimates at all?" Ideally, the estimated performance of a model tells how well it performs on unseen data – making predictions on future data is often the main problem we want to solve in applications of machine learning or the development of new algorithms. Typically, machine learning involves a lot of experimentation, though – for example, the tuning of the internal knobs of a learning algorithm, the so-called hyperparameters. Running a learning algorithm over a training dataset with different hyperparameter settings will result in different models. Since we are typically interested in selecting the best-performing model from this set, we need to find a way to estimate their respective performances in order to rank them against each other. Going one step beyond mere algorithm fine-tuning, we are usually not only experimenting with the one single algorithm that we think would be the "best solution" under the given circumstances. More often than not, we want to compare different algorithms to each other, oftentimes in terms of predictive and computational performance.).
Pietro teaches generating a first model performance value for the first machine learning model and a second model performance value for the second machine learning model, wherein the first model performance value and the second model performance value represent a model performance metric ([0064] Specifically, according to one or more embodiments of the disclosure as described in detail below, a network assurance service uses a first machine-learning based model that is locally deployed to a network to assess a set of input features comprising measurements from the network. The service monitors, locally in the network, generating a first model performance value for the first machine learning model performance of the first machine learning-based model. The service determines that the monitored performance of the first machine learning-based model does not meet one or more performance requirements associated with the network. The service selects a second machine learning-based model for deployment to the network, based on the one or more a second model performance value for the second machine learning model performance requirements associated with the network and on the set of input features of the first machine learning-based model. The service deploys the selected second machine learning-based model to the network as a replacement for the first machine learning-based model.; [0070] A key aspect of the techniques herein is the ability for the system to monitor the performance of machine learning-based model(s) 406 executed on-premise by local service 302 a (e.g., as part of machine learning-based analyzer 312 a). To this end, local service 302 a may include model performance monitor (MPM) 408 configured to assess the performance of model(s) 406. In various embodiments, MPM 408 may wherein the first model performance value and the second model performance value assess the performance of model(s) 406 based on any or all of the following: [0071] feedback provided by one or more users via the UI regarding alerts raised by model(s) 406; [0072] feedback from one or more other systems in the network, such as a network security system, etc., that are fed the outputs of model(s) 406; [0073] represent a model performance metric various performance metrics generated by a model 406 itself (e.g., prediction error, etc.); [0074] other information indicative of the performance of model(s) 406 From the assessment of any or all of the above data, MPM 408 may compute a performance score for the model(s) 406 that reflects how accurately the model reflects the on-premise network data.),
wherein the model performance metric is at least one of a precision, recall, or false positive rate ([0038] The performance of a machine learning model can be evaluated in a number of ways based on the number of true positives, false positive rate false positives, true negatives, and/or false negatives of the model. For example, the false positives of the model may refer to the number of times the model incorrectly predicted poor performance in the network or the presence of an anomalous condition. Conversely, the false negatives of the model may refer to the number of times the model predicted good performance when, in fact, poor performance occurred. True negatives and positives may refer to the number of times the model correctly predicted whether the performance was good or poor, respectively. Related to these wherein the model performance metric is at least one of a precision, recall measurements are the concepts of recall and precision. Generally, recall refers to the ratio of true positives to the sum of true positives and false negatives, which quantifies the sensitivity of the model. Similarly, precision refers to the ratio of true positives the sum of true and false positives.);
Ghanta, Raschka, and Pietro are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Ghanta and Raschka, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Pietro to Ghanta before the effective filing date of the claimed invention in order to select a second machine learning-based model for deployment to the network, based on the one or more performance requirements associated with the network and on the set of input features of the first machine learning-based model (cf. Pietro, [0009] According to one or more embodiments of the disclosure, a network assurance service uses a first machine-learning based model that is locally deployed to a network to assess a set of input features comprising measurements from the network. The service monitors, locally in the network, performance of the first machine learning-based model. The service determines that the monitored performance of the first machine learning-based model does not meet one or more performance requirements associated with the network. The service selects a second machine learning-based model for deployment to the network, based on the one or more performance requirements associated with the network and on the set of input features of the first machine learning-based model. The service deploys the selected second machine learning-based model to the network as a replacement for the first machine learning-based model.).

Claims 2-3, 9-10, 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Ghanta, in view of Raschka, Pietro, and further in view of Mahmud et al. (U.S. Pre-Grant Publication No. 20200372404, hereinafter 'Mahmud'). 

Regarding claim 2 and analogous claims 9, 16, Ghanta, as modified by Raschka and Pietro, teaches The computer-implemented method of claims 1, 8, 15, respectively.
Ghanta, as modified by Raschka and Pietro, fails to teach wherein identifying a subset of training data items from among the plurality of training data items, comprises: randomly sampling the plurality of training data items to obtain the subset of training data items, wherein the subset of training data items include 10% of the plurality of training data items.
Mahmud teaches wherein identifying a subset of training data items from among the plurality of training data items, comprises: randomly sampling the plurality of training data items to obtain the subset of training data items, wherein the subset of training data items include 10% of the plurality of training data items ([0044] Thus, given a training set, the engines apply various data augmentation methods which are given as input by data scientist on the training set to obtain augmented training sets. For each method to be applied, they can additionally generates different variations or subsets of the training set (e.g., subset of training data items include 10% of the plurality of training data items 10% of randomly sampling the plurality of training data items to obtain the subset of training data items random sample of training data, 50% of random sample of training data, 100% training data, 5% training data with input label “1”, 5% training data with input label “2”, etc). Data augmentation methods can be then applied to each such variation with the selected parameters for a particular method. The data scientist also gives as input how to generate the different variations of training sets. If no parameter is specified for an augmentation method, an augmentation engine chooses a set of default parameters, which can be any of the foregoing.).
Ghanta, Raschka, Pietro, and Mahmud are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Ghanta, Raschka, and Pietro, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Mahmud to Ghanta before the effective filing date of the claimed invention in order to devise an improved method of augmenting data for training a cognitive system which could apply a set of augmentation approaches with different parameters, control the validation process, and select the best augmented model for a particular cognitive system, operating in an automated manner (cf. Mahmud, [0021] It would, therefore, be desirable to devise an improved method of augmenting data for training a cognitive system which could apply a set of augmentation approaches with different parameters, control the validation process, and select the best augmented model for a particular cognitive system. It would be further advantageous if the method could operate in an automated manner. The present invention in its various embodiments achieves these and other advantages by computing a superior augmented model from a set of candidate augmented models generated through selection of augmentation methods, parameter variations, and training set size variation for augmentations, and computing goodness scores for each of the augmented models through a set of features.).

Regarding claim 3 and analogous claims 10, 17, Ghanta, as modified by Raschka and Pietro, teaches The computer-implemented method of claims 1, 8, 15, respectively.
Ghanta, as modified by Raschka and Pietro, fails to teach wherein the ground-truth value for each label in the plurality of labels is specified by a human.
Mahmud teaches wherein the ground-truth value for each label in the plurality of labels is specified by a human ([0053] So, such weights/goodness can be set by the data scientist. In another embodiment, the weights/goodness can be determined via machine learning where the data scientist trains a weight/goodness model for augmentation (this is a separate cognitive system). In order to train the machine learning model, the is specified by a human data scientist can collect wherein the ground-truth value for each label in the plurality of labels ground truth examples of good (e.g., labeled as “1”) and bad (e.g., labeled as ‘0’) augmented models for a particular application, such as a business problem, over time. Once trained, such goodness model returns a goodness score (i.e., weights) for a particular augmented model.).
Ghanta, Raschka, Pietro, and Mahmud are combinable for the same rationale as set forth above with respect to claim 2.

Claims 4, 11, 18 are rejected under 35 U.S.C. 103 as being unpatentable over Ghanta, in view of Raschka, Pietro, and further in view of Yarmus et al. (U.S. Pre-Grant Publication No. 20140236965, hereinafter 'Yarmus'). 

Regarding claim 4 and analogous claims 11, 18, Ghanta, as modified by Raschka and Pietro, teaches The computer-implemented method of claims 1, 8, 15, respectively.
Ghanta, as modified by Raschka and Pietro, fails to teach further comprising: generating, for each training data item, a quality score representing a quality of the training data item and the corresponding label; and applying the quality scores as weights for the linear regression model.
Yarmus teaches further comprising: generating, for each training data item, a quality score representing a quality of the training data item and the corresponding label ([0059] FIG. 4 illustrates one embodiment of method 400 for performing GLM selection on a dataset that stores values for one or more input attributes and a target attribute. At 410 a branch of candidate features is constructed. Respective candidate features in the branch are ordered according to respective inclusion score. The generating, for each training data item, a quality score respective inclusion scores representing a quality of the training data item and the corresponding label estimate a likelihood that respective candidate features will be selected for inclusion in the GLM.); and
applying the quality scores as weights for the linear regression model ([0055] Subsequent sets of k sparse input attributes are chosen, by sampling the ordered list. The sampling of respective sparse input attributes is weighted according to a revised sampling applying the quality scores as weights weight that corresponds to respective inclusion scores reduced by a predetermined factor.; [0012] Systems and methods are described herein that extend the streamwise feature selection method to provide efficient and scalable for the linear regression model model selection for GLM. Branches of candidate features are scored with respect to their probable value to the GLM (e.g., a candidate features correlation with the target attribute in the case of a numerical target attribute).).
Ghanta, Raschka, Pietro, and Yarmus are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Ghanta, Raschka, and Pietro, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Yarmus to Ghanta before the effective filing date of the claimed invention in order to extend the streamwise feature selection method to provide efficient and scalable model selection for generalized linear model (cf. Yarmus, [0012] Systems and methods are described herein that extend the streamwise feature selection method to provide efficient and scalable model selection for GLM. Branches of candidate features are scored with respect to their probable value to the GLM (e.g., a candidate features correlation with the target attribute in the case of a numerical target attribute). Candidate features within a branch are considered by the streamwise feature selection method in order of score. In addition, statistical hints are derived from the dataset to determine an appropriate adaptive penalty function for the streamwise feature selection method. The statistical hints are also used to determine when to re-order candidates within a branch, when to terminate feature selection within a branch and construct a new branch, and when to terminate the feature selection process.).

Claims 6, 13, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ghanta, in view of Raschka, Pietro, and further in view of Cai et al. (U.S. Pre-Grant Publication No. 20130073514, hereinafter 'Cai'). 

Regarding claim 6 and analogous claims 13, 20, Ghanta, as modified by Raschka and Pietro, teaches The computer-implemented method of claims 1, 8, 15, respectively.
Ghanta, as modified by Raschka and Pietro, fails to teach wherein the target attribute is a relevance of search results provided in response to a search query and wherein obtaining, using the first machine learning model and for a set of actual data items encountered in a production environment, a corresponding set of predicted values for the target attribute, comprises: obtaining, using the first machine learning model and for a first set of search results corresponding to a first query, a relevance score indicating whether the first set of search results is relevant to the first query.
 	Cai teaches wherein the target attribute is a relevance of search results provided in response to a search query ([0015] The following description sets forth arrangements and techniques for automatically identifying and extracting structured web data for a given vertical. The target attribute is a relevance of search results web data extracted for a given vertical is associated with a set of target attributes. The arrangements and techniques discussed herein provide a solution for structured web data extraction that: can be flexibly applied to different verticals (i.e. vertical independent), is scalable to deal with a variety of web sites in each separate vertical, and is automatic so that human involvement is significantly reduced and/or limited.; [0020] Again, accurately identifying attributes and extracting values for the targeted attributes within a given vertical may be advantageous to online provided in response to a search query search engines and online knowledge databases.) and
wherein obtaining, using the first machine learning model and for a set of actual data items encountered in a production environment, a corresponding set of predicted values for the target attribute ([0079] Thus, the new web site adaptation module 418 first employs the page-level semantic prediction module 514 to predict whether a text node is relevant to a target attribute.), 
comprises: obtaining, using the first machine learning model and for a first set of search results corresponding to a first query, a relevance score indicating whether the first set of search results is relevant to the first query ([0062] Then, based on the evaluation and analysis, the vertical knowledge learning module 416 builds and trains (1) a classifier that provides an estimated indicating whether the first set of search results is relevant to the first query probability that a text node t of a new web site is relevant, from the semantic perspective, to a particular targeted attribute aj, and (2) a standard inter-attribute layout among attributes in A that provides a criterion for estimating the correctness of an inter-attribute layout from a new web site.; [0072] In other words, while a high context-based relevance score confidently indicates semantic relevance, a low score does not necessarily indicate the absence of semantic relevance. Therefore, context features are relied upon if the corresponding relevance probability is sufficient to confidently indicate that text node t is relevant to an attribute aj. Otherwise, the relevance estimation relies solely on the content-based relevance. In various embodiments, the obtaining, using the first machine learning model and for a first set of search results corresponding to a first query, a relevance score estimation probability in (7) may be referred to as relevance scores (e.g., page-level).).
Ghanta, Raschka, Pietro, and Cai are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Ghanta, Raschka, and Pietro, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Cai to Ghanta before the effective filing date of the claimed invention in order to provide a solution for structured web data extraction that: can be flexibly applied to different verticals (i.e. vertical independent), is scalable to deal with a variety of web sites in each separate vertical, and is automatic so that human involvement is significantly reduced and/or limited (cf. Cai, [0015] The following description sets forth arrangements and techniques for automatically identifying and extracting structured web data for a given vertical. The web data extracted for a given vertical is associated with a set of target attributes. The arrangements and techniques discussed herein provide a solution for structured web data extraction that: can be flexibly applied to different verticals (i.e. vertical independent), is scalable to deal with a variety of web sites in each separate vertical, and is automatic so that human involvement is significantly reduced and/or limited.).

Claims 7, 14 are rejected under 35 U.S.C. 103 as being unpatentable over Ghanta, in view of Raschka, Pietro, and further in view of Brill et al. (U.S. Pre-Grant Publication No. 20200387753, hereinafter 'Brill'). 

Regarding claim 7 and analogous claim 14, Ghanta, as modified by Raschka and Pietro, teaches The computer-implemented method of claims 1, 8, respectively.
Ghanta, as modified by Raschka and Pietro, fails to teach wherein selecting, based on the computed confidence interval, the first machine learning model comprises: determining that the computed confidence interval satisfies a confidence threshold; and in response to determining that the computed confidence interval satisfies a confidence threshold, selecting the first machine learning model.
Brill teaches wherein selecting, based on the computed confidence interval, the first machine learning model comprises: determining that the computed confidence interval satisfies a confidence threshold ([0027] On Step 110, a machine learning prediction model may be obtained. In some exemplary embodiments, the machine learning prediction model may be utilized by a system to provide an estimation for a valuation of a feature vector. Additionally or alternatively, an expected performance and determining that the computed confidence interval satisfies a confidence threshold threshold such as confidence interval of the machine learning prediction model may be obtained. Additionally or alternatively, testing and training data of the machine learning prediction model may be obtained. The testing and training data may be labeled data. Additional metadata may be obtained. The testing and training data may comprise data instances of the machine learning prediction model.); and
in response to determining that the computed confidence interval satisfies a confidence threshold, selecting the first machine learning model ([0096] In some exemplary embodiments, Model Selection Module 365 may be configured to determine, for each data slice, which machine learning model of the one or more Machine Learning Models 320 to utilize to provide the estimated prediction. Model Selection Module 365 may selecting the first machine learning model select the machine learning model with the in response to determining that the computed confidence interval satisfies a confidence threshold performance measurement determined by Performance Calculator 360 for the data slice. The selected machine learning model may be utilized to provide predictions for data instances that are mapped to the data slice.).
Ghanta, Raschka, Pietro, and Brill are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Ghanta, Raschka, and Pietro, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Brill to Ghanta before the effective filing date of the claimed invention in order to determine whether a machine learning prediction model adheres to a target performance requirement that covers also business requirements of the machine learning model that may not be covered by the feature vector of the machine learning model (cf. Brill, [0014] One technical solution is to adapt test planning methodologies of classical software aimed at ensuring coverage of requirements, to handle machine learning models. The methodologies may be utilized to determine whether a machine learning prediction model adheres to a target performance requirement that covers also business requirements of the machine learning model that may not be covered by the feature vector of the machine learning model. Such determination may be performed by determining performance measurements of data slices that are determined based on the business requirements.).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAGGIE MAIDO whose telephone number is (703) 756-1953. The examiner can normally be reached M-Th: 6am - 4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MM/Examiner, Art Unit 2129            
                                                                                                                                                                                 
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129
Read full office action
Prosecution Timeline

Jun 03, 2022
Application Filed
Jun 05, 2025
Non-Final Rejection mailed — §103, §112
Nov 05, 2025
Response Filed
Jan 12, 2026
Final Rejection mailed — §103, §112 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/196,689
Patent 12639595
INFORMATION PROCESSING DEVICE, INFORMATION COMPUTING METHOD, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM
5y 2m to grant Granted May 26, 2026
17/330,099
Patent 12602603
MULTI-AGENT INFERENCE
4y 10m to grant Granted Apr 14, 2026
17/392,319
Patent 12596933
CONTEXT-AWARE ENTITY LINKING FOR KNOWLEDGE GRAPHS TO SUPPORT DECISION MAKING
4y 8m to grant Granted Apr 07, 2026
17/062,058
Patent 12579463
GENERATIVE REASONING FOR SYMBOLIC DISCOVERY
5y 5m to grant Granted Mar 17, 2026
17/659,028
Patent 12579452
EVALUATION SCORE DETERMINATION MACHINE LEARNING MODELS WITH DIFFERENTIAL PERIODIC TIERS
3y 11m to grant Granted Mar 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

3-4
Expected OA Rounds
62%
Grant Probability
89%
With Interview (+27.6%)
4y 1m (~1m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 39 resolved cases by this examiner. Grant probability derived from career allowance rate.