DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 09/22/2025 has been entered.
Response to Arguments
Applicant's arguments filed 09/22/2025 have been fully considered and they are persuasive.
Regarding applicant’s remarks directed to the rejection of claims under 35 USC § 101, the applicant argues that the amended claims are directed to a technical solution. Examiner respectfully agrees and withdraws the prior rejection of claims under 35 USC § 101.
Regarding applicant’s remarks directed to the rejection of claims under 35 USC § 103, the arguments are directed to newly amended limitations that were not previously examined by the examiner. Therefore, applicant’s arguments are rendered moot. The examiner refers to the rejection under 35 USC § 103 in the current Office action for more details.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-4,7-13,16-18 and 21-26 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Pub. No. US20070122347A1 Statnikov et al. (“Statnikov”) in view of Weerts, Hilde JP, Andreas C. Mueller, and Joaquin Vanschoren. "Importance of tuning hyperparameters of machine learning algorithms." arXiv preprint arXiv:2007.07588 (2020) (“Weerts”) in further view of Andonie, Răzvan. "Hyperparameter optimization in learning systems." Journal of Membrane Computing 1.4 (2019): 279-291 (“Andonie”) and evidenced by Wikipedia (2021, May 18th). Greedy Algorithm. https://web.archive.org/web/20210518031653/https://en.wikipedia.org/wiki/Greedy_algorithm. (“Wikipedia”)
In regards to claim 1,
Statnikov teaches A computer-implemented method comprising: identifying, for a given machine learning algorithm, two or more hyperparameters and possible values for each of the two or more hyperparameters,
(Statnikov, [0046], “The optimal parameter configuration [identifying,…, two or more hyperparameters ie ‘optimal’ parameter configuration/set and possible values for each of the two or more hyperparameters ie ‘non-optimal’ parameter configurations/sets] of classifier A [for a given machine learning algorithm] for Iteration 1 is determined using the above training sets for validation 420, 422, 424, 426, first x, y parameter set 310, and validation sets 430, 432, 434, 436.”)
Statnikov teaches and automatically tuning at least the first hyperparameter of the [decision tree-based machine learning algorithm pertaining to the at least one decision tree split quality metric, and the second hyperparameter of the decision tree-based machine learning algorithm pertaining to the number of data features associated with one or more decision tree splits; wherein the decision tree-based machine learning algorithm and its hyperparameters are provided by Weerts], based at least in part on the determined first value and the determined at least a second value;
(Statnikov, “[0052] In the example shown in FIG. 4, third parameter set {0, 1} 330 had the highest average accuracy for Iteration 1 (88.8%) [based at least in part on the determined first value and the determined at least a second value], so the most accurate model for Iteration 1 was obtained using third parameter set {0, 1} 330 [automatically tuning].”; wherein obtaining the ‘most accurate’ model for the two or more parameters with the highest accuracy (first and second value) as provided by the algorithm of Andonie is automatically tuning the decision tree-based machine learning algorithm)
Statnikov teaches wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
(Statnikov, “[0113] Alternative embodiments of the above-described methods may be implemented in a software program that is stored on a machine-readable medium and that may be executed by a machine, for example, a computer processor. In addition, the software may be implemented to be downloaded to and installed on individual computers over a network, for example, the Internet.”)
However, Statnikov does not explicitly teach wherein the given machine learning algorithm comprises a decision tree-based machine learning algorithm, and wherein the two or more hyperparameters comprise: (i) a first hyperparameter of the decision tree-based machine learning algorithm pertaining to at least one decision tree split quality metric, and (ii) a second hyperparameter of the decision tree-based machine learning algorithm pertaining to a number of data features associated with one or more decision tree splits;
decision tree-based machine learning algorithm pertaining to the at least one decision tree split quality metric, and the second hyperparameter of the decision tree-based machine learning algorithm pertaining to the number of data features associated with one or more decision tree splits
determining a first value, among the possible values, for the first of the two or more hyperparameters by iterating through each of the possible values for the first hyperparameter, generating a version of the decision tree-based machine learning algorithm corresponding to each iteration, scoring each version of the decision tree-based machine learning algorithm, and identifying the first value based on the scoring of each version, wherein the first value comprises a particular value for the first hyperparameter, and wherein identifying the particular value for the first hyperparameter comprises comparing scores attributed to the scored versions of the decision tree-based machine learning algorithm and selecting the value corresponding to a particular score relative to the other scores;
determining at least a second value, among the possible values, for at least the second of the two or more hyperparameters by iterating through each of the possible values for the at least the second of the two or more hyperparameters, with each iteration being carried out in conjunction with using the identified first value for the first hyperparameter, generating a version of the decision tree-based machine learning algorithm corresponding to each iteration, scoring each version of the decision tree-based machine learning algorithm, and identifying at least the second value for the at least the second of the two or more hyperparameters based on the scoring of each version, wherein the at least a second value comprises at least one particular value for the at least the second of the two or more hyperparameters, and wherein identifying the at least one particular value for the at least the second of the two or more hyperparameters comprises, for each of the at least the second of the two or more hyperparameters, comparing scores attributed to the scored versions of the decision tree-based machine learning algorithm and selecting the value corresponding to a particular score relative to the other scores;
Weerts teaches wherein the given machine learning algorithm comprises a decision tree-based machine learning algorithm,
(Weerts, Section 1, “We apply our approach in a benchmark study of two popular classification algorithms: random forests [a decision tree-based machine learning algorithm] and support vector machines.”)
Weerts teaches and wherein the two or more hyperparameters comprise: (i) a first hyperparameter of the decision tree-based machine learning algorithm pertaining to at least one decision tree split quality metric, and (ii) a second hyperparameter of the decision tree-based machine learning algorithm pertaining to a number of data features associated with one or more decision tree splits; decision tree-based machine learning algorithm pertaining to the at least one decision tree split quality metric, and the second hyperparameter of the decision tree-based machine learning algorithm pertaining to the number of data features associated with one or more decision tree splits
(Weerts, Table 2 teaches hyperparameters of the decision tree-based ML algorithm, wherein max_features corresponds to (ii) and criterion corresponds to (i). [Image: media_image1.png])
However, Weerts does not explicitly teach determining a first value, among the possible values, for the first of the two or more hyperparameters by iterating through each of the possible values for the first hyperparameter, generating a version of the decision tree-based machine learning algorithm corresponding to each iteration, scoring each version of the decision tree-based machine learning algorithm, and identifying the first value based on the scoring of each version, wherein the first value comprises a particular value for the first hyperparameter, and wherein identifying the particular value for the first hyperparameter comprises comparing scores attributed to the scored versions of the decision tree-based machine learning algorithm and selecting the value corresponding to a particular score relative to the other scores;
determining at least a second value, among the possible values, for at least the second of the two or more hyperparameters by iterating through each of the possible values for the at least the second of the two or more hyperparameters, with each iteration being carried out in conjunction with using the identified first value for the first hyperparameter, generating a version of the decision tree-based machine learning algorithm corresponding to each iteration, scoring each version of the decision tree-based machine learning algorithm, and identifying at least the second value for the at least the second of the two or more hyperparameters based on the scoring of each version, wherein the at least a second value comprises at least one particular value for the at least the second of the two or more hyperparameters, and wherein identifying the at least one particular value for the at least the second of the two or more hyperparameters comprises, for each of the at least the second of the two or more hyperparameters, comparing scores attributed to the scored versions of the decision tree-based machine learning algorithm and selecting the value corresponding to a particular score relative to the other scores;
Andonie teaches determining a first value, among the possible values, for the first of the two or more hyperparameters by iterating through each of the possible values for the first hyperparameter, generating a version of the decision tree-based machine learning algorithm corresponding to each iteration, scoring each version of the decision tree-based machine learning algorithm, and identifying the first value based on the scoring of each version, wherein the first value comprises a particular value for the first hyperparameter, and wherein identifying the particular value for the first hyperparameter comprises comparing scores attributed to the scored versions of the decision tree-based machine learning algorithm and selecting the value corresponding to a particular score relative to the other scores;
determining at least a second value, among the possible values, for at least the second of the two or more hyperparameters by iterating through each of the possible values for the at least the second of the two or more hyperparameters, with each iteration being carried out in conjunction with using the identified first value for the first hyperparameter, generating a version of the decision tree-based machine learning algorithm corresponding to each iteration, scoring each version of the decision tree-based machine learning algorithm, and identifying at least the second value for the at least the second of the two or more hyperparameters based on the scoring of each version, wherein the at least a second value comprises at least one particular value for the at least the second of the two or more hyperparameters, and wherein identifying the at least one particular value for the at least the second of the two or more hyperparameters comprises, for each of the at least the second of the two or more hyperparameters, comparing scores attributed to the scored versions of the decision tree-based machine learning algorithm and selecting the value corresponding to a particular score relative to the other scores;
Examiner’s note: The limitations directed to determining the first and second hyperparameter values are interpreted to be a greedy algorithm, wherein the possible values of the first hyperparameter are considered iteratively to determine the ‘best’ value for the first hyperparameter, and that value is then used in part to iteratively determine the ‘best’ value for the second hyperparameter from the possible values of the second hyperparameter.
(Andonie, Section 2, “A simple strategy for hyperparameter optimization is a greedy approach [Examiner notes: One of ordinary skill in the art would recognize that a greedy approach means making the locally optimal choice at each stage]: investigate the local neighborhood of a given hyperparameter configuration: vary one hyperparameter at a time [by iterating through each of the possible values for the first/second hyperparameter; i.e., varying one hyperparameter at a time as in iterating through each element in the respective possible values. In the context of the applied references, it would be iterating through possible values for the criterion hyperparameter, first starting at “gini” and then going to “entropy”] and measure how performance changes [generating a version of the decision tree-based machine learning algorithm corresponding to each iteration, wherein performance of the hyperparameter can only be observed from instantiating the model with the hyperparameter and executing it, scoring each version of the decision tree-based machine learning algorithm…comparing scores…and selecting the value corresponding to a particular score relative to the other scores; Andonie discloses measuring performance changes]. The only information obtained with this analysis is how different hyperparameter values perform in the context of a single instantiation of the other hyperparameters.”; Since Andonie teaches varying one hyperparameter at a time in a greedy approach, Examiner interprets the algorithm disclosed to be setting the respective hyperparameter as the locally optimal solution before moving to iterate (vary) subsequent hyperparameters with the previously determined hyperparameter, as evidenced by Wikipedia.)
Statnikov and Weerts are both considered to be analogous to the claimed invention because they are in the same field of hyperparameter tuning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Statnikov to incorporate the teachings of Weerts in order to provide a random forest algorithm to examine the effects of changing hyperparameters on model performance. (Weerts, Abstract, “The performance of many machine learning algorithms depends on their hyperparameter settings. The goal of this study is to determine whether it is important to tune a hyperparameter or whether it can be safely set to a default value. We present a methodology to determine the importance of tuning a hyperparameter based on a non-inferiority test and tuning risk: the performance loss that is incurred when a hyperparameter is not tuned, but set to a default value. Because our methods require the notion of a default parameter, we present a simple procedure that can be used to determine reasonable default parameters. We apply our methods in a benchmark study using 59 datasets from OpenML. Our results show that leaving particular hyperparameters at their default value is noninferior to tuning these hyperparameters. In some cases, leaving the hyperparameter at its default value even outperforms tuning it using a search procedure with a limited number of iterations.”)
Andonie is considered to be analogous to the claimed invention because it is in the same field of hyperparameter optimization. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Statnikov and Weerts to incorporate the teachings of Andonie in order to provide a simple greedy algorithm to be utilized in hyperparameter optimization, as doing so would yield locally optimal results that approximate a globally optimal solution in a reasonable amount of time. (Wikipedia, “In many problems, a greedy strategy does not usually produce an optimal solution, but nonetheless, a greedy heuristic may yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time.”)
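For illustration only, the greedy, one-hyperparameter-at-a-time search described in the Examiner's note for claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Statnikov, Weerts, or Andonie: the function name `greedy_tune`, the `toy_score` scorer, and its numeric values are hypothetical stand-ins. A real scorer would build and evaluate a version of the algorithm for each candidate value.

```python
from typing import Any, Callable, Dict, List


def greedy_tune(
    search_space: Dict[str, List[Any]],
    score_fn: Callable[[Dict[str, Any]], float],
) -> Dict[str, Any]:
    """Tune hyperparameters one at a time, fixing each best value before moving on."""
    # Assumption: every hyperparameter starts at its first listed value.
    config = {name: values[0] for name, values in search_space.items()}
    for name, values in search_space.items():
        scores = {}
        for value in values:
            # One "version" of the algorithm per candidate value; previously
            # selected values stay fixed in `config`.
            candidate = {**config, name: value}
            scores[value] = score_fn(candidate)
        # Compare the scores and keep the value with the highest score.
        config[name] = max(scores, key=scores.get)
    return config


def toy_score(config: Dict[str, Any]) -> float:
    """Hypothetical scorer standing in for 'train a model and measure accuracy'."""
    criterion_part = {"gini": 0.80, "entropy": 0.85}[config["criterion"]]
    feature_part = {"sqrt": 0.03, "log2": 0.01, None: 0.00}[config["max_features"]]
    return criterion_part + feature_part


space = {
    "criterion": ["gini", "entropy"],        # split quality metric
    "max_features": ["sqrt", "log2", None],  # features considered per split
}
best = greedy_tune(space, toy_score)
print(best)
```

In this sketch the second hyperparameter is only evaluated in combination with the value already chosen for the first, which is the locally optimal, stage-by-stage behavior the note attributes to a greedy approach.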
In regards to claim 2 and analogous claims 12 and 17,
Statnikov in view of Weerts and Andonie teaches The computer-implemented method of claim 1,
Statnikov teaches further comprising: performing one or more cross-validation techniques on the decision tree-based machine learning algorithm subsequent to the automatic tuning of the two or more hyperparameters.
(Statnikov, “[0031] In general, in accordance with the embodiments of the present invention, a model may be defined as the specific choice of algorithms (for example, for normalization/data preparation, classification, predictor selection) and the parameters associated with each algorithm. Embodiments of the present invention may be designed to: 1) estimate performance of a given model by cross-validation; 2) choose an optimal model [subsequent to the automatic tuning of the two or more hyperparameters] from among multiple possible models by cross-validation [performing one or more cross-validation techniques on the decision tree-based machine learning algorithm]; 3) simultaneously perform tasks 1 and 2; and 4) apply an obtained model, for example, from tasks 2 or 3, to new data.”)
In regards to claim 3,
Statnikov in view of Weerts and Andonie teaches The computer-implemented method of claim 2,
Statnikov teaches wherein performing the one or more cross-validation techniques comprises performing one or more k-fold cross-validation techniques.
(Statnikov, “[0037] Optimal Model Selection. The process of optimal model construction may be based on n-fold cross-validation [performing one or more k-fold cross-validation techniques], a method for providing an estimate of the performance of a specific parameterization of a classification model produced by a learning procedure A on available data D.”)
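For illustration only, the n-fold (k-fold) partitioning referred to in the passage above can be sketched as follows. This is a minimal sketch, not Statnikov's implementation: the helper name `k_fold_indices` and the choice of contiguous, unshuffled folds are assumptions made for brevity.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, validation_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        # Everything outside the validation fold is used for training.
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, validation))
        start += size
    return splits


splits = k_fold_indices(n_samples=10, k=5)
for train, validation in splits:
    print("validate on", validation, "train on", train)
```

Each sample appears in exactly one validation fold, so every sample is used for validation once and for training in the remaining folds.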
In regards to claim 4 and analogous claims 13 and 18,
Statnikov in view of Weerts and Andonie teaches The computer-implemented method of claim 3,
Statnikov teaches wherein performing one or more k-fold cross-validation techniques comprises using a number of folds equivalent to the number of hyperparameters comprised in the two or more hyperparameters of the decision tree-based machine learning algorithm.
(Statnikov, “[0040] When a classifier that is used for learning is parametric, the optimal values of its parameters may be estimated to produce a final model. Assuming that the classifier can be applied with a vector of parameter values and there are m possible instantiations of this vector: {α1, α2, α3, . . . , αm−1, αm} [number of hyperparameters comprised in the two or more hyperparameters of the decision tree-based machine learning algorithm; wherein m is the number of hyperparameters]. Here αi may contain, but is not limited to, the following values:
Choice of classification algorithms (e.g., K-Nearest Neighbors, Support Vector Machines);
Parameters of the specific classification algorithms (e.g., number of neighbors K for K-Nearest Neighbors, penalty parameter C for Support Vector Machines);
Choice of algorithms applied prior to classification, such as variable selection, normalization, imputation, and others (e.g., univariate variable selection by ANOVA, multivariate variable selection by RFE);
Parameters of algorithms applied prior to classification (e.g., number of variables to be used for classification).
[0045] To estimate the optimal value of α, cross-validation may be used as follows: The performance P(i) of classifier A trained with parameter αi is estimated for i=1, . . . , m [using a number of folds equivalent to the number of hyperparameters] by cross-validation. The final model is built by training A on all available data D using the parameter αj, where j=argmax P(i) for i=1, . . . , m. FIG. 3 is a block diagram of how a data set may be split into training sets/subsets and a testing set/subset for a 5-fold cross-validation selection of a classifier model using the data set and multiple model parameters, in accordance with one or more embodiments of the present invention.”; wherein m can be equal to 5 and thus the number of folds is equivalent to the number of hyperparameters)
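For illustration only, the selection rule quoted above (estimate P(i) for each parameter instantiation by cross-validation, then take j = argmax P(i)) can be sketched as follows. This is a minimal sketch: the function name `select_best_configuration` and the per-fold accuracies are hypothetical and are not taken from Statnikov.

```python
from statistics import mean
from typing import Dict, List, Tuple

ParamSet = Tuple[int, int]


def select_best_configuration(
    fold_scores: Dict[ParamSet, List[float]],
) -> Tuple[ParamSet, float]:
    """Average each configuration's per-fold scores and return the argmax."""
    # P(i): mean validation score of configuration i across the folds.
    averages = {params: mean(scores) for params, scores in fold_scores.items()}
    # j = argmax P(i)
    best = max(averages, key=averages.get)
    return best, averages[best]


# Hypothetical per-fold validation accuracies for three parameter sets.
fold_scores = {
    (1, 0): [0.80, 0.82, 0.78, 0.84],
    (1, 1): [0.85, 0.83, 0.86, 0.82],
    (0, 1): [0.90, 0.88, 0.87, 0.91],
}
best_params, best_average = select_best_configuration(fold_scores)
print(best_params, round(best_average, 3))
```

Under the quoted procedure, the final model would then be trained on all available data using the selected parameter set.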
In regards to claim 7 and analogous claims 21 and 24,
Statnikov in view of Weerts and Andonie teaches The computer-implemented method of claim 1,
Weerts teaches wherein the at least a portion of possible values for each of the two or more hyperparameters comprise one or more of at least one categorical value and at least one numerical value.
(Weerts, Table 2 teaches categorical and numerical value type parameters. [Image: media_image2.png])
In regards to claim 8 and analogous claims 22 and 25,
Statnikov in view of Weerts and Andonie teaches The computer-implemented method of claim 1,
Andonie teaches wherein the at least a portion of possible values for each of the two or more hyperparameters varies in number of values, wherein iterating through each of the at least a portion of the possible values for the first hyperparameter comprises performing a first number of iterations based on a number of possible values across the two or more hyperparameters, and wherein iterating through each of the at least a portion of the possible values for the at least a second of the two or more hyperparameters comprises performing a second number of iterations based on the number of possible values across the two or more hyperparameters.
(Andonie, Section 2, “A simple strategy for hyperparameter optimization is a greedy approach: investigate the local neighborhood of a given hyperparameter configuration: vary one hyperparameter at a time [wherein the at least a portion of possible values for each of the two or more hyperparameters varies in number of values, wherein iterating through each of the at least a portion of the possible values for the first/second hyperparameter comprises performing a first/second number of iterations based on a number of possible values across the two or more hyperparameters; wherein varying one parameter at a time is iterating through each of the possible values for the first/second hyperparameters as it only considers a hyperparameter set at a single time] and measure how performance changes. The only information obtained with this analysis is how different hyperparameter values perform in the context of a single instantiation of the other hyperparameters.”)
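For illustration only, the relationship between the number of candidate values and the number of iterations can be sketched as follows. This is a minimal sketch under the assumption that each one-at-a-time pass visits every candidate value of its hyperparameter exactly once. The candidate lists and function names are hypothetical, and the exhaustive grid count is shown only for comparison.

```python
from math import prod
from typing import Any, Dict, List


def iterations_per_hyperparameter(space: Dict[str, List[Any]]) -> Dict[str, int]:
    """Number of iterations in each one-at-a-time pass."""
    return {name: len(values) for name, values in space.items()}


def greedy_evaluations(space: Dict[str, List[Any]]) -> int:
    """Total evaluations when hyperparameters are varied one at a time."""
    return sum(len(values) for values in space.values())


def grid_evaluations(space: Dict[str, List[Any]]) -> int:
    """Total evaluations when every combination is tried (for comparison)."""
    return prod(len(values) for values in space.values())


# Hypothetical candidate lists of different lengths.
space = {
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", "log2", None],
    "min_samples_leaf": [1, 2, 4, 8],
}
per_pass = iterations_per_hyperparameter(space)
print(per_pass, greedy_evaluations(space), grid_evaluations(space))
```

With these candidate lists the passes take 2, 3, and 4 iterations respectively, so the one-at-a-time search makes 9 evaluations where an exhaustive grid would make 24.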
In regards to claim 9,
Statnikov in view of Weerts and Andonie teaches The computer-implemented method of claim 1,
Statnikov teaches wherein the given machine learning algorithm comprises one of
a support vector machines (SVM) algorithm, a random forest algorithm, a gradient boosting algorithm, a k-means clustering algorithm, a density-based spatial clustering of applications with noise (DBSCAN) algorithm, an agglomerative clustering algorithm, and a neural network.
(Statnikov, “[0040] When a classifier that is used for learning is parametric, the optimal values of its parameters may be estimated to produce a final model. Assuming that the classifier can be applied with a vector of parameter values and there are m possible instantiations of this vector: {α1, α2, α3, . . . , αm−1, αm}. Here αi may contain, but is not limited to, the following values:
Choice of classification algorithms (e.g., K-Nearest Neighbors, Support Vector Machines) [a support vector machines (SVM) algorithm, a k-means clustering algorithm];
Parameters of the specific classification algorithms (e.g., number of neighbors K for K-Nearest Neighbors, penalty parameter C for Support Vector Machines);
Choice of algorithms applied prior to classification, such as variable selection, normalization, imputation, and others (e.g., univariate variable selection by ANOVA, multivariate variable selection by RFE);
Parameters of algorithms applied prior to classification (e.g., number of variables to be used for classification).”)
In regards to claim 10 and analogous claims 23 and 26,
Statnikov in view of Weerts and Andonie teaches The computer-implemented method of claim 1,
Statnikov teaches wherein scoring each version of the given machine learning algorithm comprises using at least one set of training data.
(Statnikov, “[0033] FIG. 1 b is a flow diagram of a method for estimating performance in a selected data classification model using data split into multiple training data subsets [using at least one set of training data] and separate test subsets, in accordance with one or more embodiments of the present invention.”)
Claims 11 and 16 are rejected on the same grounds under 35 U.S.C. 103 as claim 1 as they are substantially similar.
Claims 12 and 17 are rejected on the same grounds under 35 U.S.C. 103 as claim 2 as they are substantially similar.
Claims 13 and 18 are rejected on the same grounds under 35 U.S.C. 103 as claim 4 as they are substantially similar.
Claims 14 and 19 are rejected on the same grounds under 35 U.S.C. 103 as claim 5 as they are substantially similar.
Claims 15 and 20 are rejected on the same grounds under 35 U.S.C. 103 as claim 6 as they are substantially similar.
Claims 21 and 24 are rejected on the same grounds under 35 U.S.C. 103 as claim 7 as they are substantially similar.
Claims 22 and 25 are rejected on the same grounds under 35 U.S.C. 103 as claim 8 as they are substantially similar.
Claims 23 and 26 are rejected on the same grounds under 35 U.S.C. 103 as claim 10 as they are substantially similar.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US Pub No. US20160110657A1: Gibiansky teaches Configurable Machine Learning Method Selection and Parameter Optimization System and Method
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASMINE THAI whose telephone number is (703)756-5904. The examiner can normally be reached M-F 8-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.T.T./Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129