Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
The present application is being examined under the claims filed 08/12/2025. Claims 1-12 and 15-30 are pending.
Response to Amendment
This Office Action is in response to Applicant’s communication filed 08/12/2025 in response to the Office Action mailed 05/12/2025. The Applicant’s remarks and any amendments to the claims or specification have been considered with the results that follow.
Response to Arguments
Regarding Rejections under 35 U.S.C. 101
Applicant’s arguments with respect to the rejections under 35 U.S.C. 101 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn.
Regarding Rejections under 35 U.S.C. 103
Applicant argues Kadowaki (US 2021/0124988 A1) and Schapire (“Explaining AdaBoost”) do not, alone or in combination, teach or suggest the limitation: “generating an output prediction equation for the ensemble machine learning model, wherein generating the output prediction equation comprises combining outputs from each machine learning model of the plurality using, based at least in part on, the normalized weights calculated …”. Examiner respectfully disagrees.
While Kadowaki’s regression expression uses binary selection variables to model the impact of including particular component models on the evaluation metric, it also teaches (i) representing ensembles and their behavior via regression expressions and (ii) mapping combinations to expected performance; i.e., it models the relationship between model inclusion/configuration and output performance (¶[0121]–[0135]).
Moreover, Schapire explicitly teaches the ensemble output prediction: the final hypothesis H(x) is the sign of a weighted sum of base classifiers, H(x) = sign(Σ αt ht(x)), and the α weights are computed from weighted training errors (Fig. 1). Schapire therefore directly discloses both a weighted combination output (i.e., combining outputs from each model using normalized weights) and how those normalized weights are derived from per-model error estimates: “Choose αt = (1/2) ln((1 – εt) / εt).” Also see section 2, page 2: “The final or combined hypothesis H computes the sign of a weighted combination of weak hypotheses … This is equivalent to saying that H is computed as a weighted majority vote of the weak hypotheses ht where each is assigned weight αt.” See also section 4, page 7: “AdaBoost can be understood as a procedure for greedily minimizing … the exponential loss … the choices of αt and ht on each round happen to be … chosen so as to cause the greatest decrease in this loss.”
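For illustration only, the two formulas quoted from Schapire above can be sketched in Python. This is a minimal sketch, not Schapire's or Applicant's implementation: the function names, the three threshold rules, and the error values are hypothetical and chosen only to make the arithmetic concrete.

```python
import math

def adaboost_alpha(eps):
    """alpha_t = (1/2) * ln((1 - eps_t) / eps_t), for a weighted error 0 < eps_t < 1."""
    return 0.5 * math.log((1.0 - eps) / eps)

def combined_hypothesis(alphas, hypotheses, x):
    """H(x) = sign(sum_t alpha_t * h_t(x)); each h_t returns -1 or +1."""
    total = sum(a * h(x) for a, h in zip(alphas, hypotheses))
    return 1 if total >= 0 else -1

# Hypothetical weak hypotheses (simple threshold rules) and their weighted errors.
hypotheses = [
    lambda x: 1 if x > 0 else -1,
    lambda x: 1 if x > 2 else -1,
    lambda x: 1 if x > -1 else -1,
]
errors = [0.1, 0.3, 0.4]

# A lower weighted error yields a larger weight in the vote.
alphas = [adaboost_alpha(e) for e in errors]
```

In this sketch a weak hypothesis with error 0.5 receives weight zero, and the hypothesis with the lowest error dominates the weighted vote.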
It would have been obvious for one skilled in the art working with Kadowaki’s ensemble-selection/evaluation architecture to adopt Schapire’s known weighted vote ensemble form to produce an ensemble output. Kadowaki’s goal (select/generate integrated models that predict well) and Schapire’s teaching (how to generate an ensemble by weighting component predictors using error-derived weights) are highly complementary and compatible. Combining them yields a predictable result, namely, using Schapire’s weighted output mechanism to produce the ensemble output for the candidate ensembles evaluated in Kadowaki.
Claim Rejections – 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 4-6, 10, 11, 15, 17-30 are rejected under 35 U.S.C. 103 as being unpatentable over Kadowaki, Tadashi (US 20210124988 A1) (hereinafter referred to as “Kadowaki”) in view of Schapire, Robert E., (2013) “Empirical Inference – Chapter 5: Explaining AdaBoost,” ISBN 973-3-642-41136-6 (hereinafter referred to as “Schapire”), and further in view of Neven, et al. (4 Nov. 2008) “Training a Binary Classifier with the Quantum Adiabatic,” arXiv:0811.0416v1 (hereinafter referred to as “Neven”).
Regarding claim 1, Kadowaki recites “A method for training an ensemble machine learning model comprising:” (Kadowaki at 0005: These meta-learning methods include ensemble learning methods, which are also called stacked learning methods. These ensemble learning methods use an integrated model comprised of plural trained models to determine, as a value predicted by the integrated model, a majority decision among values respectively predicted by the trained models.)
“a) receiving data characterizing levels of trust in a plurality of machine learning models, wherein the plurality of machine learning models collectively form at least part of the ensemble machine learning model;” (Kadowaki at 0120: For example, the evaluation metric calculator 24 inputs the input data items V, which are paired to the respective output data items W, to the first layer of the neural network 50 in a manner similar to the training task to calculate, as a parameter indicative of the evaluation metric, the absolute difference between each of the output data items, i.e. ground-truth data items, W of the test dataset (V, W) and the corresponding one of the outputted data items WA from the candidate integrated model Sc. More specifically, the evaluation metric calculator 24 calculates, as a value of the evaluation metric, the average of the calculated absolute differences between the output data items W of the test dataset (V, W) and the corresponding respective outputted data items WA. The calculated average representing the evaluation metric for the integrated-model candidate will be referred to as an error Z. In particular, the value of the error Z obtained by the evaluation metric calculator 24 will also be referred to as an actually calculated value of the error Z.) [Receiving data characterizing levels of trust in a plurality of machine learning models, i.e., the absolute difference between the predicted output value (WA) and the corresponding true output value (W), wherein the plurality of machine learning models collectively form at least part of the ensemble machine learning model, i.e., a candidate integrated model and a regression expression generator is an example of an ensemble machine learning model (see 0005)]
“b) calculating a prediction error estimate for each machine learning model of the plurality, wherein the prediction error estimate for each machine learning model is based on a trust score for that machine learning model and relative weights calculated for at least a subset of the data points in a training data set used to train that machine learning model;” (Kadowaki at 0120: For example, the evaluation metric calculator 24 inputs the input data items V, which are paired to the respective output data items W, to the first layer of the neural network 50 in a manner similar to the training task to calculate, as a parameter indicative of the evaluation metric, the absolute difference between each of the output data items, i.e. ground-truth data items, W of the test dataset (V, W) and the corresponding one of the outputted data items WA from the candidate integrated model Sc. More specifically, the evaluation metric calculator 24 calculates, as a value of the evaluation metric, the average of the calculated absolute differences between the output data items W of the test dataset (V, W) and the corresponding respective outputted data items WA. The calculated average representing the evaluation metric for the integrated-model candidate will be referred to as an error Z. In particular, the value of the error Z obtained by the evaluation metric calculator 24 will also be referred to as an actually calculated value of the error Z.) [calculating a prediction error estimate for each machine learning model of the plurality, i.e., the error z, wherein the prediction error estimate for each machine learning model is based on a trust score, i.e., the absolute difference, for that machine learning model and relative weights calculated for at least a subset of the data points in a training data set used to train that machine learning model, i.e., the test dataset]
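For illustration only, the evaluation metric described in the cited passage (the average of the absolute differences between the ground-truth items W and the outputted items WA) can be sketched as follows. The function name and the sample values are hypothetical; the sketch assumes scalar data items.

```python
def error_z(ground_truth, predicted):
    """Average absolute difference between ground-truth items W and outputs WA."""
    if not ground_truth or len(ground_truth) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(w - wa) for w, wa in zip(ground_truth, predicted))
    return total / len(ground_truth)

# Hypothetical test dataset outputs W and candidate-model outputs WA.
W = [1.0, 2.0, 3.0]
WA = [1.5, 2.0, 2.0]
z = error_z(W, WA)  # (0.5 + 0.0 + 1.0) / 3
```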
e) receiving additional data characterizing levels of trust in one or more machine learning models of the plurality and detecting a change in a level of trust, consistent with an adversarial attack, for at least one of the machine learning models
y.” (Kadowaki at 0136: When a sequence of the selection task of the selector 20, the training task of the training unit 22, the evaluation-metric calculation task of the evaluation metric calculator 24, and the regression expression generation task of the regression expression generator 26 is carried out first time, data indicative of the relationship between the candidate integrated-model Sc and the corresponding error Z is at least one data item. An increase in execution of the number of the sequences while selecting candidate models constituting the candidate integrated-model Sc enables the number of data items stored in, for example, the large-capacity storage device 18, each of which represents the relationship between the candidate integrated-model Sc and the corresponding error Z, to increase. See also Kadowaki at 0140, 147.) [Kadowaki re-evaluates the evaluation metric, i.e., output
On the other hand, Kadowaki recites “d) generating an output prediction equation for the ensemble machine learning model, wherein the determination is based, at least in part, on the normalized weights calculated in (c) for each machine learning model of the plurality.” (Kadowaki at 0121: The regression expression generator 26 performs a regression-expression generation task of generating a regression expression based on the candidate integrated-model Sc and the corresponding error, i.e. evaluation metric, Z; the regression expression represents the relationship between the combination Sc and the corresponding error Z. Kadowaki at 0122: Specifically, the regression expression generator 26 uses individual candidate-model variables si and sj to generate the regression expression in accordance with the following expression (1):
[Image: media_image1.png, Kadowaki expression (1)]
See also Kadowaki at 0123: where aij represents weight parameters for the product of the candidate-model variables si and sj, and bi represents the weight parameter for the candidate-model variable si.) [Calculating a regression expression, using error Z, i.e., the prediction error, with variables si and sj, which are variables representing each machine learning model in the ensemble, i.e., weights for each machine learning model in the plurality.]
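For illustration only, a regression expression of the kind described at Kadowaki 0123 can be sketched as follows. Expression (1) itself is not reproduced in this text, so the sketch assumes a quadratic form, Z ≈ Σ aij si sj + Σ bi si, summed over all index pairs, with binary selection variables si; the coefficient values are hypothetical.

```python
def regression_expression(a, b, s):
    """Evaluate sum_ij a[i][j]*s[i]*s[j] + sum_i b[i]*s[i] (assumed form)."""
    n = len(s)
    quadratic = sum(a[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    linear = sum(b[i] * s[i] for i in range(n))
    return quadratic + linear

# Hypothetical weight parameters for three candidate models.
a = [[0.0, 0.2, 0.0],
     [0.0, 0.0, -0.1],
     [0.0, 0.0, 0.0]]
b = [0.5, 0.3, 0.4]

# s[i] = 1 selects model i for the candidate integrated model, 0 omits it.
z_first_two = regression_expression(a, b, [1, 1, 0])
z_last_two = regression_expression(a, b, [0, 1, 1])
```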
However, while Kadowaki utilizes values, i.e., weights in the regression expression, for each machine learning model, it does not explicitly recite “c) calculating a normalized weight for each machine learning model of the plurality using the prediction error estimate calculated in (b) for each machine learning model of the plurality; and d) wherein generating the output prediction equation comprises combining outputs from each machine learning model of the plurality using the normalized weights calculated in (c) for each machine learning model of the plurality.”
On the other hand, Schapire recites “c) calculating a normalized weight for each machine learning model of the plurality using the prediction error estimate calculated in (b) for each machine learning model of the plurality;” (Schapire at pg. 2:
[Image: media_image2.png, excerpt from Schapire at pg. 2]
See also pg. 2: …where each hypothesis is assigned weight αt.)
and “d) wherein generating the output prediction equation comprises combining outputs from each machine learning model of the plurality using the normalized weights calculated in (c) for each machine learning model of the plurality” (Schapire: H(x) = sign(Σ αt ht(x)), where the α weights are computed from weighted training errors (Fig. 1). Also see section 2, page 2: “The final or combined hypothesis H computes the sign of a weighted combination of weak hypotheses … This is equivalent to saying that H is computed as a weighted majority vote of the weak hypotheses ht where each is assigned weight αt.” See also section 4, page 7: “AdaBoost can be understood as a procedure for greedily minimizing … the exponential loss … the choices of αt and ht on each round happen to be … chosen so as to cause the greatest decrease in this loss.”)
Kadowaki and Schapire are analogous art in machine learning and training ensemble models. A person having ordinary skill in the art would have been motivated, before the effective filing date of the present application, to modify Kadowaki with Schapire to recite “c) calculating a normalized weight for each machine learning model of the plurality using the prediction error estimate calculated in (b) for each machine learning model of the plurality;” with the motivation being “(pg. 39) Given the weak learning condition, it is possible to prove that the training error of AdaBoost’s final hypothesis decreases to zero very rapidly, in fact, in just O(log m) rounds.” [To achieve this training error decrease, the weak learning condition requires normalizing the weights using the error prediction.1]2
Neither Kadowaki nor Schapire explicitly recites the following limitation; however, Neven recites “reformulating the output prediction equation in the form of a quadratic unconstrained binary optimization (QUBO) problem;” (Neven at pg. 5: To this end we effect a change in the loss function, now using the quadratic loss, such that finding wopt in (4) amounts to solving a quadratic optimization program: [Equation 12] … Eqn. (12) corresponds to a quadratic unconstrained binary optimization (QUBO) problem.) [The loss function of equation 4, i.e., the output prediction equation, is reformulated as equation 12, which is in the form of a QUBO problem.]
Neven further recites “g) solving the QUBO problem using a quantum computing platform to determine an updated normalized weight for each machine learning model of the plurality, wherein the updated normalized weights are configured to reduce a contribution of the at least one machine learning model associated with the detected change in the level of trust” (Neven at pg. 6, 4 Implementation details: We implemented the training formulations given by (4) and (12) in Matlab… The resultant problem is solved with a multi-start tabu solver tuned to QUBO problems. See also Neven at pg. 3: “To bring (4) to a form that is amenable to AQC as implemented by the D-Wave hardware[fn3]…” fn3: “The D-Wave hardware minimizes an Ising function via a physical annealing of thermal and quantum fluctuations.”)
Kadowaki, Schapire and Neven are analogous art in machine learning involving classification/prediction applications. A person having ordinary skill in the art would have been motivated, before the effective filing date of the present application, to modify Kadowaki and Schapire with Neven, with the motivation being “(Neven at pg. 2) H is the Heaviside step function” and “(Neven at pg. 5) Training a binary classifier… In order for the square loss to be compatible with the binary decision enforced by the sign in eqn. (1) we scale the hi(x) such that hi : x ↦ {−1/N, 1/N}. Eqn. (12) corresponds to a quadratic unconstrained binary optimization (QUBO) problem.” See also “(Neven at pg. 10, 6 Discussion) We have seen an impressive performance of global optimization approaches that minimize a regularized measure of training error to find an optimal combination of weights for constructing a binary classifier.”
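For illustration only, the QUBO form referenced in the Neven passages can be sketched as follows. Neven’s equation (12) is not reproduced in this text, so the sketch assumes a quadratic loss Σs (Σi wi hi(xs) − ys)² plus a penalty λ Σi wi over binary weights wi ∈ {0, 1}, with weak-classifier outputs scaled to ±1/N. The exhaustive search is a stand-in for the tabu solver or quantum hardware named in the reference, and all names and data are hypothetical.

```python
from itertools import product

def build_qubo(h, y, lam):
    """Build Q so that sum_ij Q[i][j]*w[i]*w[j] equals the assumed objective
    (up to a constant). h[s][i] is the scaled output of weak classifier i on
    sample s; y[s] is the label in {-1, +1}."""
    n = len(h[0])
    Q = [[sum(row[i] * row[j] for row in h) for j in range(n)] for i in range(n)]
    for i in range(n):
        # For binary w, w_i**2 == w_i, so linear terms sit on the diagonal.
        Q[i][i] += lam - 2.0 * sum(row[i] * label for row, label in zip(h, y))
    return Q

def qubo_energy(Q, w):
    n = len(w)
    return sum(Q[i][j] * w[i] * w[j] for i in range(n) for j in range(n))

def solve_qubo_brute_force(Q):
    """Exhaustive search over all binary vectors; only viable for small n."""
    return min(product((0, 1), repeat=len(Q)), key=lambda w: qubo_energy(Q, w))

# Hypothetical data: classifier 0 is always right, 1 is at chance, 2 is always wrong.
y = [1, 1, -1, -1]
h = [[1 / 3, 1 / 3, -1 / 3],
     [1 / 3, -1 / 3, -1 / 3],
     [-1 / 3, -1 / 3, 1 / 3],
     [-1 / 3, 1 / 3, 1 / 3]]
w_opt = solve_qubo_brute_force(build_qubo(h, y, lam=0.1))
```

Under these assumptions the search keeps only the accurate classifier, which shows how minimizing such an objective can reduce the contribution of poorly performing models.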
Regarding claim 2, Kadowaki in view of Schapire recites “The method of claim 1,” and Kadowaki further recites “wherein the data characterizing a level of trust in each machine learning model of the plurality comprises a trust score for each machine learning model of the plurality.” (Kadowaki at 0120: For example, the evaluation metric calculator 24 inputs the input data items V, which are paired to the respective output data items W, to the first layer of the neural network 50 in a manner similar to the training task to calculate, as a parameter indicative of the evaluation metric, the absolute difference between each of the output data items, i.e. ground-truth data items, W of the test dataset (V, W) and the corresponding one of the outputted data items WA from the candidate integrated model Sc. More specifically, the evaluation metric calculator 24 calculates, as a value of the evaluation metric, the average of the calculated absolute differences between the output data items W of the test dataset (V, W) and the corresponding respective outputted data items WA. The calculated average representing the evaluation metric for the integrated-model candidate will be referred to as an error Z. In particular, the value of the error Z obtained by the evaluation metric calculator 24 will also be referred to as an actually calculated value of the error Z.) [The evaluation metric or error Z, i.e., the trust score, is calculated by obtaining an average difference between data items W and the ground-truth, i.e., a level of trust in each machine learning model of the plurality comprises a trust score for each machine learning model of the plurality.]
Regarding claim 4, Kadowaki in view of Schapire recites “The method of claim 2,” and Kadowaki further recites “wherein the trust score for each machine learning model of the plurality is calculated from the received data.” (Kadowaki at 0120: For example, the evaluation metric calculator 24 inputs the input data items V, which are paired to the respective output data items W, to the first layer of the neural network 50 in a manner similar to the training task to calculate, as a parameter indicative of the evaluation metric, the absolute difference between each of the output data items, i.e. ground-truth data items, W of the test dataset (V, W) and the corresponding one of the outputted data items WA from the candidate integrated model Sc. More specifically, the evaluation metric calculator 24 calculates, as a value of the evaluation metric, the average of the calculated absolute differences between the output data items W of the test dataset (V, W) and the corresponding respective outputted data items WA. The calculated average representing the evaluation metric for the integrated-model candidate will be referred to as an error Z. In particular, the value of the error Z obtained by the evaluation metric calculator 24 will also be referred to as an actually calculated value of the error Z.) [The received data, i.e., each ground truth data for each machine learning in the candidate machine learning model, comprises the evaluation metric, i.e., the trust score.]
Regarding claim 5, Kadowaki in view of Schapire recites “The method of claim 4,” and Kadowaki further recites “wherein the received data comprises data relating to a sensitivity of model predictions to input data quality, a sensitivity of model predictions to distributional shifts of training data input, a sensitivity of model predictions to out-of-distribution (OOD) input data, a posterior distribution of model predictions, prediction confidence scores aggregated across one or more training data sets, a ratio of calculated nearest neighbor distances for interclass and intraclass predictions, one or more model performance metrics, or any combination thereof.” (Kadowaki at 0062: For example, the selector 20 can be configured to randomly select, from the individual models, plural models as candidate models for the first sequence, and to select, from the individual models, plural models as candidate models in accordance with a minimization function for each of the second sequence and subsequent sequences. How the selector 20 selects plural models as candidate models in accordance with the minimization function for each of the second sequence and subsequent sequences will be described later. See also Kadowaki at 0091: The training unit 22 performs a training task of training the candidate models f_kr of the candidate integrated model Sc selected by the selector 20. And Kadowaki at 0092: For example, the training unit 22 trains the candidate models f_kr of the candidate integrated model Sc using a training dataset (X, Y) comprised of input data items X and output data items, i.e. ground-truth data items, Y that are respectively paired to the input data items X. See also 0093-95.) [The minimization function is an example of one or more model performance metrics; the evaluation metric is another example of one or more model performance metrics.]
Regarding claim 6, Kadowaki in view of Schapire recite “The method of claim 1,” and Kadowaki further recites “wherein the prediction error estimate is calculated for each machine learning model of the plurality using a loss-based penalty function for that machine learning model that is based, at least in part, on the trust score for that machine learning model.” (Kadowaki at 0120: For example, the evaluation metric calculator 24 inputs the input data items V, which are paired to the respective output data items W, to the first layer of the neural network 50 in a manner similar to the training task to calculate, as a parameter indicative of the evaluation metric, the absolute difference between each of the output data items, i.e. ground-truth data items, W of the test dataset (V, W) and the corresponding one of the outputted data items WA from the candidate integrated model Sc. More specifically, the evaluation metric calculator 24 calculates, as a value of the evaluation metric, the average of the calculated absolute differences between the output data items W of the test dataset (V, W) and the corresponding respective outputted data items WA. The calculated average representing the evaluation metric for the integrated-model candidate will be referred to as an error Z. In particular, the value of the error Z obtained by the evaluation metric calculator 24 will also be referred to as an actually calculated value of the error Z. See also Kadowaki at 0140: The integrated-model generator 30 selects, from the generated candidate integrated-models Sc, one of the candidate integrated-models Sc after the predetermined termination condition is satisfied; the selected one of the candidate-integrated models Sc has the best value of the evaluation metric, i.e. the lowest value of the error Z, in all the generated candidate integrated-models Sc.) 
[The evaluation metric or error Z, i.e., the trust score, is calculated by obtaining an average difference between data items W and the ground-truth, i.e., a level of trust in each machine learning model of the plurality comprises a trust score for each machine learning model of the plurality. Error Z quantifies prediction error and the goal is to find “the lowest value of error Z” or to minimize the error, i.e., Error Z is functionally equivalent to a loss function.]
Regarding claim 10, Kadowaki in view of Schapire recites “The method of claim 1”; however, neither Kadowaki nor Schapire explicitly recites “wherein the output prediction of the ensemble machine learning model is given by the equation:
[Image: media_image3.png, claimed output prediction equation F(x)]
wherein F(x) is a prediction of the ensemble machine learning model for input data value x, N is a number of machine learning models in the ensemble machine learning model, wi are normalized weights for the plurality of machine learning models that collectively form at least part of the ensemble machine learning model, and f(x) are predictions of the individual machine learning models in the ensemble for input data value x.” On the other hand, Neven recites (Neven at pg. 2, equation 1:
[Image: media_image4.png, Neven equation (1)]
where x ∈ R^M are the input patterns to be classified, y ∈ {−1, 1} is the output of the classifier, the hi : x ↦ {−1, 1} are so-called weak classifiers or features detectors, and the wi ∈ [0, 1] are a set of weights to be optimized. H(x) is known as a strong classifier.) [When viewed in light of Schapire, which recites normalizing weights, Neven recites wi as normalized weights.]
Kadowaki, Schapire and Neven are analogous art in machine learning involving classification/prediction applications. A person having ordinary skill in the art would have been motivated, before the effective filing date of the present application, to modify Kadowaki and Schapire with Neven, with the motivation being “(Neven at pg. 2) H is the Heaviside step function” and “(Neven at pg. 5) Training a binary classifier… In order for the square loss to be compatible with the binary decision enforced by the sign in eqn. (1) we scale the hi(x) such that hi : x ↦ {−1/N, 1/N}. Eqn. (12) corresponds to a quadratic unconstrained binary optimization (QUBO) problem.” See also “(Neven at pg. 10, 6 Discussion) We have seen an impressive performance of global optimization approaches that minimize a regularized measure of training error to find an optimal combination of weights for constructing a binary classifier.”
Regarding claim 11, Kadowaki, Schapire and Neven recite “The method of claim 10,” and Schapire further recites “wherein the normalized weight, wi, for each machine learning model of the plurality is calculated, at least in part, by taking a natural logarithm of a quotient comprising the prediction error estimate for that machine learning model.”
(Schapire at pg. 2:
[Image: media_image2.png, excerpt from Schapire at pg. 2]
See also pg. 2: …where each hypothesis is assigned weight αt.) [The AdaBoost algorithm obtains an output, i.e., a normalized weight, by taking a natural logarithm of a quotient comprising the prediction error estimate for that machine learning model.3] The motivation rationale used in claim 1 to modify Kadowaki with Schapire is similarly applicable to claim 11.
Regarding claim 15, Kadowaki in view of Schapire recite “The method of claim 1,” and Kadowaki further recites “wherein one or more of the machine learning models of the plurality of machine learning models comprises a classifier model.” (Kadowaki at 0118: The information processing apparatus 10 of the exemplary embodiment can be configured such that the test dataset (V, W) is separately prepared, and all the M gray-scale image data items are used as the input data items X of the training dataset (X, Y). The information processing apparatus 10 of the exemplary embodiment can also be configured such that the input data items X of the training dataset (X, Y) are separately prepared, and all the M gray-scale image data items are used as the test dataset (V, W). See also Kadowaki at 0113-7, 0191.) [The plurality of machine learning models are used for image classification, i.e., a classifier model.]
Regarding claim 17, Kadowaki in view of Schapire recite “The method of claim 1,” and Schapire recites “wherein the ensemble machine learning model is trained using an AdaBoost method.” (Schapire at abstract: The AdaBoost algorithm of Freund and Schapire was the first practical boosting algorithm and remains one of the most widely used and studied with applications in numerous fields. See also pg. 2:
[Image: media_image2.png, excerpt from Schapire at pg. 2]
See also pg. 2: …where each hypothesis is assigned weight αt.) [The AdaBoost algorithm obtains an output, i.e., a normalized weight, by taking a natural logarithm of a quotient comprising the prediction error estimate for that machine learning model.4] The motivation rationale used in claim 1 is equally applicable to claim 17.
Regarding claim 18, Kadowaki recites “A method for training an ensemble machine learning model comprising:” (Kadowaki at 0005: These meta-learning methods include ensemble learning methods, which are also called stacked learning methods. These ensemble learning methods use an integrated model comprised of plural trained models to determine, as a value predicted by the integrated model, a majority decision among values respectively predicted by the trained models.)
“a) receiving data characterizing levels of trust in a plurality of machine learning models, wherein the plurality of machine learning models collectively form at least part of the ensemble machine learning model;” (Kadowaki at 0120: For example, the evaluation metric calculator 24 inputs the input data items V, which are paired to the respective output data items W, to the first layer of the neural network 50 in a manner similar to the training task to calculate, as a parameter indicative of the evaluation metric, the absolute difference between each of the output data items, i.e. ground-truth data items, W of the test dataset (V, W) and the corresponding one of the outputted data items WA from the candidate integrated model Sc. More specifically, the evaluation metric calculator 24 calculates, as a value of the evaluation metric, the average of the calculated absolute differences between the output data items W of the test dataset (V, W) and the corresponding respective outputted data items WA. The calculated average representing the evaluation metric for the integrated-model candidate will be referred to as an error Z. In particular, the value of the error Z obtained by the evaluation metric calculator 24 will also be referred to as an actually calculated value of the error Z.) [Receiving data characterizing levels of trust in a plurality of machine learning models, i.e., the absolute difference between the predicted output value (WA) and the corresponding true output value (W), wherein the plurality of machine learning models collectively form at least part of the ensemble machine learning model, i.e., a candidate integrated model and a regression expression generator is an example of an ensemble machine learning model (see 0005)]
“wherein the training comprises the use of a loss-based penalty function for each machine learning model of the plurality to calculate a prediction error estimate for that machine learning model, and wherein the prediction error estimate is based on a trust score for that machine learning model and relative weights calculated for at least a subset of data points in a training data set used to train that machine learning model; and” (Kadowaki at 0120: For example, the evaluation metric calculator 24 inputs the input data items V, which are paired to the respective output data items W, to the first layer of the neural network 50 in a manner similar to the training task to calculate, as a parameter indicative of the evaluation metric, the absolute difference between each of the output data items, i.e. ground-truth data items, W of the test dataset (V, W) and the corresponding one of the outputted data items WA from the candidate integrated model Sc. More specifically, the evaluation metric calculator 24 calculates, as a value of the evaluation metric, the average of the calculated absolute differences between the output data items W of the test dataset (V, W) and the corresponding respective outputted data items WA. The calculated average representing the evaluation metric for the integrated-model candidate will be referred to as an error Z. In particular, the value of the error Z obtained by the evaluation metric calculator 24 will also be referred to as an actually calculated value of the error Z.) [Calculating a prediction error estimate for each machine learning model of the plurality, i.e., the error z, wherein the prediction error estimate for each machine learning model is based on a trust score, i.e., the absolute difference, for that machine learning model and relative weights calculated for at least a subset of the data points in a training data set used to train that machine learning model, i.e., the test dataset]
“d) determining an output prediction equation for the ensemble machine learning model, wherein the normalized weights calculated in (c) are used to formulate the output prediction equation for the ensemble machine learning model.” (Kadowaki at 0121: The regression expression generator 26 performs a regression-expression generation task of generating a regression expression based on the candidate integrated-model Sc and the corresponding error, i.e. evaluation metric, Z; the regression expression represents the relationship between the combination Sc and the corresponding error Z. Kadowaki at 0122: Specifically, the regression expression generator 26 uses individual candidate-model variables si and sj to generate the regression expression in accordance with the following expression (1):
[Image: media_image1.png (Kadowaki, expression (1))]
See also Kadowaki at 0123: where aij represents weight parameters for the product of the candidate-model variables si and sj, and bi represents the weight parameter for the candidate-model variable si.) [Calculating a regression expression, using error Z, i.e., the prediction error, with variables si and sj, which are variables representing each machine learning model in the ensemble, i.e., weights for each machine learning model in the plurality.]
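For illustration only, a regression expression of the kind described at Kadowaki 0123 can be sketched as follows. The exact form of expression (1) appears only in the cited image, so the quadratic form used here (a pairwise term weighted by aij plus a single-variable term weighted by bi) is an assumption drawn from that description, and all names and values are hypothetical.

```python
from typing import Sequence


def regression_error(
    s: Sequence[int],
    a: Sequence[Sequence[float]],
    b: Sequence[float],
) -> float:
    """Predicted error for a binary selection vector s under the assumed quadratic form."""
    n = len(s)
    # Pairwise term: a[i][j] weights the product s[i] * s[j] (summation range is an assumption).
    pair_term = sum(a[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    # Single-variable term: b[i] weights s[i].
    single_term = sum(b[i] * s[i] for i in range(n))
    return pair_term + single_term


# Hypothetical weights for three candidate models; s selects models 0 and 2.
a = [
    [0.0, 0.5, -0.25],
    [0.5, 0.0, 0.25],
    [-0.25, 0.25, 0.0],
]
b = [0.5, 0.25, 0.75]
s = [1, 0, 1]
predicted_Z = regression_error(s, a, b)
print(predicted_Z)
```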
However, while Kadowaki trains an ensemble model, Kadowaki does not explicitly recite “b) training individual machine learning models of the ensemble machine learning model using an AdaBoost method,” And while Kadowaki utilizes values, i.e., weights in the regression expression, for each machine learning model, it does not explicitly recite “c) calculating a normalized weight for each individual machine learning model of the ensemble; and”
On the other hand, Schapire recites “b) training individual machine learning models of the ensemble machine learning model using an AdaBoost method,” (Schapire at abstract: The AdaBoost algorithm of Freund and Schapire was the first practical boosting algorithm and remains one of the most widely used and studied with applications in numerous fields. See also pg. 2:
[Image: media_image2.png]
See also pg. 2: …where each hypothesis is assigned weight αt.) [The AdaBoost algorithm obtains an output, i.e., a normalized weight, by taking a natural logarithm of a quotient comprising the prediction error estimate for that machine learning model.[fn5]]
“c) calculating a normalized weight for each individual machine learning model of the ensemble; and” (Schapire at pg. 2:
[Image: media_image2.png]
See also pg. 2: …where each hypothesis is assigned weight αt.) [αt is functionally equivalent to a normalized weight as it is assigning a weight for each hypothesis in the distribution of machine learning models that comprise the ensemble.]
Kadowaki and Schapire are analogous arts in machine learning and training ensemble models. A person would be motivated to modify Kadowaki with Schapire, before the effective filing date of the present application, to recite b) training individual machine learning models of the ensemble machine learning model using an AdaBoost method and c) calculating a normalized weight for each individual machine learning model of the ensemble, with the motivation being “(pg. 39) Given the weak learning condition, it is possible to prove that the training error of AdaBoost’s final hypothesis decreases to zero very rapidly, in fact, in just O(log m) rounds.” [To achieve this training error decrease, the weak learning condition requires normalizing the weights using the error prediction.[fn6]][fn7]
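For illustration only, the Schapire relationships relied on above, namely αt = (1/2) ln((1 – εt) / εt) and H(x) = sign(Σ αt ht(x)), can be sketched as follows. This is a minimal sketch rather than a full AdaBoost implementation: the weak hypotheses, their weighted errors, and the tie-breaking rule at a score of zero are hypothetical assumptions.

```python
import math
from typing import Callable, List, Sequence


def adaboost_alpha(weighted_error: float) -> float:
    """alpha_t = (1/2) * ln((1 - eps_t) / eps_t), per the formula quoted from Schapire."""
    if not 0.0 < weighted_error < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - weighted_error) / weighted_error)


def combined_hypothesis(
    alphas: Sequence[float],
    hypotheses: Sequence[Callable[[float], int]],
    x: float,
) -> int:
    """H(x) = sign(sum_t alpha_t * h_t(x)); each h_t returns -1 or +1."""
    score = sum(alpha * h(x) for alpha, h in zip(alphas, hypotheses))
    # Assumption: a score of exactly zero is mapped to +1.
    return 1 if score >= 0 else -1


# Hypothetical decision stumps and hypothetical weighted training errors.
stumps: List[Callable[[float], int]] = [
    lambda x: 1 if x > 0 else -1,
    lambda x: 1 if x > 1 else -1,
    lambda x: 1 if x > -1 else -1,
]
errors = [0.1, 0.3, 0.4]
alphas = [adaboost_alpha(e) for e in errors]

print(alphas)
print(combined_hypothesis(alphas, stumps, 0.5))
```

A lower weighted error yields a larger αt, so the more accurate weak hypotheses dominate the weighted vote.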
Regarding claim 19, Kadowaki in view of Schapire recite “The method of claim 18” however neither Kadowaki nor Schapire recite “further comprising formulating the output prediction equation for the ensemble machine learning model as a sum of two terms: a) an exponential loss function term that provides a measure of a total number of errors made by the ensemble machine learning model as a function of the normalized weights, wi, for the individual machine learning models of the ensemble in predicting a result, y's, for a given input value, xs, when processing a training data set comprising labeled training data points, (xs, ys); and b) a regularization term that comprises a product of (i) a sum of non-zero normalized weights, wi, for the individual of machine learning models of the ensemble and (ii) a control variable,A; and minimizing the two terms of the output prediction equation to determine the normalized weights, wi, for the plurality of machine learning models.”
On the other hand, Neven recites “further comprising formulating the output prediction equation for the ensemble machine learning model as a sum of two terms:” (Neven at pg. 2, equation 4:
[Image: media_image6.png])
Neven further recites “a) an exponential loss function term that provides a measure of a total number of errors made by the ensemble machine learning model as a function of the normalized weights, wi, for the individual machine learning models of the ensemble in predicting a result, y's, for a given input value, xs, when processing a training data set comprising labeled training data points, (xs, ys); and” (Neven at pg. 2, equation 2:
[Image: media_image7.png])
Neven further recites “b) a regularization term that comprises a product of (i) a sum of non-zero normalized weights, wi, for the individual of machine learning models of the ensemble and (ii) a control variable,A; and minimizing the two terms of the output prediction equation to determine the normalized weights, wi, for the plurality of machine learning models.” (Neven at pg. 2, equation 3:
[Image: media_image8.png])
A person skilled in the art, before the effective filing date of the present application, would be motivated to modify Kadowaki and Schapire with Neven to recite further comprising formulating the output prediction equation for the ensemble machine learning model as a sum of two terms: a) an exponential loss function term that provides a measure of a total number of errors made by the ensemble machine learning model as a function of the normalized weights, wi, for the individual machine learning models of the ensemble in predicting a result, y's, for a given input value, xs, when processing a training data set comprising labeled training data points, (xs, ys); and b) a regularization term that comprises a product of (i) a sum of non-zero normalized weights, wi, for the individual of machine learning models of the ensemble and (ii) a control variable,A; and minimizing the two terms of the output prediction equation to determine the normalized weights, wi, for the plurality of machine learning models with the motivation being “(Neven at pg. 3) Each contribution to the overall loss, i.e., the per sample loss… enforces an inequality constraint…. It is possible that two different training samples generate identical inequality constraints for wi. In this sense, (7) is a conservative estimate as the actual number of solution spaces is often lower.” and “(Neven at pg. 2) The second term is known as regularization… and it ensures that the classification does not become too complex.” and “(Neven at pg. 2) Adiabatic quantum computing is a new method that draws on quantum mechanical processes that promises to solve hard discrete optimization problems better than possible with classical algorithms.”
Regarding claim 20, Kadowaki, Schapire in view of Neven recite “The method of claim 19,” and Neven further recites “a) converting the normalized weights, wi, for the plurality of machine learning models to binary values using a binary expansion;” (Neven at pg. 3: First, we need to transition from continuous weights wi ∈ [0, 1] to binary variables.)
Neven further recites “b) rewriting the exponential loss function as a quadratic loss function;” (Neven at pg. 5, equation 12:
[Image: media_image9.png]) [The first line is the exponential loss function, the second line is the exponential loss function rewritten as a quadratic loss function.]
Neven further recites “c) expanding and combining the quadratic loss function term, the binary values of the normalized weights, wi, and the regularization term to formulate a quadratic unconstrained binary optimization (QUBO) problem; and” (Neven at pg. 5, equation 12:
[Image: media_image10.png]) [The rectangle is the quadratic loss function, triangle is the binary values of the normalized weights (as taught by Schapire) wi and the circle is the regularization term, which are combined to form equation 12, i.e., the quadratic unconstrained binary optimization.]
Neven further recites “d) solving the QUBO problem using a quantum computing platform.” (Neven at pg. 6, 4 Implementation details: We implemented the training formulations given by (4) and (12) in Matlab… The resultant problem is solved with a multi-start tabu solver tuned to QUBO problems. See also Neven at pg. 3, “To bring (4) to a form that is amenable to AQC as implemented by the D-Wave hardware[fn3]…” fn3: “The D-Wave hardware minimizes an Ising function via a physical annealing of thermal and quantum fluctuations.”)
The motivation rationale used in claim 19 is similarly applicable for claim 20. Additionally, “(Neven at pg. 6) it has been confirmed that the quadratic unconstrained program with binary weights – an integer programming problem – is NP hard, which validates the motivation for applying quantum algorithms to find wopt in the above formulation. (citations omitted)”
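For illustration only, the steps discussed for claim 20 (binary expansion of a weight, a quadratic loss, a regularization term, and minimization over binary variables) can be sketched as follows. This is a sketch under stated assumptions and is not Neven's equation (12), which appears only in the cited images: the objective assumes a squared difference between the averaged weighted vote and the label plus lam times the count of selected classifiers, and exhaustive enumeration on a classical machine stands in for a QUBO solver or quantum platform. All names and data are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def binary_expansion(weight: float, bits: int) -> List[int]:
    """Greedy fixed-point digits d_1..d_bits such that weight ~ sum_k d_k * 2**-k."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    digits: List[int] = []
    remainder = weight
    for k in range(1, bits + 1):
        place = 2.0 ** -k
        digit = 1 if remainder >= place else 0
        digits.append(digit)
        remainder -= digit * place
    return digits


def objective(
    w: Sequence[int],
    votes: Sequence[Sequence[int]],
    labels: Sequence[int],
    lam: float,
) -> float:
    """Assumed objective: quadratic loss plus lam * (number of non-zero binary weights)."""
    n = len(w)
    loss = 0.0
    for row, y in zip(votes, labels):
        ensemble_output = sum(wi * hi for wi, hi in zip(w, row)) / n
        loss += (ensemble_output - y) ** 2
    return loss + lam * sum(w)


def solve_by_enumeration(
    votes: Sequence[Sequence[int]],
    labels: Sequence[int],
    lam: float,
) -> Tuple[Tuple[int, ...], float]:
    """Try every binary weight vector; feasible only for a handful of classifiers."""
    n = len(votes[0])
    best = min(product((0, 1), repeat=n), key=lambda w: objective(w, votes, labels, lam))
    return best, objective(best, votes, labels, lam)


# Hypothetical data: each row holds the +/-1 outputs of three weak classifiers on one sample.
votes = [(1, 1, -1), (1, -1, -1), (-1, -1, 1), (-1, 1, 1)]
labels = [1, 1, -1, -1]
best_w, best_value = solve_by_enumeration(votes, labels, lam=0.1)

print(binary_expansion(0.625, 3))
print(best_w, best_value)
```

Enumeration visits 2^N weight vectors, which is why the cited passages turn to dedicated QUBO solvers for larger N.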
Regarding claim 21, Kadowaki in view of Schapire recite “The method of claim 18” however neither Kadowaki nor Schapire recite “wherein the ensemble machine learning model is a binary classifier.” On the other hand, Neven recites “wherein the ensemble machine learning model is a binary classifier.” (Neven at Abstract: A formulation is employed in which the binary classifier is constructed as a thresholded linear superposition of a set of weak classifiers. See also Neven at pg. 6, Implementation details: The dictionaries of weak classifiers that we employed consist of decision stumps.) [The binary classifier is composed of weak classifiers, which are decision stumps, i.e., machine learning models, therefore the binary classifier is an ensemble machine learning model.]
A person skilled in the art, before the effective filing date of the present application, would be motivated to modify Kadowaki and Schapire with Neven to recite wherein the ensemble machine learning model is a binary classifier with the motivation being “(Neven at abstract) …we find that the resulting classifier outperforms a widely used state-of-the-art method, AdaBoost, on a variety of benchmark problems.”
Regarding claim 22, Kadowaki, Schapire and Neven recite “The method of claim 20,” and Neven further recites “wherein the binary values derived from binary expansion of the normalized weights, wi, for the plurality of machine learning models comprise qubits.” (Neven at pg. 3: First, we need to transition from continuous weights wi ∈ [0, 1] to binary variables… Since each binary variable is associated with a qubit…) The motivation rationale used in claim 20 is similarly applicable to claim 22, and qubits are required for quantum computing methods/platforms.[fn8]
Regarding claim 23, Kadowaki, Schapire and Neven recite “The method of claim 22,” however neither Kadowaki nor Schapire recites “wherein the minimum number of qubits, b, required for the binary expansion is given by b > log2(f) + log2(e) - 1, where e is Euler's number, f=S/N, S is the number of training data point pairs, and N is the number of individual machine learning models in the ensemble machine learning model.” On the other hand, Neven recites “wherein the minimum number of qubits, b, required for the binary expansion is given by b > log2(f) + log2(e) - 1, where e is Euler's number, f=S/N, S is the number of training data point pairs, and N is the number of individual machine learning models in the ensemble machine learning model.” (Neven at pg. 4-5, equation 11:
[Image: media_image11.png]
where e is the Euler number and f = S/N. See also Neven at pg. 2, equation 1:
[Image: media_image12.png]
… One term measures the error over a set of S training examples {(xs, ys)|s = 1, . . . , S}.) [N stands for the number of weak classifiers (see equation 1), i.e., the number of machine learning models in the ensemble machine learning model. S stands for the number of training examples, i.e., the number of training data pairs.]
The motivation rationale used in claim 22 is similarly applicable to claim 23.
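For illustration only, the bound recited in claim 23, b > log2(f) + log2(e) - 1 with f = S/N, can be evaluated as follows. The function name is hypothetical, and the floor of one bit is an assumption added so that the result is always a usable bit count.

```python
import math


def min_bits(num_samples: int, num_models: int) -> int:
    """Smallest integer b with b > log2(S/N) + log2(e) - 1 (at least one bit, by assumption)."""
    if num_samples <= 0 or num_models <= 0:
        raise ValueError("S and N must be positive")
    f = num_samples / num_models
    bound = math.log2(f) + math.log2(math.e) - 1.0
    # floor(bound) + 1 is the smallest integer strictly greater than the bound.
    return max(1, math.floor(bound) + 1)


# Hypothetical sizes: S training pairs and N individual models.
print(min_bits(10, 10))
print(min_bits(1000, 10))
```

With S = N the bound is about 0.44, so a single bit suffices; with S = 1000 and N = 10 the bound is about 7.09, so eight bits are needed.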
Regarding claim 24, Kadowaki, Schapire and Neven recite “The method of claim 23,” and Neven further recites “wherein b < 32.” (Neven at pg. 9, Fig. 5:
[Image: media_image13.png]) The motivation rationale used in claim 23 is similarly applicable to claim 24.
Regarding claim 25, Kadowaki, Schapire and Neven recite “The method of claim 23,” and Neven further recites “wherein b = 1.” (Neven at pg. 5: “Thus for many problems that arise in practice we get away with very few bits and often we will only need only a single bit.” See also Neven at pg. 9, Fig. 5:
[Image: media_image14.png]) The motivation rationale used in claim 23 is similarly applicable to claim 25.
Regarding claim 26, Kadowaki, Schapire and Neven recite “The method of claim 25,” and Neven further recites “wherein the quadratic unconstrained binary optimization (QUBO) is expressed as:
[Image: media_image15.png]
wherein wopt is a set of optimized weights for a binary classifier which is used to weight predictions of the individual machine learning models.” (Neven at pg. 5, Equation 12:
[Image: media_image16.png] … [Image: media_image17.png])
The motivation rationale used in claim 25 is similarly applicable to claim 26.
Regarding claim 27, Kadowaki in view of Schapire recite “The method of claim 18” and Kadowaki further recites “further comprising receiving additional data characterizing levels of trust in one or more machine learning models of the plurality and re-calculating the normalized weight for each individual machine learning model of the ensemble if a change in a level of trust is detected for one or more machine learning models of the plurality.” (Kadowaki at 0136: When a sequence of the selection task of the selector 20, the training task of the training unit 22, the evaluation-metric calculation task of the evaluation metric calculator 24, and the regression expression generation task of the regression expression generator 26 is carried out first time, data indicative of the relationship between the candidate integrated-model Sc and the corresponding error Z is at least one data item. An increase in execution of the number of the sequences while selecting candidate models constituting the candidate integrated-model Sc enables the number of data items stored in, for example, the large-capacity storage device 18, each of which represents the relationship between the candidate integrated-model Sc and the corresponding error Z, to increase. See also Kadowaki at 0147.) [Kadowaki re-evaluates the integrated candidate model with training candidate models over a number of sequences including when the prediction accuracy of the regression expression increases. As the regression expression is re-calculated, i.e., the evaluation metric for the trust score, weights are re-evaluated for the individual models in the regression expression.]
Regarding claim 28, Kadowaki and Schapire in view of Neven recite “The method of claim 20” and Neven further recites “wherein the quantum computing platform comprises an Amazon Bracket, Azure Quantum, D-Wave or TensorFlow Quantum quantum computing platform.” (Neven at pg. 6, 4 Implementation details: We implemented the training formulations given by (4) and (12) in Matlab… The resultant problem is solved with a multi-start tabu solver tuned to QUBO problems. See also Neven at pg. 3, “To bring (4) to a form that is amenable to AQC as implemented by the D-Wave hardware[fn3]…” fn3: “The D-Wave hardware minimizes an Ising function via a physical annealing of thermal and quantum fluctuations.”) The motivation rationale used in claim 20 is similarly applicable to claim 28.
Regarding claim 29, claim 29 is the system embodiment with materially similar limitations as claim 1. Therefore, claim 29 is rejected for the same rationale as claim 1. Additionally, claim 29 recites “A system comprising” (Kadowaki at 0191: The present disclosure can be implemented by various embodiments in addition to the image processing apparatus; the various embodiments include systems each including the image processing apparatus,…)
“one or more processors;” and “memory; and” and “one or more programs stored in memory and comprising instructions that, when executed by the one or more processors, cause the one or more processors to” (Kadowaki at 0044: Referring to FIG. 1, the information processing apparatus 10 of the exemplary embodiment includes a processing unit 12 comprised of, for example, one or more quantum computers, in other words, quantum processors. The information processing apparatus 10 also includes, for example, a read only memory (ROM) 14, a random-access memory (RAM) 16, and a large-capacity storage device 18. These devices 14, 16, and 18 are communicably connected to the processing unit 12.)
Regarding claim 30, claim 30 is the non-transitory, computer-readable embodiment with materially similar limitations as claim 1. Therefore, claim 30 is rejected for the same rationale as claim 1. Additionally, claim 30 recites “A non-transitory, computer-readable medium storing one or more programs, the one or more programs comprising instructions which, when executed by one or more processors of an electronic device or system, cause the electronic device or system to:” (Kadowaki at 0021: “A program product according to a fourth aspect of the present disclosure is provided. The program product includes a non-transitory computer-readable storage medium, and a set of computer program instructions stored in the computer-readable storage medium. The instructions cause a computer to serve as[sic]” See also Kadowaki at 0022-0025, Kadowaki at Claim 13.)
Claims 3 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Kadowaki and Schapire in further view of Hall, et al., (US 20230148321 A1) (herein after referred to as “Hall”).
Regarding claim 3, Kadowaki in view of Schapire recites “The method of claim 2,” but neither Kadowaki nor Schapire recites “wherein the trust score is a real number having a value ranging from 0.0 to 1.0.” On the other hand, Hall recites “wherein the trust score is a real number having a value ranging from 0.0 to 1.0.” (Hall at 0094: Confidence based metrics include… Sigmoid score. See also Hall at 0148: “Sigmoid Score of a classification model where the prediction is a value between 0 and 1 is defined as…”)
Kadowaki, Schapire and Hall are analogous arts in machine learning involving classification/prediction applications. A person would be motivated to modify Kadowaki and Schapire with Hall, before the effective filing date of the present application, to recite wherein the trust score is a real number having a value ranging from 0.0 to 1.0 with the motivation being “(0149) Sigmoid Score is a “soft” alternative to other Accuracy metrics, in that provides a graded measurement of model performance rather than a sharp cut-off.”
Regarding claim 16, Kadowaki in view of Schapire recites “The method of claim 15,” although Kadowaki does explicitly say the individual candidate models are predictors (see 0054), Kadowaki does not explicitly recite “wherein the classifier model comprises an artificial neural network (ANN), deep learning algorithm (DLA), decision tree algorithm, Naive Bayes algorithm, support vector machine (SVM), or k-nearest neighbor (KNN) algorithm.”
On the other hand, Hall recites “wherein the classifier model comprises an artificial neural network (ANN), deep learning algorithm (DLA), decision tree algorithm, Naive Bayes algorithm, support vector machine (SVM), or k-nearest neighbor (KNN) algorithm.” (Hall at 0006: Deep learning models typically consist of artificial “neural networks” that contain numerous intermediate layers between input and output, where each layer is considered a sub-model, each providing a different interpretation of the data. While the machine learning commonly only accepts structured data as its input, deep learning, on the other hand, does not necessarily need structured data as its input. For example, in order to recognize an image of a dog and a cat, a traditional machine learning model needs user-predefined features from those images. Such a machine learning model will learn from certain numeric features as inputs and can then be used to identify features or objects from other unknown images. The raw image is sent through the deep learning network, layer by layer, and each layer would learn to define specific (numeric) features of the input image.)
A person would be motivated to modify Kadowaki and Schapire with Hall, before the effective filing date of the present application, to recite wherein the classifier model comprises an artificial neural network (ANN), deep learning algorithm (DLA), decision tree algorithm, Naive Bayes algorithm, support vector machine (SVM), or k-nearest neighbor (KNN) algorithm with the motivation being “(0075) Deep Learning and neural networks ‘learn’ features rather than relying on hand designed feature descriptors like machine learning models. This allows them to learn ‘feature representations’ that are tailored to the desired task. These methods are suitable for image analysis, as they are able to pick up both small details and overall morphological shapes in order to arrive at an overall classification[,]” and “(0004) Supervised machine learning (or supervised learning) is a classification technique that learns patterns in labelled (training) data, where the labels or annotations for each datapoint relates to a set of classes, in order to create (predictive) AI models that can be used to classify new unseen data. In the context of this specification, AI will be used to refer to both machine learning and deep learning methods.”
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Kadowaki and Schapire in further view of Weinzaepfel, et al., (US 20220114444 A1) (herein after referred to as “Weinzaepfel”).
Regarding claim 7, Kadowaki in view of Schapire recites “The method of claim 6,” but neither Kadowaki nor Schapire recite “wherein the loss-based penalty function for each machine learning model of the plurality comprises a factor of (2 - t), where t is the trust score for that machine learning model and has a value of 0 < t < 1.”
On the other hand, Weinzaepfel recites “wherein the loss-based penalty function for each machine learning model of the plurality comprises a factor of (2 - t), where t is the trust score for that machine learning model and has a value of 0 < t < 1.” (Weinzaepfel at 0015: Reliability loss may be used in the context of robust patch detection and description. The reliability loss may serve to jointly learn a patch representation along with its reliability (i.e., a confidence score for the quality of the representation), which is also an input dependent output of the network. It may be formulated as:
[Image: media_image18.png]
where z represents a patch descriptor, y its label, and σ∈[0,1] its reliability. The score for the patch may be computed in the loss in term of differentiable Average-Precision (AP).) [σ is a confidence score, i.e., a trust score, that is between [0,1], i.e., 0 < t < 1, and comprises a factor of 1 – σ, i.e., 1 – t.]
Kadowaki, Schapire and Weinzaepfel are analogous arts in machine learning involving classification/prediction applications. A person skilled in the art would modify Kadowaki and Schapire with Weinzaepfel with the motivation being “(0072) … provide a loss function to estimate reliability of data sample labels predicted by a neural network (module), where the loss function can be applied to any loss and thus to any task, can scale-up to any number of samples, requires no modification of the learning procedure, and has no need for extra data parameters.”
While Weinzaepfel recites a loss-based penalty function with a factor of 1 – t, Weinzaepfel does not recite (2 – t). However, changing a constant, i.e., 1 to 2, is well known in the art and is obvious to try. MPEP 2143. There are a finite number of ways to change a constant, and changing 1 to 2 yields predictable solutions. For example, 2 – t has a steeper gradient, which leads to faster learning and potentially more aggressive updates to the model parameters during training. See Ruder, Sebastian (15 Jun 2017), “An Overview of Gradient Descent Optimization Algorithms,” arXiv:1609.04747v2.[fn9] See also 20120229826 at 0111.[fn10] 2 – t is more robust and more heavily penalizes predictions with low trust. While both functions are linear, their proportional changes differ because the starting value of the loss factor is higher for 2 – t, so 2 – t is less sensitive to changes in a high confidence score. For example, consider a change in t from 0.9 to 1.0: for 1 – t, the loss goes from 0.1 to 0 (a 100% reduction in loss), but for 2 – t the loss goes from 1.1 to 1.0 (roughly a 9% reduction in loss). Therefore, a person skilled in the art would obviously try 2 – t for the reasons stated above.
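For illustration only, the proportional-change arithmetic in the example above can be checked as follows. The function name is hypothetical; the sketch only compares the factors (1 – t) and (2 – t) for a change in t from 0.9 to 1.0 and makes no claim about any particular training procedure.

```python
def relative_reduction(constant: float, t_before: float, t_after: float) -> float:
    """Fractional drop in the factor (constant - t) when t moves from t_before to t_after."""
    before = constant - t_before
    after = constant - t_after
    if before == 0:
        raise ValueError("the starting factor must be non-zero")
    return (before - after) / before


# Trust score t rises from 0.9 to 1.0.
drop_one_minus_t = relative_reduction(1.0, 0.9, 1.0)  # factor goes from 0.1 to 0
drop_two_minus_t = relative_reduction(2.0, 0.9, 1.0)  # factor goes from 1.1 to 1.0

print(f"1 - t: {drop_one_minus_t:.1%}")
print(f"2 - t: {drop_two_minus_t:.1%}")
```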
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Kadowaki, Schapire and Neven in further view of Abbaszadeh, et al. (US 20200067969 A1) (hereinafter referred to as “Abbaszadeh”).
Regarding claim 12, Kadowaki, Schapire and Neven recite “The method of claim 11,” Schapire further recites “wherein the normalized weight, wi, for each machine learning model of the plurality is calculated, at least in part, according to equation:
[Image: media_image19.png]
, wherein erri is the prediction error estimate calculated for the ith machine learning model of the plurality” (Schapire at pg. 2:
[Image: media_image2.png]
See also pg. 2: …where each hypothesis is assigned weight αt.) [t represents the ith machine learning model of the plurality.]
While Kadowaki and Schapire discuss normalization, neither Kadowaki, Schapire, nor Neven explicitly recites “wherein
[Image: media_image20.png]
, and wherein N is a number of individual machine learning models in the ensemble machine learning model.” On the other hand, Abbaszadeh recites “wherein
[Image: media_image20.png]
, and wherein N is a number of individual machine learning models in the ensemble machine learning model.” (Abbaszadeh at 0108:
[Image: media_image21.png]) [dm[k] are non-normalized values, specifically probabilities from Gaussian clusters. The sum of the value of the denominator is set to equal to 1. The output is ensemble weights.]
Kadowaki, Schapire, Neven and Abbaszadeh are analogous arts in machine learning involving classification/prediction applications. A person would be motivated to modify Kadowaki, Schapire, Neven with Abbaszadeh, before the effective filing date of the present application, to recite wherein
[Image: media_image20.png]
, and wherein N is a number of individual machine learning models in the ensemble machine learning model with the motivation being “(0034) Note that ensemble forecasting has been proven to be very efficient in forecasting complex dynamic phenomena, including wind and other weather conditions and Internet communication traffic. In the context of an industrial control system, some embodiments utilize ensembles to cover plant variations in both operating space and ambient conditions. The ensembles may be selected using a soft cluster method, such as Gaussian Mixture Model (“GMM”) clustering, which may provide both centroid (i.e., pre-perceptive operating points) and probability membership functions. A state space model may be developed for each ensemble of each monitoring node, which is used in an adaptive prediction method (e.g., an adaptive multi-step Kalman predictor) to provide ensemble forecast in a receding horizon fashion. Then, the ensemble forecasts are fused via dynamic averaging. Dynamic model averaging has been shown to be superior to other ensemble methods such as Markov Chain Monte Carlo (“MCMC”)—especially for large data sets."
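For illustration only, a generic sum-to-one normalization over N per-model weights can be sketched as follows. The claimed equation and the Abbaszadeh expression appear only in the cited images, so this sketch is an assumption and is not a reproduction of either; the function name and values are hypothetical.

```python
from typing import List, Sequence


def normalize_weights(raw_weights: Sequence[float]) -> List[float]:
    """Divide each non-negative raw weight by the total so the results sum to 1."""
    if any(w < 0 for w in raw_weights):
        raise ValueError("raw weights must be non-negative")
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("at least one raw weight must be positive")
    return [w / total for w in raw_weights]


# Hypothetical raw weights for N = 3 individual models.
normalized = normalize_weights([1.0, 1.0, 2.0])
print(normalized)
```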
Allowable Subject Matter
Claims 8 and 9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all the limitations of the base claim and any intervening claims provided 101 rejections are overcome.
Regarding claim 8, the closest prior art of record, Abraham, Jim (US 20230368915 A1) (hereinafter referred to as “Abraham”) recites a voting module which aggregates output data produced by each machine learning model based on a confidence score, i.e., in an ensemble learning scenario. The confidence score can be “the value of output data such as subtracting a confidence score from the value.” (Abraham at 0118).
However, the examiner has found that the feature distinguishing the applicant’s claimed invention over the prior art is the explicit claiming of the limitations specified in claim 8. When viewed individually or in combination with other prior art of record, the limitations specified in claim 8 are distinct.
Examiner’s Note
Examiner cites particular columns, paragraphs, figures and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner. The entire reference is considered to provide disclosure relating to the claimed invention.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to David X Yi whose telephone number is (571)270-7519. The examiner can normally be reached M-F 9:00-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DAVID YI/
Supervisory Patent Examiner, Art Unit 2126
1 See pg. 38: “to find a weak hypothesis with low weighted error”.
2 See also Li, et al., (US 20220164711 A1) at 0068: “…the framework applies a machine learning meta-algorithm, such as, for example, Adaptive Boosting (AdaBoost). As understood by those of skill in the art, AdaBoost reduces the speed in training and executing a classifier of an AI system by selecting and training only those features that are known to improve the predictive power of the model, thereby reducing the dimensionality while improving the execution time.”
3 See e.g., Liu, et al. (1 March 2015), “Comparison of four Adaboost algorithm based artificial neural networks in wind speed predictions,” Energy Conversion and Management, Vol. 92 at pg. 74: “The computational steps of the Adaboost algorithm can be explained as follows: [image: media_image5.png]”
4 See e.g., Liu, et al. (1 March 2015), “Comparison of four Adaboost algorithm based artificial neural networks in wind speed predictions,” Energy Conversion and Management, Vol. 92 at pg. 74: “The computational steps of the Adaboost algorithm can be explained as follows: [image: media_image5.png]”
5 See e.g., Liu, et al. (1 March 2015), “Comparison of four Adaboost algorithm based artificial neural networks in wind speed predictions,” Energy Conversion and Management, Vol. 92 at pg. 74: “The computational steps of the Adaboost algorithm can be explained as follows: [image: media_image5.png]”
6 See pg. 38: “to find a weak hypothesis with low weighted error”.
7 See also Li, et al., (US 20220164711 A1) at 0068: “…the framework applies a machine learning meta-algorithm, such as, for example, Adaptive Boosting (AdaBoost). As understood by those of skill in the art, AdaBoost reduces the speed in training and executing a classifier of an AI system by selecting and training only those features that are known to improve the predictive power of the model, thereby reducing the dimensionality while improving the execution time.”
8 See Glover, a reference cited on the applicant’s IDS, at pg. 2: A quantum computer based on quantum annealing with an integrated physical network structure of qubits known as a Chimera graph has incorporated ideas from Wang et al. (2012) in its software and has been implemented on the D-Wave system.
9 See pg. 4. (“Essentially, when using momentum, we push a ball down a hill. The ball accumulates momentum as it rolls downhill, becoming faster and faster on the way (until it reaches its terminal velocity, if there is air resistance, i.e. γ < 1). The same thing happens to our parameter updates: The momentum term increases for dimensions whose gradients point in the same directions and reduces updates for dimensions whose gradients change directions. As a result, we gain faster convergence and reduced oscillation.”)
10 20120229826 at 0111: “…predetermined maximum value of change in steepness, approximately 0.02. This constant may be dependent on the maximum absolute second derivative of the compression function. The constant may be approximately 0 to let the steepness of the compression function hardly change, the constant may be much larger than 0 to let the steepness of the compression function freely change and the constant may be -1 to let this constant be inactive.” I.e., the steepness of a function controls how much values change in each training/backpropagation cycle.
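As an illustration of the momentum behavior quoted in footnote 9, the following is a minimal sketch of gradient descent with a momentum term, assuming the commonly used update in which a velocity accumulates a γ-weighted fraction of its previous value plus the scaled gradient. Only γ appears in the quoted passage; the function name, the learning rate lr, the step count, and the quadratic test function are hypothetical choices made for this example and are not taken from any cited reference.

```python
def momentum_descent(grad, theta, lr=0.1, gamma=0.9, steps=200):
    """Minimize a one-dimensional function given its gradient, using momentum.

    gamma < 1 plays the role of the "air resistance" in the quoted passage:
    it bounds how much velocity can build up from past gradients.
    """
    velocity = 0.0
    for _ in range(steps):
        # Keep a fraction gamma of the previous velocity and add the current
        # scaled gradient; gradients that keep the same sign reinforce each
        # other, while gradients that flip sign partially cancel.
        velocity = gamma * velocity + lr * grad(theta)
        theta = theta - velocity
    return theta


# Hypothetical example: f(theta) = theta**2, whose gradient is 2 * theta.
final_theta = momentum_descent(lambda t: 2.0 * t, theta=5.0)
```

In this example the parameter oscillates around the minimum at zero with shrinking amplitude, which is the accumulate-then-damp behavior the quoted passage describes.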