Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 03/15/2023 and 01/23/2026 were filed before the mailing date of the first Office action. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are:
“a determination unit configured to determine . . .” in claim 15.
“a weight prediction unit configured to calculate . . .” in claim 15.
“an optimization unit configured to select . . .” in claim 15.
and “an ensemble prediction unit configured to calculate . . .” in claim 15.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 15-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Claim 15 recites “a determination unit configured to determine an optimal model combination parameter . . .”, “a weight prediction unit configured to calculate a model weight . . .”, “an optimization unit configured to select at least some model weights . . .”, and “an ensemble prediction unit configured to calculate an ensemble prediction value . . .”. Neither the claims nor the specification provide written support for “a determination unit”, “a weight prediction unit”, “an optimization unit”, or “an ensemble prediction unit” beyond their recitation as components of a generic computer processor on page 26 of Applicant’s specification. MPEP 2181(II)(B) states, “In cases involving a special purpose computer-implemented means-plus-function limitation, the Federal Circuit has consistently required that the structure be more than simply a general purpose computer or microprocessor and that the specification must disclose an algorithm for performing the claimed function.” Therefore, Applicant’s disclosure does not recite sufficient structure for performing the claimed functions. For purposes of examination, Examiner is interpreting that a processor may be used to implement the claimed functions associated with determining model parameters, calculating and selecting model weights, and calculating an ensemble prediction value.
Dependent claims 16-20 are also rejected because they fail to correct the deficiencies of independent claim 15 on which they depend.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 15-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Claim 15 recites “a determination unit configured to determine an optimal model combination parameter . . .”, “a weight prediction unit configured to calculate a model weight . . .”, “an optimization unit configured to select at least some model weights . . .”, and “an ensemble prediction unit configured to calculate an ensemble prediction value . . .”. Neither the claims nor the specification clearly describe “a determination unit”, “a weight prediction unit”, “an optimization unit”, or “an ensemble prediction unit” beyond their recitation as components of a generic computer processor on page 26 of Applicant’s specification. MPEP 2181(II)(B) states, “In cases involving a special purpose computer-implemented means-plus-function limitation, the Federal Circuit has consistently required that the structure be more than simply a general purpose computer or microprocessor and that the specification must disclose an algorithm for performing the claimed function.” Therefore, Applicant’s disclosure does not recite sufficient structure such that the metes and bounds of the hardware required to perform these functions are clear. For purposes of examination, Examiner is interpreting that a processor may be used to implement the claimed functions associated with determining model parameters, calculating and selecting model weights, and calculating an ensemble prediction value.
Dependent claims 16-20 are also rejected because they fail to correct the deficiencies of independent claim 15 on which they depend.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101. Claims 1-6 are directed to a method, claims 7-14 are directed to a separate method, and claims 15-20 are directed to an apparatus; therefore, claims 1-20 fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter). However, claims 1-20 fall within the judicial exception of an abstract idea, specifically the abstract ideas of “Mental Processes” (including observation, evaluation, and opinion) and “Mathematical Concepts” (including mathematical calculations and relationships).
Claim 1:
Claim 1 is directed to a method; therefore, the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Claim 1 recites the following abstract ideas:
calculating a model weight of each of the prediction models using a pre-trained ensemble model that uses the prediction value as an input (mental step directed to observation, evaluation – a person could evaluate, or calculate a model weight of each prediction model in an ensemble in their mind using observed prediction values);
selecting at least some model weights from the model weights using a predetermined optimal model combination parameter (mental step directed to observation, evaluation – a person could select at least some observed or mentally determined model weights in their mind using an observed or mentally predetermined optimal model combination parameter);
and calculating an ensemble prediction value for the input data based on the selected model weight and a prediction value of a prediction model corresponding to the selected model weight (mental step directed to observation, evaluation – a person could evaluate, or calculate an ensemble prediction value in their mind based on an observed or determined selected model weight and prediction value).
Claim 1 recites the following additional elements:
collecting prediction values for input data of each prediction model. This additional element is interpreted as an aspect of the technological environment in which the abstract ideas are performed and as well-understood, routine, conventional activity directed to receiving data over network, which does not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea (see MPEP 2106.05(d)(II) and MPEP 2106.05(h)).
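For illustration of the analysis above, the sequence recited in claim 1 (collecting prediction values, calculating model weights, selecting some weights via a combination parameter, and computing a weighted ensemble prediction) can be sketched numerically. This is a minimal sketch under assumed readings of the claim terms; the values, the top-k selection, and the renormalization are hypothetical and are not Applicant's disclosed algorithm:

```python
# Hypothetical sketch of the claim 1 steps; not Applicant's disclosed algorithm.

def ensemble_predict(predictions, raw_weights, k):
    """predictions: per-model prediction values; raw_weights: per-model
    weights from a (hypothetical) ensemble model; k: a stand-in for the
    'optimal model combination parameter' (here, the number of models kept)."""
    # Keep the k highest model weights (one possible reading of
    # "selecting at least some model weights").
    ranked = sorted(range(len(raw_weights)),
                    key=lambda i: raw_weights[i], reverse=True)
    chosen = ranked[:k]
    # Normalize the selected weights so they sum to 1.
    total = sum(raw_weights[i] for i in chosen)
    norm = {i: raw_weights[i] / total for i in chosen}
    # Weighted sum of the corresponding prediction values.
    return sum(norm[i] * predictions[i] for i in chosen)

preds = [10.0, 12.0, 11.0, 30.0]   # hypothetical per-model predictions
weights = [0.4, 0.3, 0.2, 0.1]     # hypothetical per-model weights
print(ensemble_predict(preds, weights, k=2))  # keeps the two top-weighted models
```

Each step in this sketch can, in principle, be performed as a mental calculation on observed values, which is consistent with the mental-process characterization above.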
Claim 7:
Claim 7 is directed to a method; therefore, the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Claim 7 recites the following abstract ideas:
determining an optimal model combination parameter that produces a highest accuracy using a prediction value of verification data of each prediction model and a pre-trained ensemble model (mental step directed to observation, evaluation – a person could evaluate, or determine an optimal model combination parameter that would produce a highest accuracy in their mind based on observed prediction values from verification data and data related to a pre-trained ensemble model);
calculating a model weight of each of the prediction models using prediction values for input data of each of the prediction models and the ensemble model (mental step directed to observation, evaluation – a person could evaluate, or calculate a model weight of each prediction model in an ensemble in their mind using observed prediction values);
selecting at least some model weights from the model weights using the predetermined optimal model combination parameter (mental step directed to observation, evaluation – a person could select at least some observed or mentally determined model weights in their mind using an observed or mentally predetermined optimal model combination parameter);
and calculating an ensemble prediction value of the input data based on the selected model weight and a prediction value of the input data corresponding to the selected model weight (mental step directed to observation, evaluation – a person could evaluate, or calculate an ensemble prediction value in their mind based on an observed or determined selected model weight and prediction value).
Claim 7 does not recite any additional elements that would integrate the abstract idea into a practical application or amount to significantly more than the abstract idea.
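The additional step of claim 7 (determining the combination parameter that produces the highest accuracy on verification data) amounts to evaluating each candidate parameter and keeping the one with the minimum prediction error, as claim 8 later makes explicit. A minimal sketch follows; the candidate set, the squared-error metric, and the top-k weighting scheme are assumptions for illustration, not Applicant's disclosure:

```python
# Hypothetical sketch of selecting an "optimal model combination parameter"
# on verification data, per claims 7-8; details are illustrative assumptions.

def weighted_prediction(preds, weights, k):
    """Keep the k largest weights, renormalize them, and form a weighted sum."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    total = sum(weights[i] for i in ranked)
    return sum(weights[i] / total * preds[i] for i in ranked)

def choose_parameter(candidates, val_preds, val_weights, val_target):
    """Return the candidate k with the minimum prediction error on the
    verification data (squared error is an assumed metric)."""
    def error(k):
        return (weighted_prediction(val_preds, val_weights, k) - val_target) ** 2
    return min(candidates, key=error)

val_preds = [3.0, 5.0, 9.0]     # hypothetical verification-data predictions
val_weights = [0.5, 0.3, 0.2]   # hypothetical model weights
best_k = choose_parameter([1, 2, 3], val_preds, val_weights, val_target=4.0)
print(best_k)
```

Each candidate evaluation is a small arithmetic comparison, again consistent with the mental-process and mathematical-concept characterizations applied above.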
Claim 15 is an apparatus claim whose limitations are included in claim 7. The only difference is that claim 15 requires an apparatus composed of a determination unit, a weight prediction unit, an optimization unit, and an ensemble prediction unit. Given the interpretation set out in the 112 rejections of claim 15, these units are interpreted as components of a generic computer processor, and are further interpreted as merely applying the claimed abstract ideas as described in the rejection of claim 7 (see MPEP 2106.05(f)). Therefore, claim 15 is rejected for the same reasons as claim 7.
The independent claims are not patent eligible.
Dependent claims 2-6, 8-14, and 16-20 when analyzed as a whole are held to be patent ineligible under 35 U.S.C. 101 because the additional recited limitations fail to establish that the claims are not directed to an abstract idea, as they recite further embellishment of the judicial exception.
Claim 2 recites wherein, in the selecting of the at least some model weights, the number of prediction models is determined using the optimal model combination parameter and the model weight of each of the prediction models, and a model weight having a high value corresponding to the determined number of prediction models is selected (mental step directed to observation, evaluation – a person could determine a number of prediction models in their mind using an observed or determined optimal model combination parameter and observed or determined model weights and select a model weight having a high value in their mind based on the determined number of prediction models).
Claim 3 recites calculating an optimal model weight through a normalization process for the selected model weight, wherein, in the calculating of the ensemble prediction value for the input data, the ensemble prediction value for the input data is calculated based on the optimal model weight and the prediction value of the prediction model corresponding to the selected model weight (mental step directed to observation, evaluation – a person could calculate an optimal weight using a normalization process in their mind, and calculate an ensemble prediction value in their mind based on an observed or determined optimal model weight and prediction value corresponding to a selected model weight).
Claim 4 recites wherein, in the calculating of the optimal model weight, a normalization threshold is calculated using the optimal model combination parameter and the selected model weight, and the optimal model weight is calculated based on the normalization threshold (mental step directed to observation, evaluation – a person could calculate a normalization threshold in their mind based on an optimal model combination parameter and a selected model weight, and calculate an optimal model weight in their mind using the calculated normalization threshold).
Claim 5 recites wherein, in the calculating of the optimal model weight, the optimal model weight is calculated based on Sparse-max to which the optimal model combination parameter is applied (the broadest reasonable interpretation of calculating an optimal model weight using sparse-max includes a mathematical calculation in light of at least page 19 of Applicant’s specification).
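For reference, the standard sparsemax projection (Martins & Astudillo, 2016) computes exactly such a normalization threshold (τ) from the input weights and zeroes out weights below it, consistent with the threshold-based normalization of claims 4 and 13. The sketch below is the published algorithm, not Applicant's disclosure; how the optimal model combination parameter enters (e.g., as a temperature on the inputs) is an open assumption and is not reproduced here:

```python
# Standard sparsemax projection onto the probability simplex
# (Martins & Astudillo, 2016); not Applicant's disclosed variant.

def sparsemax(z):
    """Map raw scores z to weights that sum to 1, with low scores
    clipped to exactly zero by a computed normalization threshold tau."""
    zs = sorted(z, reverse=True)
    cumulative = 0.0
    k, cum_k = 0, 0.0
    for j, zj in enumerate(zs, start=1):
        cumulative += zj
        if 1 + j * zj > cumulative:   # support condition: zj stays above threshold
            k, cum_k = j, cumulative
    tau = (cum_k - 1.0) / k           # the normalization threshold
    return [max(zi - tau, 0.0) for zi in z]

print(sparsemax([1.0, 0.5, -1.0]))   # lowest-scoring model receives weight 0
```

Because the output is a mathematical function of the inputs, the characterization of claim 5 as a mathematical calculation follows directly.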
Claim 6 recites wherein, in the calculating of the ensemble prediction value of the input data, the ensemble prediction value of the input data is calculated by weighted summing the optimal model weight with the prediction value of the corresponding prediction model (mental step directed to observation, evaluation – a person could calculate a weighted sum of an optimal model weight with prediction values of corresponding models in their mind to determine an ensemble prediction value).
Claim 8 recites wherein the determining of the optimal model combination parameter includes: calculating a model weight of each of the prediction models for the verification data using the ensemble model; calculating an optimal model weight for the model weight of the verification data with respect to each candidate model combination parameter; calculating an ensemble prediction value using an optimal model weight of the verification data with respect to each of the candidate model combination parameters; and determining, as the optimal model combination parameter, a candidate model combination parameter having a minimum prediction error for an ensemble prediction value of the verification data among the candidate model combination parameters (mental step directed to observation, evaluation – a person could calculate a model weight for a given prediction model in their mind using observed verification data, calculate an optimal model weight with respect to observed or determined candidate model combination parameters in their mind, calculate an ensemble prediction value in their mind using observed or determined optimal model weights, and determine a candidate model combination parameter with a minimum prediction error in their mind as an optimal model combination parameter).
Claim 9 recites wherein the calculating of the optimal model weight for the model weight of the verification data includes: determining the number of prediction models for each of the candidate model combination parameters using each of the candidate model combination parameters and a model weight of the verification data; selecting a model weight of the verification data having a high value corresponding to the determined number of prediction models with respect to each of the candidate model combination parameters; and calculating an optimal model weight of each of the candidate model combination parameters through a normalization process with respect to the model weight of the selected verification data (mental step directed to observation, evaluation – a person could determine a number of prediction models in their mind using an observed or determined optimal model combination parameter for verification data and observed or determined model weights, select a model weight for verification data having a high value in their mind based on the determined number of prediction models, and calculate an optimal model weight through a normalization process in their mind based on observed or determined model weights for verification data).
Claim 10 recites wherein, in the calculating of the optimal model weights, normalization thresholds are calculated using each of the candidate model combination parameters and a model weight of the selected verification data, and optimal model weights of the candidate model combination parameters are calculated based on the normalization thresholds of each of the candidate model combination parameters (mental step directed to observation, evaluation – a person could calculate a normalization threshold in their mind based on observed or determined candidate model combination parameters and model weights of selected verification data, and calculate an optimal model weight in their mind using the calculated normalization threshold of the candidate model combination parameters).
Claim 11 recites wherein, in the selecting of the at least some model weights, the number of prediction models is determined using the optimal model combination parameter and the model weight of each of the prediction models, and a model weight having a high value corresponding to the determined number of prediction models is selected (mental step directed to observation, evaluation – a person could determine a number of prediction models in their mind using an observed or determined optimal model combination parameter and observed or determined model weights and select a model weight having a high value in their mind based on the determined number of prediction models).
Claim 12 recites calculating an optimal model weight through a normalization process for the selected model weight, wherein, in the calculating of the ensemble prediction value of the input data, the ensemble prediction value for the input data is calculated based on the optimal model weight and the prediction value of the input data corresponding to the selected model weight (mental step directed to observation, evaluation – a person could calculate an optimal weight using a normalization process in their mind, and calculate an ensemble prediction value in their mind based on an observed or determined optimal model weight and prediction value corresponding to a selected model weight).
Claim 13 recites wherein, in the calculating of the optimal model weight, a normalization threshold is calculated using the optimal model combination parameter and the selected model weight, and the optimal model weight is calculated based on the normalization threshold (mental step directed to observation, evaluation – a person could calculate a normalization threshold in their mind based on an optimal model combination parameter and a selected model weight, and calculate an optimal model weight in their mind using the calculated normalization threshold).
Claim 14 recites wherein, in the calculating of the ensemble prediction value of the input data, the ensemble prediction value of the input data is calculated by weighted summing the optimal model weight with the prediction value of the input data (mental step directed to observation, evaluation – a person could calculate a weighted sum of an optimal model weight with prediction values of corresponding models in their mind to determine an ensemble prediction value).
Claim 16 is an apparatus claim whose limitations are included in claim 8. Claim 16 is rejected for the same reasons as claim 8.
Claim 17 is an apparatus claim whose limitations are included in claim 11. Claim 17 is rejected for the same reasons as claim 11.
Claim 18 is an apparatus claim whose limitations are included in claim 12. Claim 18 is rejected for the same reasons as claim 12.
Claim 19 is an apparatus claim whose limitations are included in claim 13. Claim 19 is rejected for the same reasons as claim 13.
Claim 20 is an apparatus claim whose limitations are included in claim 14. Claim 20 is rejected for the same reasons as claim 14.
Viewed as a whole, these additional claim elements do not provide meaningful limitations to transform the abstract idea into a patent eligible application of the abstract idea such that the claims amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Chueh et al. (US 20200151610 A1, hereinafter Chueh) in view of Creed et al. (US 20210117815 A1, hereinafter Creed).
Regarding claim 1, Chueh teaches a method of ensemble prediction (para. [0014] recites “FIG. 2 shows a flow chart of an ensemble learning predicting method according to one embodiment of the application, which is applicable in an electronic device having a processing circuit”), comprising:
collecting prediction values for input data of each prediction model, calculating a model weight of each of the prediction models using a [pre-trained] ensemble model that uses the prediction value as an input (para. [0014] recites “As shown in FIG. 2, in step 210 (the training phase), the training data Dtrain is used in establishing base predictors h1, h2, . . . hN (N being a positive integer)”. Para. [0016] recites “In step 230 (the training phase), in the first iteration round, the respective predictor weighting functions of the predictors in the processing set H is established based on all sample data and each sample weighting of all sample data in the validation data Dvalid” (i.e., collecting prediction values and calculating model weights for each prediction model in the ensemble));
selecting at least some model weights from the model weights using a predetermined optimal model combination parameter (para. [0021] recites “In the step 240 (the training phase), the established predictor weighting functions are evaluated. Based on the evaluation result, a target predictor weighting function is selected among the predictor weighting functions established in the current iteration round, a target predictor is selected among the processing set H and the processing set H is updated” (i.e., selecting model weights using an optimization target, or parameter));
and calculating an ensemble prediction value for the input data based on the selected model weight and a prediction value of a prediction model corresponding to the selected model weight (para. [0024]-[0025] recite “The ensemble predictor is obtained (step 270). Based on the established base predictors and the respective predictor weighting functions, the ensemble predictor is obtained. In the test phase, the test data x is input into the ensemble predictor for outputting the prediction result y (step 280)” (i.e., calculating an ensemble prediction value based on the selected model weights and prediction values)).
However, while Chueh teaches an iterative training process in at least paragraph [0022], Chueh does not explicitly teach a pre-trained model.
Creed teaches a pre-trained model (para. [0201] recites “The functions W and R may be implemented as neural network structures that are configured to be either a pre-trained, fixed, embedding functions (e.g. via word2vec) or a trainable embedding matrices. The former may be used with already trained classifier/model and the example attention mechanism, whereas the latter may be trained when using the example attention mechanism during training of a ML technique that generates a model/classifier for relationship extraction and the like” (i.e., a pre-trained classifier, or predictor model)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by utilizing the method of pre-training models from Creed to pre-train the base predictors in the ensemble model from Chueh. Chueh teaches an iterative training process in at least paragraph [0022] but does not explicitly include the ability to pre-train models. As Chueh teaches in at least paragraph [0014] that “Each of the base predictors may come from different algorithms, or different hyperparameter or different samples. The application is not limited by this”, one of ordinary skill in the art would recognize that the pre-trained models from Creed may be included as the base predictors from Chueh.
Regarding claim 2, the combination of Chueh and Creed teaches the method of claim 1, wherein, in the selecting of the at least some model weights, the number of prediction models is determined using the optimal model combination parameter and the model weight of each of the prediction models (Chueh para. [0021] recites “In the step 240 (the training phase), the established predictor weighting functions are evaluated. Based on the evaluation result, a target predictor weighting function is selected among the predictor weighting functions established in the current iteration round, a target predictor is selected among the processing set H and the processing set H is updated (for example, the selected target predictor is removed from the processing set H)” (i.e., determining the number of models used in the ensemble based on the combination and model weights)), and a model weight having a high value corresponding to the determined number of prediction models is selected (Chueh para. [0023] recites “Among all predictor weighting functions, the predictor weighting function having highest confidence score is selected” (i.e., selecting a model weight with a high value from the number of selected models in the ensemble)).
Regarding claim 3, the combination of Chueh and Creed teaches the method of claim 1, further comprising calculating an optimal model weight through a normalization process for the selected model weight (Chueh para. [0047]-[0048] recite “The g(i)(x) (i.e., weighting functions for models in the ensemble) are normalized and averaged. If the normalized-average result is smaller than 0, then -1 is as the ensemble prediction result of the test data xtest (y = -1). That is, in the implementation 2, the function output values of the predictor weighting functions are normalized as the weighting of each predictor” (i.e., normalizing weights for models in the ensemble)), wherein, in the calculating of the ensemble prediction value for the input data, the ensemble prediction value for the input data is calculated based on the optimal model weight and the prediction value of the prediction model corresponding to the selected model weight (Chueh para. [0024]-[0025] recite “The ensemble predictor is obtained (step 270). Based on the established base predictors and the respective predictor weighting functions, the ensemble predictor is obtained. In the test phase, the test data x is input into the ensemble predictor for outputting the prediction result y (step 280)” (i.e., calculating the ensemble prediction value based on the model weights and prediction values from the models in the ensemble)).
Regarding claim 4, the combination of Chueh and Creed teaches the method of claim 3, wherein, in the calculating of the optimal model weight, a normalization threshold is calculated using the optimal model combination parameter and the selected model weight, and the optimal model weight is calculated based on the normalization threshold (Chueh para. [0044] recites “The threshold is set as 0.5. Based on a sequence of g(1), g(2), g(3), the first weighting function whose output value is higher than the threshold is selected”. Chueh para. [0045] recites “The weighting function which is the first one to have an output value higher than the threshold is assigned by a highest weighting and the other weighting functions are assigned by zero weighting”. Chueh para. [0047]-[0048] recite “The g(i)(x) (i.e., weighting functions for models in the ensemble) are normalized and averaged. If the normalized-average result is smaller than 0, then -1 is as the ensemble prediction result of the test data xtest (y = -1). That is, in the implementation 2, the function output values of the predictor weighting functions are normalized as the weighting of each predictor” (i.e., determining weights for models in the ensemble based on a normalization threshold)).
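One hypothetical way to realize the thresholded normalization discussed above (zeroing weights below a normalization threshold, then rescaling the survivors) is sketched below. The function name, the choice to renormalize to a unit sum, and the sample values are assumptions for illustration only, not taken from either reference.

```python
def normalize_with_threshold(selected_weights, tau):
    """Zero out weights below a normalization threshold tau, then rescale
    the surviving weights so they sum to 1."""
    kept = [w if w >= tau else 0.0 for w in selected_weights]
    total = sum(kept)
    if total == 0.0:  # no weight clears the threshold
        return kept
    return [w / total for w in kept]

print(normalize_with_threshold([0.5, 0.3, 0.1], tau=0.2))  # roughly [0.625, 0.375, 0.0]
```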
Regarding claim 5, the combination of Chueh and Creed teaches the method of claim 3, wherein, in the calculating of the optimal model weight, the optimal model weight is calculated based on Sparse-max to which the optimal model combination parameter is applied (Chueh para. [0028] recites “the sample which is not correctly predicted by the (selected) predictor h(t) will have a higher sample weighting in the next iteration round; and on the contrary, the sample which is correctly predicted by the (selected) predictor h(t) will have a lower sample weighting in the next iteration round”. Creed para. [0116] recites “an attention function may be based on a MAX attention function, which calculates an attention relevancy weight in relation to the maximum score of the set of scores, and assigns the remaining attention relevancy weights either a 0 or minimal value weight. Further examples may include, by way of example only but not limited to, a sparse-max attention function or any suitable attention function for calculating attention weights based on at least the set of scores associated with the set of data” (i.e., determining the model weights for models in the ensemble can be performed using a sparse-max function)).
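For context on the sparse-max function referenced by Creed, a minimal sketch of the standard sparse-max projection (Martins and Astudillo, 2016) follows: like softmax it maps scores to the probability simplex, but it can assign exact zero weight to low-scoring models. This sketch is illustrative and is not an implementation from either reference.

```python
def sparsemax(z):
    """Project scores z onto the probability simplex, permitting exact zeros.

    Finds the support size k and threshold tau from the sorted scores,
    then returns max(z_i - tau, 0) for each score.
    """
    z_sorted = sorted(z, reverse=True)
    cumsum, k, k_sum = 0.0, 0, 0.0
    for j, zj in enumerate(z_sorted, start=1):
        cumsum += zj
        if 1.0 + j * zj > cumsum:  # z_j is still in the support
            k, k_sum = j, cumsum
    tau = (k_sum - 1.0) / k
    return [max(zi - tau, 0.0) for zi in z]

# A dominant score absorbs all the weight; close scores share it
print(sparsemax([2.0, 1.0, 0.1]))   # [1.0, 0.0, 0.0]
print(sparsemax([0.5, 0.4, 0.1]))   # already on the simplex: unchanged
```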
Regarding claim 6, the combination of Chueh and Creed teaches the method of claim 3, wherein, in the calculating of the ensemble prediction value of the input data, the ensemble prediction value of the input data is calculated by weighted summing the optimal model weight with the prediction value of the corresponding prediction model (Chueh para. [0005] recites “As shown in FIG. 1, the sample data x is input into each of the base predictors h1(x), h2(x) and h3(x). Weighting of the base predictors h1(x), h2(x) and h3(x) are w1, w2 and w3, respectively. If the weighting is dynamic (i.e. the weighting is expressed as: wt= gt(x), and wt varies according to the sample data x), then the prediction result y is expressed as: y = g1(x)h1(x) + g2(x)h2(x) + g3(x)h3(x)”. Chueh para. [0025] recites “Based on the established base predictors and the respective predictor weighting functions, the ensemble predictor is obtained. In the test phase, the test data x is input into the ensemble predictor for outputting the prediction result y (step 280)” (i.e., calculating an ensemble prediction value using weighted summing of the model weights and prediction values of the models in the ensemble)).
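The dynamic weighted sum quoted from Chueh's paragraph [0005] (y = g1(x)h1(x) + g2(x)h2(x) + g3(x)h3(x)) can be expressed directly in code. The sketch below is illustrative only; the toy predictors and weighting functions are hypothetical.

```python
def ensemble_predict(x, predictors, weight_fns):
    """Weighted-sum ensemble prediction: y = sum_i g_i(x) * h_i(x)."""
    return sum(g(x) * h(x) for g, h in zip(weight_fns, predictors))

# Toy example with constant weighting functions and simple base predictors
predictors = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x]
weight_fns = [lambda x: 0.5, lambda x: 0.3, lambda x: 0.2]
print(ensemble_predict(2.0, predictors, weight_fns))  # 0.5*3 + 0.3*4 + 0.2*(-2), approx. 2.3
```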
Regarding claim 7, Chueh teaches a method of ensemble prediction (para. [0014] recites “FIG. 2 shows a flow chart of an ensemble learning predicting method according to one embodiment of the application, which is applicable in an electronic device having a processing circuit”), comprising: determining an optimal model combination parameter that produces a highest accuracy using a prediction value of verification data of each prediction model and a [pre-trained] ensemble model (Chueh para. [0024] recites “In step 250 (the training phase), each sample weight of each sample of the validation data Dvalid is updated”. Chueh para. [0025] recites “Based on the established base predictors and the respective predictor weighting functions, the ensemble predictor is obtained. In the test phase, the test data x is input into the ensemble predictor for outputting the prediction result y (step 280)”. Chueh para. [0030] recites “when the predicting results of the selected predictor are correct and the selected predictor is assigned by a high predictor weighting function, it is defined as the consistence is high” (i.e., determining the optimal model parameters for the ensemble model using the validation data from the models in the ensemble));
calculating a model weight of each of the prediction models using prediction values for input data of each of the prediction models and the ensemble model (para. [0014] recites “As shown in FIG. 2, in step 210 (the training phase), the training data Dtrain is used in establishing base predictors h1 , h2 , ... hN (N being a positive integer)”. Para. [0016] recites “In step 230 (the training phase), in the first iteration round, the respective predictor weighting functions of the predictors in the processing set H is established based on all sample data and each sample weighting of all sample data in the validation data Dvalid” (i.e., calculating model weights using prediction values from each prediction model in the ensemble));
selecting at least some model weights from the model weights using the predetermined optimal model combination parameter (para. [0021] recites “In the step 240 (the training phase), the established predictor weighting functions are evaluated. Based on the evaluation result, a target predictor weighting function is selected among the predictor weighting functions established in the current iteration round, a target predictor is selected among the processing set H and the processing set H is updated” (i.e., selecting model weights using an optimization target, or parameter));
and calculating an ensemble prediction value of the input data based on the selected model weight and a prediction value of the input data corresponding to the selected model weight (para. [0024]-[0025] recite “The ensemble predictor is obtained (step 270). Based on the established base predictors and the respective predictor weighting functions, the ensemble predictor is obtained. In the test phase, the test data x is input into the ensemble predictor for outputting the prediction result y (step 280)” (i.e., calculating an ensemble prediction value based on the selected model weights and prediction values)).

However, while Chueh teaches an iterative training process in at least paragraph [0022], Chueh does not explicitly teach a pre-trained model.
Creed teaches a pre-trained model (para. [0201] recites “The functions W and R may be implemented as neural network structures that are configured to be either a pre-trained, fixed, embedding functions ( e.g. via word2vec) or a trainable embedding matrices. The former may be used with already trained classifier/model and the example attention mechanism, whereas the latter may be trained when using the example attention mechanism during training of a ML technique that generates a model/classifier for relationship extraction and the like” (i.e., a pre-trained classifier, or predictor model)).
See the rejection of claim 1 for motivation to combine.
Regarding claim 8, the combination of Chueh and Creed teaches the method of claim 7, wherein the determining of the optimal model combination parameter includes: calculating a model weight of each of the prediction models for the verification data using the ensemble model (Chueh para. [0014] recites “As shown in FIG. 2, in step 210 (the training phase), the training data Dtrain is used in establishing base predictors h1 , h2 , ... hN (N being a positive integer)”. Chueh para. [0016] recites “In step 230 (the training phase), in the first iteration round, the respective predictor weighting functions of the predictors in the processing set H is established based on all sample data and each sample weighting of all sample data in the validation data Dvalid”. Chueh para. [0024] recites “In step 250 (the training phase), each sample weight of each sample of the validation data Dvalid is updated” (i.e., calculating model weights using validation, or verification prediction values from each prediction model in the ensemble));
calculating an optimal model weight for the model weight of the verification data with respect to each candidate model combination parameter (Chueh para. [0024] recites “In step 250 (the training phase), each sample weight of each sample of the validation data Dvalid is updated”. Chueh para. [0028] recites “the sample which is not correctly predicted by the (selected) predictor h(t) will have a higher sample weighting in the next iteration round; and on the contrary, the sample which is correctly predicted by the (selected) predictor h(t) will have a lower sample weighting in the next iteration round” (i.e., calculating optimal weights for the validation, or verification data for each model in the ensemble));
calculating an ensemble prediction value using an optimal model weight of the verification data with respect to each of the candidate model combination parameters (Chueh para. [0024]-[0025] recite “In step 250 (the training phase), each sample weight of each sample of the validation data Dvalid is updated. The ensemble predictor is obtained (step 270). Based on the established base predictors and the respective predictor weighting functions, the ensemble predictor is obtained. In the test phase, the test data x is input into the ensemble predictor for outputting the prediction result y (step 280)” (i.e., calculating an ensemble prediction value based on the selected model weights and prediction values));
and determining, as the optimal model combination parameter, a candidate model combination parameter having a minimum prediction error for an ensemble prediction value of the verification data among the candidate model combination parameters (Chueh para. [0024]-[0025] recite “In step 250 (the training phase), each sample weight of each sample of the validation data Dvalid is updated. The ensemble predictor is obtained (step 270). Based on the established base predictors and the respective predictor weighting functions, the ensemble predictor is obtained. In the test phase, the test data x is input into the ensemble predictor for outputting the prediction result y (step 280)”. Chueh para. [0030] recites “updating (adjusting) the sample weighting is based on "whether the prediction results are correct or not" and "whether the prediction results of the predictor and the output value of the predictor weighting function are consistent or not"” (i.e., determining the ensemble prediction value such that the models in the ensemble have highly correct values, or minimum prediction errors)).
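The determining step mapped above, selecting the candidate model-combination parameter with the minimum prediction error on the verification data, amounts to a minimization over candidates. A hypothetical sketch (the error values and names are invented for illustration and do not come from either reference):

```python
def pick_best_combination_param(candidates, validation_error):
    """Return the candidate model-combination parameter whose ensemble
    prediction error on the verification data is smallest."""
    return min(candidates, key=validation_error)

# Hypothetical validation-error curve over three candidate parameters
errors = {1: 0.30, 2: 0.22, 3: 0.25}
best = pick_best_combination_param(errors.keys(), lambda k: errors[k])
print(best)  # 2
```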
Regarding claim 9, the combination of Chueh and Creed teaches the method of claim 8, wherein the calculating of the optimal model weight for the model weight of the verification data includes: determining the number of prediction models for each of the candidate model combination parameters using each of the candidate model combination parameters and a model weight of the verification data (Chueh para. [0021] recites “In the step 240 (the training phase), the established predictor weighting functions are evaluated. Based on the evaluation result, a target predictor weighting function is selected among the predictor weighting functions established in the current iteration round, a target predictor is selected among the processing set H and the processing set H is updated (for example, the selected target predictor is removed from the processing set H)”. Chueh para. [0024] recites “In step 250 (the training phase), each sample weight of each sample of the validation data Dvalid is updated” (i.e., determining the number of models used in the ensemble based on the combination and model weights));
selecting a model weight of the verification data having a high value corresponding to the determined number of prediction models with respect to each of the candidate model combination parameters (Chueh para. [0023] recites “Among all predictor weighting functions, the predictor weighting function having highest confidence score is selected”. Chueh para. [0024] recites “In step 250 (the training phase), each sample weight of each sample of the validation data Dvalid is updated” (i.e., selecting a model weight with a high value from the validation, or verification data of each model from selected models in the ensemble));
and calculating an optimal model weight of each of the candidate model combination parameters through a normalization process with respect to the model weight of the selected verification data (Chueh para. [0024] recites “In step 250 (the training phase), each sample weight of each sample of the validation data Dvalid is updated”. Chueh para. [0047]-[0048] recite “The g(i)(x) (i.e., weighting functions for models in the ensemble) are normalized and averaged. If the normalized-average result is smaller than 0, then -1 is as the ensemble prediction result of the test data xtest (y = -1). That is, in the implementation 2, the function output values of the predictor weighting functions are normalized as the weighting of each predictor” (i.e., normalizing weights for validation, or verification data of the models in the ensemble)).
Regarding claim 10, the combination of Chueh and Creed teaches the method of claim 9, wherein, in the calculating of the optimal model weights, normalization thresholds are calculated using each of the candidate model combination parameters and a model weight of the selected verification data, and optimal model weights of the candidate model combination parameters are calculated based on the normalization thresholds of each of the candidate model combination parameters (Chueh para. [0024] recites “In step 250 (the training phase), each sample weight of each sample of the validation data Dvalid is updated”. Chueh para. [0044] recites “The threshold is set as 0.5. Based on a sequence of g(1), g(2), g(3), the first weighting function whose output value is higher than the threshold is selected”. Chueh para. [0045] recites “The weighting function which is the first one to have an output value higher than the threshold is assigned by a highest weighting and the other weighting functions are assigned by zero weighting”. Chueh para. [0047]-[0048] recite “The g(i)(x) (i.e., weighting functions for models in the ensemble) are normalized and averaged. If the normalized-average result is smaller than 0, then -1 is as the ensemble prediction result of the test data xtest (y = -1). That is, in the implementation 2, the function output values of the predictor weighting functions are normalized as the weighting of each predictor” (i.e., determining weights for validation, or verification data of the models in the ensemble based on a normalization threshold)).
Regarding claim 11, the combination of Chueh and Creed teaches the method of claim 7, wherein, in the selecting of the at least some model weights, the number of prediction models is determined using the optimal model combination parameter and the model weight of each of the prediction models, and a model weight having a high value corresponding to the determined number of prediction models is selected (Chueh para. [0021] recites “In the step 240 (the training phase), the established predictor weighting functions are evaluated. Based on the evaluation result, a target predictor weighting function is selected among the predictor weighting functions established in the current iteration round, a target predictor is selected among the processing set H and the processing set H is updated (for example, the selected target predictor is removed from the processing set H)” (i.e., determining the number of models used in the ensemble based on the combination and model weights). Chueh para. [0023] recites “Among all predictor weighting functions, the predictor weighting function having highest confidence score is selected” (i.e., selecting a model weight with a high value from the number of selected models in the ensemble)).
Regarding claim 12, the combination of Chueh and Creed teaches the method of claim 7, further comprising calculating an optimal model weight through a normalization process for the selected model weight, wherein, in the calculating of the ensemble prediction value of the input data, the ensemble prediction value for the input data is calculated based on the optimal model weight and the prediction value of the input data corresponding to the selected model weight (Chueh para. [0047]-[0048] recite “The g(i)(x) (i.e., weighting functions for models in the ensemble) are normalized and averaged. If the normalized-average result is smaller than 0, then -1 is as the ensemble prediction result of the test data xtest (y = -1). That is, in the implementation 2, the function output values of the predictor weighting functions are normalized as the weighting of each predictor” (i.e., normalizing weights for models in the ensemble)).
Regarding claim 13, the combination of Chueh and Creed teaches the method of claim 12, wherein, in the calculating of the optimal model weight, a normalization threshold is calculated using the optimal model combination parameter and the selected model weight, and the optimal model weight is calculated based on the normalization threshold (Chueh para. [0045] recites “The weighting function which is the first one to have an output value higher than the threshold is assigned by a highest weighting and the other weighting functions are assigned by zero weighting”. Chueh para. [0047]-[0048] recite “The g(i)(x) (i.e., weighting functions for models in the ensemble) are normalized and averaged. If the normalized-average result is smaller than 0, then -1 is as the ensemble prediction result of the test data xtest (y = -1). That is, in the implementation 2, the function output values of the predictor weighting functions are normalized as the weighting of each predictor” (i.e., determining weights for the models in the ensemble based on a normalization threshold)).
Regarding claim 14, the combination of Chueh and Creed teaches the method of claim 12, wherein, in the calculating of the ensemble prediction value of the input data, the ensemble prediction value of the input data is calculated by weighted summing the optimal model weight with the prediction value of the input data (Chueh para. [0005] recites “As shown in FIG. 1, the sample data x is input into each of the base predictors h1(x), h2(x) and h3(x). Weighting of the base predictors h1(x), h2(x) and h3(x) are w1, w2 and w3, respectively. If the weighting is dynamic (i.e. the weighting is expressed as: wt= gt(x), and wt varies according to the sample data x), then the prediction result y is expressed as: y = g1(x)h1(x) + g2(x)h2(x) + g3(x)h3(x)”. Chueh para. [0025] recites “Based on the established base predictors and the respective predictor weighting functions, the ensemble predictor is obtained. In the test phase, the test data x is input into the ensemble predictor for outputting the prediction result y (step 280)” (i.e., calculating an ensemble prediction value using weighted summing of the model weights and prediction values of the models in the ensemble)).
Claim 15 is a system claim and its limitations are included in claim 7. The only difference is that claim 15 requires an apparatus composed of a determination unit, a weight prediction unit, an optimization unit, and an ensemble prediction unit. Given the interpretation set out in the 112 rejections of claim 15, these units are interpreted as components of a generic computer processor (Chueh para. [0014] recites “FIG. 2 shows a flow chart of an ensemble learning predicting method according to one embodiment of the application, which is applicable in an electronic device having a processing circuit”). Therefore, claim 15 is rejected for the same reasons as claim 7.
Claim 16 is a system claim and its limitations are included in claim 8. Claim 16 is rejected for the same reasons as claim 8.
Claim 17 is a system claim and its limitations are included in claim 11. Claim 17 is rejected for the same reasons as claim 11.
Claim 18 is a system claim and its limitations are included in claim 12. Claim 18 is rejected for the same reasons as claim 12.
Claim 19 is a system claim and its limitations are included in claim 13. Claim 19 is rejected for the same reasons as claim 13.
Claim 20 is a system claim and its limitations are included in claim 14. Claim 20 is rejected for the same reasons as claim 14.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20210103858 A1 (Padmanabhan et al) teaches a method for model auto-selection for a prediction using an ensemble of machine learning models.
US 20230031691 A1 (Carroll et al) teaches a method for using performance metrics to weight some training data over other training data when training a machine learning model.
US 20220121999 A1 (Wang et al) teaches a method for distributing a plurality of prediction models, evaluating a prediction from each model, and building an ensemble model by applying weights to the plurality of prediction models.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEAH M FEITL whose telephone number is (571) 272-8350. The examiner can normally be reached on M-F 0900-1700 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached on (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/L.M.F./ Examiner, Art Unit 2147
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147