Prosecution Insights
Last updated: April 19, 2026
Application No. 18/193,781

METHOD AND DEVICE WITH ENSEMBLE MODEL FOR DATA LABELING

Non-Final OA: §101, §102, §103, §112
Filed: Mar 31, 2023
Examiner: ROHD, BENJAMIN MATTHEW
Art Unit: 2147
Tech Center: 2100 — Computer Architecture & Software
Assignee: Samsung Electronics Co., Ltd.
OA Round: 1 (Non-Final)
Grant Probability: 0% (At Risk)
OA Rounds: 1-2
To Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 1 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal lift in resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline)
Total Applications: 31 across all art units (30 currently pending)

Statute-Specific Performance

§101: 23.5% (-16.5% vs TC avg)
§103: 48.7% (+8.7% vs TC avg)
§102: 11.2% (-28.8% vs TC avg)
§112: 16.6% (-23.4% vs TC avg)

Tech Center averages are estimates. Based on career data from 1 resolved case.

Office Action

§101 §102 §103 §112
DETAILED ACTION

This office action is in response to the submission of the application on 03/31/2023. Claims 1-20 are presented for examination.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 03/31/2023 and 05/21/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 112

Claims 1-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claim 1 recites the limitation "the inference performance traits" in line 3. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, this limitation will be interpreted as referring to the inference performance features.

Claim 3 recites the limitations "the validation inputs" and "the validation result data" in lines 5-6. There is insufficient antecedent basis for these limitations in the claim. For examination purposes, the limitations will be interpreted as referring to

Claim 10 recites the limitation "the validation inputs" in line 1. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, this limitation will be interpreted as referring only to the labeling target inputs.

Claims 2-10 are additionally rejected due to their dependence on rejected claim 1 for the reasons outlined above.

Claim Rejections - 35 USC § 101

35 U.S.C.
101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1:

Step 1: The claim is directed to a method, which falls within the statutory category of a process.

Step 2A Prong 1: The claim is directed to an abstract idea. Specifically, the claim recites:

determining inference performance features of respective neural network models comprised in an ensemble model, wherein the inference performance traits correspond to performance of the neural network models with respect to inferring classes of the ensemble model; (Abstract idea – mental process. Determining inference performance features of neural networks can practically be performed in the human mind or with the aid of pen and paper, for example, by mentally comparing the models' predicted outputs to expected outputs and mentally determining a numerical indicator of their similarity. See MPEP 2106.04(a)(2)(III).)

based on the inference performance features, determining weights for each of the classes for each of the neural network models, wherein the weights are not weight of nodes of the neural network models; (Abstract idea – mental process. Determining weights for each class for each neural network based on inference performance features can practically be performed in the human mind or with the aid of pen and paper, for example, by mentally determining the weight for each class to be equal to the model's inference performance on that class. See MPEP 2106.04(a)(2)(III).)
determining score data representing confidences for each of the classes for the labeling target inputs by applying weights of the weight data to the classification result data; and (Abstract idea – mental process. Applying weights to predictions to determine a confidence score for each class can practically be performed in the human mind or with the aid of pen and paper, for example, by mentally multiplying each model's predicted value for each class by the corresponding determined class weight. See MPEP 2106.04(a)(2)(III).)

measuring classification accuracy of the classification operation for the labeling target inputs based on the score data. (Abstract idea – mental process. Measuring classification accuracy for the input based on the score data can practically be performed in the human mind or with the aid of pen and paper, for example, by mentally comparing the determined class confidence scores to the expected output for each input. See MPEP 2106.04(a)(2)(III).)

Step 2A Prong 2: The additional elements recited in the claim do not integrate the abstract idea into a practical application, individually or in combination. Specifically, the claim recites the additional elements:

generating classification result data by performing a classification inference operation on labeling target inputs by the neural network models; (Performing a classification inference operation on input data using generic neural networks is standard in the field of machine learning, and thus amounts to adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Specifically, the claim recites the additional elements:

generating classification result data by performing a classification inference operation on labeling target inputs by the neural network models; (Performing a classification inference operation on input data using generic neural networks is standard in the field of machine learning, and thus amounts to adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Claims 2-10:

Claim 2 recites The labeling method of claim 1, further comprising: generating validation result data by performing a classification operation on validation inputs by the neural network models; generating first partial data of the validation result data by performing a first classification operation on the validation inputs by the neural network models; generating additional validation inputs by transforming the validation inputs; and generating second partial data of the validation result data by performing a second classification operation on the additional validation inputs by the neural network models.

Performing classification inference operations on validation data using generic neural networks is standard in the field of machine learning, and thus amounts to adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f). Transforming the validation input data to generate additional validation input data can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by writing out input vectors on a sheet of paper and altering their values by hand. See MPEP 2106.04(a)(2)(III).
Therefore, the claim merges with the abstract idea recited in claim 1, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.

Claim 3 recites The labeling method of claim 1, wherein the determining of the weight data comprises: determining model confidence data indicating model confidence of the neural network models and consistency data indicating classification consistency for each class of the neural network models, based on a comparison result between labels of the validation inputs and the validation result data; and determining the weight data by integrating the model confidence data and the consistency data.

Determining model confidence and consistency data and integrating them to determine weight data can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by mentally comparing the models' predicted validation outputs to expected validation outputs, mentally determining a numerical indicator of their similarity, both aggregated across classes (confidence) and within each class (consistency), and mentally multiplying the confidence values by the consistency values to obtain the weight values. See MPEP 2106.04(a)(2)(III). Therefore, the claim merges with the abstract idea recited in claim 1, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.

Claim 4 recites The labeling method of claim 1, wherein the measuring of the classification accuracy comprises: determining representative score values based on the labeling target inputs based on the score data; and classifying each of the labeling target inputs into a first group or a second group based on the representative score values.

Determining representative score values and using them to classify inputs into a first or second group can practically be performed in the human mind or with the aid of pen and paper (i.e.
mental process), for example, by mentally determining the representative score value to be the confidence score of the class with the highest prediction confidence, mentally comparing this value to a confidence threshold, and mentally classifying the input as either certain or uncertain based on the comparison. See MPEP 2106.04(a)(2)(III). Therefore, the claim merges with the abstract idea recited in claim 1, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.

Claim 5 recites The labeling method of claim 4, wherein the score data comprises individual score data of each of the labeling target inputs, and the determining of the representative score values comprises determining a maximum value of each piece of individual score data of the labeling target inputs of the score data to be the representative score value of each of the labeling target inputs.

Determining a maximum value of each piece of individual score data to be the representative score value for each input can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by mentally determining the maximum of the class scores for each input and mentally determining this maximum score to be the representative score value for the input. See MPEP 2106.04(a)(2)(III). Therefore, the claim merges with the abstract idea recited in claim 4, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.

Claim 6 recites The labeling method of claim 1, wherein the weight data corresponds to an m*n weight matrix, where m is the number of the neural network models, and where n is the number of classes.

Representing the weight data as an m*n matrix can practically be performed in the human mind or with the aid of pen and paper (i.e.
mental process), for example, by writing out the weight values in a 2-dimensional grid on a sheet of paper, with dimensions of the grid corresponding to the number of models and classes. See MPEP 2106.04(a)(2)(III). Therefore, the claim merges with the abstract idea recited in claim 1, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.

Claim 7 recites The labeling method of claim 6, wherein the generating of the classification result data comprises generating a first classification result matrix based on a first labeling target input of the labeling target inputs, and the determining of the score data comprises determining an m*n first individual score matrix by applying the weight matrix to the first classification result matrix.

Determining a classification result matrix and applying the weight matrix to the classification result matrix to obtain a score matrix can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by writing out the results of the classification inference operation in a 2-dimensional grid on a sheet of paper and performing an element-wise multiplication with the weight matrix by hand to determine the score matrix. See MPEP 2106.04(a)(2)(III). Therefore, the claim merges with the abstract idea recited in claim 6, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.

Claim 8 recites The labeling method of claim 7, wherein the measuring of the classification accuracy comprises: determining a first representative score value for the first labeling target input from the first individual score matrix; and classifying the first labeling target input into a certain label group or a review group based on the first representative score value.
Determining a representative score value and using it to classify an input into a certain label group or a review group can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by mentally determining the representative score value to be the confidence score of the class with the highest prediction confidence, mentally comparing this value to a confidence threshold, and mentally classifying the input as either certain or needing review based on the comparison. See MPEP 2106.04(a)(2)(III). Therefore, the claim merges with the abstract idea recited in claim 7, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.

Claim 9 recites The labeling method of claim 8, wherein the determining of the first representative score value comprises: determining a 1*n first individual score vector by integrating elements of the first individual score matrix for each class; and determining a maximum value of the elements of the first individual score vector to be the first representative score value.

Integrating scores for each class to obtain a score vector and then determining its maximum to be the representative score value can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by mentally summing the confidence scores from each model for each class, writing the sums out by hand as a vector on a sheet of paper, mentally identifying the maximum value of the sums, and mentally determining this maximum sum to be the representative score value. See MPEP 2106.04(a)(2)(III). Therefore, the claim merges with the abstract idea recited in claim 8, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.
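The arithmetic recited in claims 1 and 5-9 (an m*n weight matrix applied element-wise to an m*n classification result matrix, per-class integration into a 1*n score vector, and a maximum taken as the representative score that splits inputs into a certain group and a review group) can be sketched in code. This is a minimal illustration of the recited steps, not the applicant's implementation; the function name, the example matrices, and the 0.6 threshold are assumptions introduced here.

```python
import numpy as np

def label_with_ensemble(result_matrix, weight_matrix, threshold=0.6):
    """Sketch of the scoring recited in claims 1 and 6-9.

    result_matrix: m*n classification result matrix for one labeling
                   target input (m models, n classes).
    weight_matrix: m*n per-model, per-class weights derived from
                   validation performance (ensemble weights, not the
                   node weights of the networks).
    threshold:     hypothetical cutoff for the certain/review split.
    """
    # Claim 7: apply the weight matrix element-wise to obtain the
    # m*n individual score matrix.
    score_matrix = weight_matrix * result_matrix
    # Claim 9: integrate the score matrix per class into a 1*n vector.
    score_vector = score_matrix.sum(axis=0)
    # Claims 5 and 9: the maximum element is the representative score.
    representative = float(score_vector.max())
    label = int(score_vector.argmax())
    # Claim 8: certain label group vs. review group by threshold.
    group = "certain" if representative >= threshold else "review"
    return label, representative, group

# Two models, three classes; model 0 is weighted more heavily on class 2.
results = np.array([[0.2, 0.1, 0.7],
                    [0.5, 0.3, 0.2]])
weights = np.array([[0.5, 0.5, 1.0],
                    [0.5, 0.5, 0.2]])
label, representative, group = label_with_ensemble(results, weights)
```

Here the summed class scores are (0.35, 0.2, 0.74), so the input is assigned class 2 and falls in the certain group under the assumed threshold.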
Claim 10 recites The labeling method of claim 1, wherein the validation inputs and the labeling target inputs correspond to semiconductor images based on a semiconductor manufacturing process, and the classes of the neural network models correspond to types of manufacturing defects based on the semiconductor manufacturing process.

Specifying the model inputs to be semiconductor images and the classes to correspond to manufacturing defects amounts to generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h). Therefore, the claim does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.

Claims 11-18 are device claims containing substantially the same elements as method claims 1-5, 7-8, and 10, and are rejected on the same grounds under 35 U.S.C. 101 as claims 1-5, 7-8, and 10, mutatis mutandis. The additional components of A labeling device comprising: one or more processors; and a memory storing instructions configured to, when executed by the one or more processors, cause the one or more processors to: are interpreted as a general-purpose computing environment and mere instructions to apply the judicial exception on the computer. Therefore, the claims do not recite additional elements that are sufficient to amount to significantly more than the abstract idea.

Claim 19:

Step 1: The claim is directed to a method, which falls within the statutory category of a process.

Step 2A Prong 1: The claim is directed to an abstract idea. Specifically, the claim recites:

for each constituent NN, providing a respective score set comprising scores of the respective classes, wherein each constituent NN's score set comprises scores that are specific thereto, and wherein each score comprises a measure of inference performance of a corresponding constituent NN with respect to a corresponding class label; (Abstract idea – mental process.
Providing a score set for a neural network measuring inference performance on class labels can practically be performed in the human mind or with the aid of pen and paper, for example, by mentally comparing the models' predicted outputs to expected outputs for each class and mentally determining numerical indicators of their similarity. See MPEP 2106.04(a)(2)(III).)

assigning a class label, among the class labels, to the input data item by applying the score sets of the constituent NNs to the respective sets of prediction values of the constituent NNs. (Abstract idea – mental process. Applying score sets to prediction values to determine class labels can practically be performed in the human mind or with the aid of pen and paper, for example, by mentally multiplying each model's prediction value for each class by the corresponding determined class score, and mentally assigning a class label to the input according to the class with the highest product. See MPEP 2106.04(a)(2)(III).)

Step 2A Prong 2: The additional elements recited in the claim do not integrate the abstract idea into a practical application, individually or in combination. Specifically, the claim recites the additional elements:

storing constituent neural networks (NNs) of an ensemble model, the constituent NNs each trained to infer a same set of class labels from input data items inputted thereto; (Storing trained classifier neural networks amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g).)
inputting an input data item to the constituent NNs and based thereon, the constituent NNs generate respective sets of prediction values, each set of prediction values comprising prediction values of the respective class labels for the corresponding constituent NN; (Generating class label predictions based on input data using generic neural networks is standard in the field of machine learning, and thus amounts to adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Specifically, the claim recites the additional elements:

storing constituent neural networks (NNs) of an ensemble model, the constituent NNs each trained to infer a same set of class labels from input data items inputted thereto; (Storing trained classifier neural networks amounts to adding insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). Further, the limitation is directed to storing and retrieving information in memory, which the courts have found to be well-understood, routine, and conventional in the computer arts – see MPEP 2106.05(d).)
inputting an input data item to the constituent NNs and based thereon, the constituent NNs generate respective sets of prediction values, each set of prediction values comprising prediction values of the respective class labels for the corresponding constituent NN; (Generating class label predictions based on input data using generic neural networks is standard in the field of machine learning, and thus amounts to adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Claim 20 recites The method of claim 19, wherein the scores are generated based on measures of model confidence and/or model consistency of the constituent NNs with respect to the class labels.

Generating scores based on model confidence and consistency can practically be performed in the human mind or with the aid of pen and paper (i.e. mental process), for example, by mentally comparing the models' predicted outputs to expected outputs and mentally determining a numerical indicator of their similarity, both aggregated across classes (confidence) and within each class (consistency). See MPEP 2106.04(a)(2)(III). Therefore, the claim merges with the abstract idea recited in claim 19, and does not recite additional elements that are sufficient to amount to significantly more than the abstract idea.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sturlaugson et al. (hereinafter Sturlaugson), U.S. Patent Application Publication US-20180346151-A1 (published 12/06/2018).

Regarding Claim 19, Sturlaugson teaches A method comprising:

storing constituent neural networks (NNs) of an ensemble model, the constituent NNs each trained to infer a same set of class labels from input data items inputted thereto; (0008: "An ensemble of related machine learning models is applied to the feature data. Each model is characterized by a false positive rate and a false negative rate…" 0061: "Candidate models and primary models 66 may be the result of supervised machine learning and/or guided machine learning… The underlying function may be… a classification algorithm… Examples of classification algorithms include… neural networks." An ensemble of neural networks is stored and applied to feature data (i.e. input data items) to infer positive and negative labels (i.e. class labels).)

for each constituent NN, providing a respective score set comprising scores of the respective classes, wherein each constituent NN's score set comprises scores that are specific thereto, and wherein each score comprises a measure of inference performance of a corresponding constituent NN with respect to a corresponding class label; (0008: "An ensemble of related machine learning models is applied to the feature data.
Each model is characterized by a false positive rate and a false negative rate…" 0055: "Each primary model 66 may be characterized by performance with training or verification inputs (e.g., using inputs with known results)." For each constituent model, false positive and false negative rates (i.e. scores measuring inference performance on each class label) are determined.)

inputting an input data item to the constituent NNs and based thereon, the constituent NNs generate respective sets of prediction values, each set of prediction values comprising prediction values of the respective class labels for the corresponding constituent NN; (0050: "Primary models 66 are selected and/or configured to transform extracted feature data 26 into an estimate of performance status by producing a positive classification score 76. The positive classification score 76, which may be referred to as the positive score, is a numeric value that relates to the positive outcome or result of the primary model 66. Primary models 66 also produce a negative classification score 78 that is complementary to the positive classification score 76. The negative classification score 78, which may be referred to as a negative score, is a numeric value that relates to the negative outcome or result of the primary model 66." The primary models (i.e. constituent neural network models) generate positive and negative scores (i.e. prediction values for each class label) based on extracted feature data of an input data item.)

assigning a class label, among the class labels, to the input data item by applying the score sets of the constituent NNs to the respective sets of prediction values of the constituent NNs. (0066: "The positive weighting function and the negative weighting function may take the form of an inverse exponential function with an exponent proportional to the respective false positive rate (positive weighting function) or the false negative rate (negative weighting function).
For example, the weighted positive score (the product of the positive weighting function and the positive classification score of a primary model) may be given by: W_P = P·e^(-α·FPR), where W_P is the weighted positive score, P is the positive classification score, FPR is the false positive rate, and α is a constant… The weighted negative score (the product of the negative weighting function and the negative classification score of a primary model) may be given by: W_N = N·e^(-β·FNR), where W_N is the weighted negative score, N is the negative classification score, FNR is the false negative rate, and β is a constant." 0069: "The performance indicator 28 may be determined according to the combined weighted positive score and/or the combined weighted negative score… As a specific method of determining the performance indicator 28, the performance indicator 28 may be determined to be a positive category if the combined weighted positive score is greater than a threshold or a negative category if the combined weighted positive score is less than the same threshold." The false positive and false negative rates (i.e. score sets) are applied to the positive and negative classification scores (i.e. prediction values) to generate weighted positive and negative scores, which are then combined and compared to a threshold to assign a performance indicator (i.e. class label) indicating a positive or negative category.)

Regarding Claim 20, Sturlaugson teaches The method of claim 19, as shown above. Sturlaugson also teaches wherein the scores are generated based on measures of model confidence and/or model consistency of the constituent NNs with respect to the class labels. (0008: "An ensemble of related machine learning models is applied to the feature data.
Each model is characterized by a false positive rate and a false negative rate…" 0055: "Each primary model 66 may be characterized by performance with training or verification inputs (e.g., using inputs with known results)." False positive and false negative rates are scores generated based on model classification consistency for each class.)

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4, and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Sturlaugson in view of Dogan et al. (hereinafter Dogan), "A Weighted Majority Voting Ensemble Approach for Classification" (published 11/21/2019).

Regarding Claim 1, Sturlaugson teaches A labeling method comprising:

determining inference performance features of respective neural network models comprised in an ensemble model, wherein the inference performance traits correspond to performance of the neural network models with respect to inferring classes of the ensemble model; (0008: "An ensemble of related machine learning models is applied to the feature data.
Each model is characterized by a false positive rate and a false negative rate…" 0055: "Each primary model 66 may be characterized by performance with training or verification inputs (e.g., using inputs with known results)." 0061: "Candidate models and primary models 66 may be the result of supervised machine learning and/or guided machine learning… The underlying function may be… a classification algorithm… Examples of classification algorithms include… neural networks." False positive and false negative rates (i.e. inference performance features) are determined for each neural network of an ensemble of neural networks based on classification performance. 'Positive' represents a first class, and 'negative' represents a second class.)

based on the inference performance features, determining weights for each of the classes for each of the neural network models, wherein the weights are not weight of nodes of the neural network models; (0066: "The positive weighting function and the negative weighting function may take the form of an inverse exponential function with an exponent proportional to the respective false positive rate (positive weighting function) or the false negative rate (negative weighting function).
For example, the weighted positive score (the product of the positive weighting function and the positive classification score of a primary model) may be given by: W_P = P·e^(-α·FPR), where W_P is the weighted positive score, P is the positive classification score, FPR is the false positive rate, and α is a constant… The weighted negative score (the product of the negative weighting function and the negative classification score of a primary model) may be given by: W_N = N·e^(-β·FNR), where W_N is the weighted negative score, N is the negative classification score, FNR is the false negative rate, and β is a constant." For each model, a positive weight e^(-α·FPR) is determined based on the false positive rate, and a negative weight e^(-β·FNR) is determined based on the false negative rate (i.e. a weight is determined for each class based on the inference performance features).)

generating classification result data by performing a classification inference operation on labeling target inputs by the neural network models; (0050: "Primary models 66 are selected and/or configured to transform extracted feature data 26 into an estimate of performance status by producing a positive classification score 76. The positive classification score 76, which may be referred to as the positive score, is a numeric value that relates to the positive outcome or result of the primary model 66. Primary models 66 also produce a negative classification score 78 that is complementary to the positive classification score 76. The negative classification score 78, which may be referred to as a negative score, is a numeric value that relates to the negative outcome or result of the primary model 66." The primary models (i.e. neural network models) transform extracted feature data (i.e. labeling target inputs) into positive and negative classification scores (i.e. classification result data).)
determining score data representing confidences for each of the classes for the labeling target inputs by applying weights of the weight data to the classification result data; and (See the portion of 0066 cited above. Weighted positive and negative scores W_P and W_N (i.e. score data representing confidences for each class) are determined by the product of the positive and negative weighting functions and the positive and negative classification scores (i.e. by applying the weight data to the classification result data).) Sturlaugson does not appear to explicitly disclose measuring classification accuracy of the classification operation for the labeling target inputs based on the score data. However, Dogan teaches measuring classification accuracy of the classification operation for the labeling target inputs based on the score data. (Pg. 369, section IV.B: “In this study, classification accuracy was used as evaluation metric to compare the performances of the methods.”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sturlaugson and Dogan. Sturlaugson teaches a model ensemble with per-class per-model confidence weighting based on base model performance. Dogan teaches a model ensemble with weighted voting based on base model performance, including measuring ensemble classification accuracy. One of ordinary skill would have motivation to combine Sturlaugson and Dogan in order to evaluate the performance of the ensemble model. Regarding Claim 4, Sturlaugson and Dogan teach The labeling method of claim 1, as shown above. Sturlaugson also teaches wherein the measuring of the classification accuracy comprises: determining representative score values based on the labeling target inputs based on the score data; and (0068: “The weighted positive scores of the primary models 66 may be combined by summing (and/or averaging, etc.)
the individual weighted positive scores for all (or a subset, e.g., a filtered subset) of the primary models 66… The weighted negative scores of the primary models 66 may be combined in a manner analogous to the weighted positive scores.” Combined weighted positive and negative scores (i.e. representative score values) are determined based on the weighted positive and negative scores (i.e. score data).) classifying each of the labeling target inputs into a first group or a second group based on the representative score values. (0069: “The performance indicator 28 may be determined according to the combined weighted positive score and/or the combined weighted negative score… As a specific method of determining the performance indicator 28, the performance indicator 28 may be determined to be a positive category if the combined weighted positive score is greater than a threshold or a negative category if the combined weighted positive score is less than the same threshold.” Each input is assigned a performance indicator representing a positive or negative classification (i.e. classification into a first or second group) based on its combined weighted positive and negative scores (i.e. representative score values).) Regarding Claim 5, Sturlaugson and Dogan teach The labeling method of claim 4, as shown above. Sturlaugson also teaches wherein the score data comprises individual score data of each of the labeling target inputs, and (0008-0009: “Feature data is extracted from the flight data. The feature data relates to performance of one or more components of the aircraft. An ensemble of related machine learning models is applied to the feature data… Each model produces a positive score and a complementary negative score related to performance of the selected component.” Positive and negative scores (i.e. score data) are produced individually based on the feature data associated with each selected component (i.e. labeling target input).) 
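The combine-then-threshold step described in paragraphs 0068-0069 can be sketched as follows. Paragraph 0068 permits summing and/or averaging; averaging is used here, and the threshold value and all names are illustrative assumptions rather than anything specified by Sturlaugson.

```python
def combine_and_classify(weighted_pos, weighted_neg, threshold=0.5):
    """Combine per-model weighted scores and classify against one threshold.

    weighted_pos / weighted_neg: one weighted score per primary model.
    Averaging stands in for the "summing (and/or averaging, etc.)" of
    paragraph 0068; the single-threshold rule follows paragraph 0069.
    """
    combined_pos = sum(weighted_pos) / len(weighted_pos)
    combined_neg = sum(weighted_neg) / len(weighted_neg)
    # Positive category if the combined weighted positive score exceeds
    # the threshold, negative category otherwise.
    label = "positive" if combined_pos > threshold else "negative"
    return combined_pos, combined_neg, label
```

The two-threshold variant cited later for Claim 8 (values between two thresholds mapped to an undetermined category) would add a third branch between an upper and a lower cutoff.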
Dogan teaches the determining of the representative score values comprises determining a maximum value of each piece of individual score data of the labeling target inputs of the score data to be the representative score value of each of the labeling target inputs. (Pg. 366, section I: “Each classifier is associated with a coefficient (weight), usually proportional to its classification performance on a validation set. The final decision is made by summing up all weighted votes and by selecting the class with the highest aggregate.” For an input, the final decision (i.e. representative score value) of the classifier is the class with the highest aggregate (i.e. the maximum value of the individual score data).) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sturlaugson and Dogan. Sturlaugson teaches a model ensemble with per-class per-model confidence weighting based on base model performance. Dogan teaches a voting procedure for a model ensemble with weighted voting based on base model performance. One of ordinary skill would have motivation to combine Sturlaugson and Dogan in order to extend the binary classification capabilities of Sturlaugson’s ensemble to work with an arbitrary number of classes by providing a weighted voting procedure which “generally produce[s] better classification results than both simple majority voting ensemble (SMVE) approach and individual standard classification algorithms in terms of accuracy” (Dogan, pg. 366, section I). Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Sturlaugson in view of Dogan, and further in view of Zhang et al. (hereinafter Zhang), “MEMO: Test Time Robustness via Adaptation and Augmentation” (published 10/18/2021). Regarding Claim 2, Sturlaugson and Dogan teach The labeling method of claim 1, as shown above. 
Sturlaugson also teaches further comprising: generating validation result data by performing a classification operation on validation inputs by the neural network models; (0055: “Each primary model 66 may be characterized by performance with training or verification inputs (e.g., using inputs with known results). The model outcomes may be categorized as true positive outcomes (the predicted value of the model is positive and the known result is also positive), true negative outcomes (the predicted value is negative and the known result is negative), false positive outcomes (the predicted value is positive and the known result is negative), or false negative outcomes (the predicted value is negative and the known result is positive).” Each model performs classification on verification inputs (i.e. validation inputs) to generate predicted values (i.e. validation result data).) Sturlaugson and Dogan do not appear to explicitly disclose generating first partial data of the validation result data by performing a first classification operation on the validation inputs by the neural network models; generating additional validation inputs by transforming the validation inputs; and generating second partial data of the validation result data by performing a second classification operation on the additional validation inputs by the neural network models. However, Zhang teaches generating first partial data of the validation result data by performing a first classification operation on the validation inputs by the neural network models; (Pg. 4, section 3.1: “After this step, we use the adapted model f_θ′ to predict on the original test input x (line 4).” The model makes a prediction (i.e. generates first partial validation result data) based on the test input (i.e. validation input).) generating additional validation inputs by transforming the validation inputs; and (pg.
4, section 3.1: “Given a test point x and set of augmentation functions A, we sample B augmentations from A and apply them to x in order to produce a batch of augmented data x̃_1, …, x̃_B.” The test point (i.e. validation input) is augmented (i.e. transformed) to produce a batch of augmented data (i.e. additional validation inputs).) generating second partial data of the validation result data by performing a second classification operation on the additional validation inputs by the neural network models. (Pg. 4, section 3.1: “The model’s average, or marginal, output distribution with respect to the augmented points is given…” The model makes predictions (i.e. generates second partial validation result data) based on the augmented test input (i.e. additional validation inputs).) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sturlaugson, Dogan, and Zhang. Sturlaugson teaches a model ensemble with per-class per-model confidence weighting based on base model performance. Dogan teaches a voting procedure for a model ensemble with weighted voting based on base model performance. Zhang teaches test-time data augmentation for model adaptation and robustification. One of ordinary skill would have motivation to combine Sturlaugson, Dogan, and Zhang because Zhang’s ‘MEMO’ method “does not require access or changes to the model training procedure and is thus broadly applicable for a wide range of model architectures pretrained in a number of different ways. Furthermore, MEMO adapts at test time using single test inputs, thus it does not assume access to multiple test points as in several recent methods for test time adaptation [40, 46].
On a range of CIFAR-10 and ImageNet distribution shift benchmarks, and for ResNet, vision transformer, and, to an extent, ResNext models, MEMO consistently improves performance at test time and achieves several new state-of-the-art results for these models in the single test point setting” (Zhang, pg. 10, section 5). Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Sturlaugson in view of Dogan, and further in view of Ren et al. (hereinafter Ren), “Multi-classifier ensemble based on dynamic weights” (published 12/30/2017). Regarding Claim 3, Sturlaugson and Dogan teach The labeling method of claim 1, as shown above. Sturlaugson also teaches wherein the determining of the weight data comprises: determining […] consistency data indicating classification consistency for each class of the neural network models, based on a comparison result between labels of the validation inputs and the validation result data; (0008: “An ensemble of related machine learning models is applied to the feature data. Each model is characterized by a false positive rate and a false negative rate…” 0055: “Each primary model 66 may be characterized by performance with training or verification inputs (e.g., using inputs with known results).” False positive and false negative rates (i.e. consistency data indicating classification consistency for each class) are determined based on performance on verification inputs with known results (i.e. based on a comparison between labels of validation inputs and validation result data).) Sturlaugson and Dogan do not appear to explicitly disclose determining model confidence data indicating model confidence of the neural network models […]; and determining the weight data by integrating the model confidence data and the consistency data. 
However, Ren teaches determining model confidence data indicating model confidence of the neural network models and consistency data indicating classification consistency for each class of the neural network models, based on a comparison result between labels of the validation inputs and the validation result data; and (Pg. 21085. Section 1: “The reliability of the classifier is defined on the basis of its recognition capability, which is determined by the prior knowledge gained during the training process… The credibility of the classifier is obtained by calculating the posterior probability distribution, which is used to determine the separability feature of the classifier.” For each classifier model, reliability (i.e. confidence data) is determined based on knowledge gained in the training process (i.e. based on validation data), and credibility (i.e. consistency data) is determined based on class separability (i.e. indicating classification consistency for each class).) determining the weight data by integrating the model confidence data and the consistency data. (Pg. 21083, Abstract: “The algorithm defines decision credibility to describe the real-time importance of the classifier to the current target, combines this credibility with the reliability calculated by the classifier on the training data set and dynamically assigns the fusion weight to the classifier.” Fusion weights (i.e. weight data) are determined by combining reliability and credibility (i.e. integrating confidence data and consistency data).) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sturlaugson, Dogan, and Ren. Sturlaugson teaches a model ensemble with per-class per-model confidence weighting based on base model performance. Dogan teaches a voting procedure for a model ensemble with weighted voting based on base model performance. 
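The fusion Ren describes, a per-model reliability combined with a per-class credibility to produce the weight data, can be sketched as a simple matrix construction. The multiplicative combination rule and every name below are assumptions for illustration, not Ren's exact formula.

```python
def fuse_weights(model_confidence, class_consistency):
    """Build an m*n weight matrix from two per-model metrics.

    model_confidence: one confidence value per model (m values).
    class_consistency: per-model, per-class consistency (m rows of n values).
    Returns m rows of n fused weights. The multiplicative fusion is a
    hedged stand-in for Ren's reliability/credibility combination.
    """
    return [
        [conf * cons for cons in per_class]          # one row per model
        for conf, per_class in zip(model_confidence, class_consistency)
    ]
```

For two models and two classes, `fuse_weights([0.9, 0.6], [[1.0, 0.5], [0.8, 0.7]])` yields a 2×2 matrix in which each model's per-class weights are scaled by that model's overall confidence.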
Ren teaches a model ensemble which calculates weights based on an integration of two model metrics: reliability and credibility. One of ordinary skill would have motivation to combine Sturlaugson, Dogan, and Ren because “[c]ompared with single classifier and current popular fusion methods, our method can more effectively reduce the effect of unreliable instance information in the training phase of the classifier and can therefore fuse the decisions in classification efficiently and improve the overall performance of the integrated method” (Ren, pg. 21085, section 1). Claims 6-9 are rejected under 35 U.S.C. 103 as being unpatentable over Sturlaugson in view of Dogan, and further in view of Saha et al. (hereinafter Saha), “Combining multiple classifiers using vote based classifier ensemble technique for named entity recognition” (published 07/20/2012). Regarding Claim 6, Sturlaugson and Dogan teach The labeling method of claim 1, as shown above. Sturlaugson and Dogan do not appear to explicitly disclose wherein the weight data corresponds to an m*n weight matrix, where m is the number of the neural network models, and where n is the number of classes. However, Saha teaches wherein the weight data corresponds to an m*n weight matrix, where m is the number of the neural network models, and where n is the number of classes. (Pg. 19, section 3.1: “Let, the N number of available classifiers be denoted by C_1, …, C_N and A = {C_i : i = 1; N}. Suppose, there are M number of output classes. The classifier ensemble problem is then stated as follows: Find the combination of votes V per classifier C_i which will optimize a function F(V). Here, V can be either a Boolean array (binary vote based ensemble) of size N × M or a real array (real/weighted vote based ensemble) of size N × M… In the case of real array: V(i, j) denotes the weight of vote of the i-th classifier for the j-th class.” Vote array V (i.e.
weight data) is an N × M weight matrix, where N is the number of classifiers (i.e. neural network models) and M is the number of classes.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sturlaugson, Dogan, and Saha. Sturlaugson teaches a model ensemble with per-class per-model confidence weighting based on base model performance. Dogan teaches a voting procedure for a model ensemble with weighted voting based on base model performance. Saha teaches a model ensemble with per-class per-model confidence weighting, including storing the weight values in an array. One of ordinary skill would have motivation to combine Sturlaugson, Dogan, and Saha in order to efficiently store and access weight values. Regarding Claim 7, Sturlaugson, Dogan, and Saha teach The labeling method of claim 6, as shown above. Sturlaugson also teaches wherein the generating of the classification result data comprises generating a first classification result matrix based on a first labeling target input of the labeling target inputs, and (0050: “Primary models 66 are selected and/or configured to transform extracted feature data 26 into an estimate of performance status by producing a positive classification score 76. The positive classification score 76, which may be referred to as the positive score, is a numeric value that relates to the positive outcome or result of the primary model 66. Primary models 66 also produce a negative classification score 78 that is complementary to the positive classification score 76. The negative classification score 78, which may be referred to as a negative score, is a numeric value that relates to the negative outcome or result of the primary model 66.” The primary models each transform extracted feature data (i.e. a first labeling target input) into positive and negative classification scores (i.e. 
classification result data), resulting in a score for each class for each model (i.e. a first classification result matrix).) the determining of the score data comprises determining an m*n first individual score matrix by applying the weight matrix to the first classification result matrix. (0090: “Weighting models 122 includes weighting the positive classification score and/or the negative classification score of each model of the ensemble according to the individual performance measures of the models to respectively produce a weighted positive score and/or a weighted negative score.” Weighted positive and negative scores (i.e. score data) are produced for each model by applying the performance measure-based weights (i.e. weight matrix) to the positive and negative classification scores (i.e. first classification result matrix), resulting in a weighted score for each class for each model (i.e. an m*n first individual score matrix).) Regarding Claim 8, Sturlaugson, Dogan, and Saha teach The labeling method of claim 7, as shown above. Sturlaugson also teaches wherein the measuring of the classification accuracy comprises: determining a first representative score value for the first labeling target input from the first individual score matrix; and (0068: “The weighted positive scores of the primary models 66 may be combined by summing (and/or averaging, etc.) the individual weighted positive scores for all (or a subset, e.g., a filtered subset) of the primary models 66.” A combined weighted positive score (i.e. a first representative score value) is determined for the input (i.e. first labeling target input) based on the weighted positive scores (i.e. first individual score matrix).) classifying the first labeling target input into a certain label group or a review group based on the first representative score value. 
(0069: “The performance indicator 28 may be determined according to the combined weighted positive score… the positive category may be assigned to values greater than a first threshold and the negative category may be assigned to values less than a second threshold. Values between the first and second threshold may be assigned an unclassified or undetermined category.” Based on the combined weighted positive score (i.e. first representative score value), the input is classified as positive or negative (i.e. a certain label group), or as undetermined (i.e. a review group).) Regarding Claim 9, Sturlaugson, Dogan, and Saha teach The labeling method of claim 8, as shown above. Sturlaugson also teaches wherein the determining of the first representative score value comprises: determining a 1*n first individual score vector by integrating elements of the first individual score matrix for each class; and (0068: “The weighted positive scores of the primary models 66 may be combined by summing (and/or averaging, etc.) the individual weighted positive scores for all (or a subset, e.g., a filtered subset) of the primary models 66… The weighted negative scores of the primary models 66 may be combined in a manner analogous to the weighted positive scores.” The weighted positive and negative scores are combined (i.e. elements of the score matrix are integrated for each class), resulting in a combined weighted positive and negative score (i.e. a 1*n first individual score vector).) Dogan teaches determining a maximum value of the elements of the first individual score vector to be the first representative score value. (Pg. 366, section I: “Each classifier is associated with a coefficient (weight), usually proportional to its classification performance on a validation set. The final decision is made by summing up all weighted votes and by selecting the class with the highest aggregate.” For an input, the final decision (i.e. 
first representative score value) of the classifier is the class with the highest aggregate (i.e. the maximum value of the first individual score vector).) Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Sturlaugson in view of Dogan and Saha, and further in view of Saqlain et al. (hereinafter Saqlain), “A Voting Ensemble Classifier for Wafer Map Defect Patterns Identification in Semiconductor Manufacturing” (published 03/11/2019). Regarding Claim 10, Sturlaugson and Dogan teach The labeling method of claim 1, as shown above. Sturlaugson and Dogan do not appear to explicitly disclose wherein the validation inputs and the labeling target inputs correspond to semiconductor images based on a semiconductor manufacturing process, and the classes of the neural network models correspond to types of manufacturing defects based on the semiconductor manufacturing process. However, Saqlain teaches wherein the validation inputs and the labeling target inputs correspond to semiconductor images based on a semiconductor manufacturing process, and (Pg. 171-172, section I: “A wafer map (WM) is a collection of visual data about the physical parameter that are collected from semiconductor wafers… this paper proposes a voting ensemble classifier with multi-types features to identify wafer map defect patterns in semiconductor manufacturing.” Pg. 179, section V.A: “We used a real-world dataset WM-811K collected from 46,293 lots and consist of 811,457 original wafer images.”) the classes of the neural network models correspond to types of manufacturing defects based on the semiconductor manufacturing process. (Pg. 174, section III.B: “The labeled dataset also has two major classes such as pattern class and no-pattern class. The no-pattern class has no specific defect pattern of WM and labeled as None. It contains 147,431 wafer entities (85.2%) of the whole labeled data set. 
In addition, pattern class has only 25,519 wafer entities (14.8%) and consists of eight actual defect classes, labeled as Center, Donut, Edge-local, Edge-ring, Local, Random, Scratch, and Near-full.”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Sturlaugson, Dogan, and Saqlain. Sturlaugson teaches a model ensemble with per-class per-model confidence weighting based on base model performance. Dogan teaches a voting procedure for a model ensemble with weighted voting based on base model performance. Saqlain teaches an ensemble model for detecting and classifying defects in semiconductor wafer manufacturing. One of ordinary skill would have motivation to combine Sturlaugson, Dogan, and Saqlain because “the research methodologies about WM defects detection and recognition of spatial patterns are in high demand. These methods can be further used for early prevention of defects by diagnosing their root causes and to enhance the product quality by improving the reliability of the manufacturing system” (Saqlain, pg. 171, section I). Sturlaugson and Dogan teach improvements to ensemble learning, and Saqlain shows that ensemble learning is well-suited for wafer defect detection: “the SVE [soft voting ensemble] classifier outperformed all individual classification algorithms as well as previously proposed wafer map failure pattern recognition (WMFPR) [4] method and CNN model” (Saqlain, pg. 172, section I). Claims 11-18 are device claims containing substantially the same elements as method claims 1-5, 7-8, and 10. Sturlaugson, Dogan, Zhang, Ren, Saha, and Saqlain teach the elements of claims 1-5, 7-8, and 10, as shown above. Sturlaugson also teaches A labeling device comprising: one or more processors; and a memory storing instructions configured to, when executed by the one or more processors, cause the one or more processors to: (0026: “As illustrated in FIG. 
1, a predictive maintenance system 10 includes a computerized system 200 (as further discussed with respect to FIG. 7). The predictive maintenance system 10 may be programmed to perform, and/or may store instructions to perform, the methods described herein.” 0097: “The computerized system 200 includes a processing unit 202 operatively coupled to a computer-readable memory 206 by a communications infrastructure 210. The processing unit 202 may include one or more computer processors 204…”) Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN M ROHD whose telephone number is (571)272-6445. The examiner can normally be reached Mon-Thurs 8:00-6:00 EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 
/B.M.R./Examiner, Art Unit 2147 /VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147

Prosecution Timeline

Mar 31, 2023
Application Filed
Feb 02, 2026
Non-Final Rejection — §101, §102, §103 (current)

Prosecution Projections

1-2
Expected OA Rounds
0%
Grant Probability
0%
With Interview (+0.0%)
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 1 resolved case by this examiner. Grant probability derived from career allow rate.
