Last updated: May 29, 2026

Application No. 17/400,016

METHOD AND SYSTEM FOR LEARNING AN ENSEMBLE OF NEURAL NETWORK KERNEL CLASSIFIERS BASED ON PARTITIONS OF THE TRAINING DATA

Final Rejection §103

Filed: Aug 11, 2021

Examiner: PHUNG, QUOC LY PHU

Art Unit: 2143

Tech Center: 2100 (Computer Architecture & Software)

Assignee: Palo Alto Research Center Incorporated

OA Round: 4 (Final)

Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 46% grant rate with a +93.3% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.

Based on 26 resolved cases, 2023–2026

Examiner Intelligence

PHUNG, QUOC LY PHU

Grants 46% of resolved cases

Career Allowance Rate

12 granted / 26 resolved

-8.8% vs TC avg

Strong +93% interview lift

Interview Lift: +93.3%

Typical timeline

4y 2m

Avg Prosecution

6 currently pending


Statute-Specific Performance

§101: 3.5% (-36.5% vs TC avg)

§103: 86.8% (+46.8% vs TC avg)

§102: 4.4% (-35.6% vs TC avg)

§112: 5.3% (-34.7% vs TC avg)

Based on career data from 26 resolved cases

Office Action

§103

DETAILED ACTION

Remarks
	Claims 1-20 have been examined and rejected. This Office Action is responsive to the amendment filed on 12/29/2025, which has been entered in the above-identified application.


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination.


Response to Amendment
Applicant’s amendment filed 12/29/2025 has been entered. Claims 1, 10 and 18 have been amended. Claims 1-20 are pending in the application.






Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-13 and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Baker et al (US 20200410090 A1) hereafter Baker, in view of Li et al (US 20190068171 A1) hereafter Li, further in view of Coelho et al (US 20210051322 A1) hereafter Coelho, and further in view of Honda et al (US 12038802 B2) hereafter Honda.

With respect to claim 1, Baker teaches a computer-executable method for facilitating construction of an ensemble of neural network kernel classifiers (a computer implemented method provided for building and training an ensemble of machine learning systems to be robust against adversarial attacks [par. 0003]), the method comprising: 
dividing a training set of data objects into a number of partitions (an ensemble method called “blasting” is used to create ensembles with many ensemble members that are trained on sets of training data, where the training data subsets are selected to increase diversity among the members [par. 0018]);
training, based on the training set of data objects, a first neural network encoder to output a first set of features (a set of layer nodes may be selected to represent features of the neural network training. At step 607A, the system trains the ensemble members by supervised learning based on the training data selected [par. 0135, 0159 and FIG. 8]); 
training, based on each respective partition of the training set of data objects, a second neural network encoder to output a second set of features (a single ensemble member is trained with a secondary objective to match a target input sensitivity value for each input variable for each training data item [par. 0020 and FIG. 1A]); and 
predicting a result for a testing data object based on the ensemble of neural network kernel classifiers (the system measures the performance as well as the degree of diversity of the ensemble subsets to generate a final result that is robust to adversarial attacks. The process may be stopped if a specified number of ensemble subsets have been accepted as operational subsets [par. 0062, 0063 and FIGS. 1]).

However, Baker does not disclose generating, for each respective partition, kernel models which output a third set of features based on the first set of features output from training the first neural network encoder based on the training set of data objects and further based on the second set of features output from training the second neural network encoder based on each respective partition of the training set of data objects; classifying, by a classification model, the training set of data objects based on the third set of features, wherein the generated kernel models for each respective partition and the classification model comprise the ensemble of neural network kernel classifiers; displaying, on a device associated with a user, information associated with a respective neural network kernel classifier and indicating one or more of a kernel type, the number of partitions, or a data-partitioning method used by the respective neural network kernel classifier; and interactive elements which allow a user to perform an action associated with the displayed information, wherein the action comprises a change to a configuration related to one or more of the kernel type, the number of partitions, or the data-partitioning method; receiving, based on an interactive element activated on the device associated with the user, the change to the configuration related to one or more of the kernel type, the number of partitions, or the data-partitioning method; and predicting a new result for the testing data object by updating the ensemble of neural network kernel classifiers in response to receiving the change to the configuration.
In the same field of endeavor, Li teaches classifying, by a classification model, the training set of data objects based on the third set of features (NICE and KLMS algorithms may be used for classification and regression. For example, NICE organizes new data points into existing clusters, and a minimum centroid distance is computed for each subsequent data point [par. 0037-0042]), 
wherein the generated kernel models for each respective partition and the classification model comprise the ensemble of neural network kernel classifiers (NICE and KLMS can be viewed as a single, a multiple or a mixture filter algorithm among all kernel filters [par. 0033]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the concept of filtering out noise or irrelevant signal features by using filter models as suggested by Li into the concept of building an ensemble to be robust against adversarial attacks as suggested by Baker because both systems address the process of training neural network classifiers to improve the performance of the kernel filtering framework, such as classification and regression. Doing so would be desirable because the invention of Baker should explicitly teach a process of generating a respective machine learning model for each respective partition using the universal kernel function to combine all the respective partitions to obtain an ensemble ML model, such as filters acting as models used to clean the signal of noise or irrelevant signal features that users want to avoid (Li, [par. 0003]).

However, the combination of Baker and Li does not disclose generating, for each respective partition, kernel models which output a third set of features based on the first set of features output from training the first neural network encoder based on the training set of data objects and further based on the second set of features output from training the second neural network encoder based on each respective partition of the training set of data objects; displaying, on a device associated with a user, information associated with a respective neural network kernel classifier and indicating one or more of a kernel type, the number of partitions, or a data-partitioning method used by the respective neural network kernel classifier; and interactive elements which allow a user to perform an action associated with the displayed information, wherein the action comprises a change to a configuration related to one or more of the kernel type, the number of partitions, or the data-partitioning method; receiving, based on an interactive element activated on the device associated with the user, the change to the configuration related to one or more of the kernel type, the number of partitions, or the data-partitioning method; and predicting a new result for the testing data object by updating the ensemble of neural network kernel classifiers in response to receiving the change to the configuration.
In the same field of endeavor, Coelho teaches generating, for each respective partition, kernel models which output a third set of features based on the first set of features output from training the first neural network encoder based on the training set of data objects and further based on the second set of features output from training the second neural network encoder based on each respective partition of the training set of data objects (the process 1100 trains a machine learning model using input data in the encoding process (first encoder). The process uses the trained ML model to infer a mode decision for an image block, which is to be encoded using a quantization parameter. Each training datum of the training data 1112 can include a video block that was encoded by traditional encoding methods (a second encoder) corresponding to a quantization parameter used by the second encoder. Zero or more additional inputs corresponding to inputs used by the second encoder in determining the mode decision for encoding the video block can include at least some of the first samples of the top neighboring block, at least some of the second samples of the left block of the input. During the training, the ML model learns a mapping that accepts, as input, a block and a non-linear value of a quantization parameter and outputs a partitioning of the block [par. 0161-0170 and FIG. 11]);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the concept of encoding a block of a picture, including a CNN for determining a block partitioning of the block, as suggested by Coelho into the combination of Baker and Li because all of these systems address the process of training neural network classifiers to improve the performance of the partitioning process in filtering the input data (images) to obtain output features. Doing so would be desirable because the combination of Baker and Li would be efficient by using a CNN to determine partition decisions for a block of a picture, encoding and decoding the block where each feature map of the feature maps is of a smallest possible partition size of the block (Coelho, [par. 0004-0007]).

However, the combination of Baker, Li and Coelho does not disclose displaying, on a device associated with a user, information associated with a respective neural network kernel classifier and indicating one or more of a kernel type, the number of partitions, or a data-partitioning method used by the respective neural network kernel classifier; and interactive elements which allow a user to perform an action associated with the displayed information, wherein the action comprises a change to a configuration related to one or more of the kernel type, the number of partitions, or the data-partitioning method; receiving, based on an interactive element activated on the device associated with the user, the change to the configuration related to one or more of the kernel type, the number of partitions, or the data-partitioning method; and predicting a new result for the testing data object by updating the ensemble of neural network kernel classifiers in response to receiving the change to the configuration.
In the same field of endeavor, Honda teaches displaying, on a device associated with a user (a process for fabricating semiconductor wafers that includes a user interface generated to contain and to display for user review wafer information including the results of wafer inspections [col. 1, lines 20-50]), 
information associated with a respective neural network kernel classifier and indicating one or more of a kernel type, the number of partitions, or a data-partitioning method used by the respective neural network kernel classifier (any typical classifier may be used with or without hyper-parameter tuning including K-nearest neighbors, Robust logic Regression, Naïve Bayes, and other neural networks, linear & nonlinear SVM, Random Forest. These classifiers can be used collectively depending on computational and accuracy requirements. Hyper-parameter tuning is performed to ensure that the results do not overfit the data. A data-partitioning method can be cross-validation, bagging or clustering. In step 410, n-fold cross validation can be used with the labels to identify the best hyper-parameter to fit to the selected model list in the dataset [col. 5, lines 30-45; col. 9, lines 10-15]); and
interactive elements which allow a user to perform an action associated with the displayed information, wherein the action comprises a change to a configuration related to one or more of the kernel type, the number of partitions, or the data-partitioning method (a simplified GUI may be used to implement a collaborative learning environment for classifying wafers. The GUI may allow the user to toggle key information from a pull-down menu or equivalent. One or more sets of user controls are provided to navigate around the display, such as buttons, menus, and other widgets [col. 2, lines 30-40; col. 3, lines 10-20; col. 7, lines 50-55; col. 9, lines 35-65]);
receiving, based on an interactive element activated on the device associated with the user, the change to the configuration related to one or more of the kernel type, the number of partitions, or the data-partitioning method (The results from the ML model can be displayed in GUI for user review, and the user can select one or more wafers or lots to review. The model is provided with the ability to record changes, and the model can connect those changes to specific users [col. 4, lines 1-25; col. 6, lines 20-25]); and
predicting a new result for the testing data object by updating the ensemble of neural network kernel classifiers in response to receiving the change to the configuration (A permission scheme may be implemented to write out model objects to disk as well as to return updated classification labels and tables to the database and GUI. Various rule-based and machine learning-based models can be updated and continuously retrained with the reclassification data to enhance the effectiveness of classification schemes [col. 6, lines 55-65; col. 9, lines 25-35]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the concept of classifying wafers using collaborative learning that may predict a wafer classification by a ML model as suggested by Honda into the combination of Baker, Li and Coelho because all of these systems address the process of training a neural network/ML model to improve the performance of the neural network/ML model based on the input data. Doing so would be desirable because the combination of Baker, Li and Coelho would be more efficient by supporting the training of the neural network/ML model with a graphical user interface (GUI) to display the wafer information for the user to review and to make predictions, and the users may modify, correct, or update wafer quality based on the results (Honda, [col. 1, line 20 – col. 2, line 10]).

With respect to claim 2, the combination of Baker, Li, Coelho and Honda teaches wherein dividing the training set of data objects into the number of partitions comprises dividing the training set of data objects into a number of classes based on a respective class associated with a respective data object (Baker, the computer system partitions the training data into disjoint subsets. The system also trains a classifier to classify the training data items of the initial set into various classification categories. Each training data item may be associated with the input variables in the classifier [par. 0115, 0131]).

With respect to claim 3, the combination of Baker, Li, Coelho and Honda teaches wherein dividing the training set of data objects into the number of partitions comprises dividing the training set of data objects randomly into the number of partitions (Baker, the computer system selects a target vector for each selected subset of ensemble members, whereas each target vector value is chosen randomly for each ensemble member ranging from -1 to 1. Other random arrangements may also be considered for these values [par. 0031 and FIG. 1B]).
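The two partitioning variants recited in claims 2 and 3 (by class, and at random) can be illustrated with a minimal sketch. This is not drawn from the application or from Baker; the function names `partition_by_class` and `partition_randomly`, the toy data, and the round-robin dealing scheme are assumptions made here for illustration only.

```python
import random
from collections import defaultdict


def partition_by_class(items, labels):
    """Group items so that each partition holds the items of one class label."""
    groups = defaultdict(list)
    for item, label in zip(items, labels):
        groups[label].append(item)
    return dict(groups)


def partition_randomly(items, num_partitions, seed=0):
    """Shuffle a copy of items, then deal them round-robin into num_partitions lists."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::num_partitions] for i in range(num_partitions)]


# Hypothetical toy data: six data objects with three class labels.
items = ["a", "b", "c", "d", "e", "f"]
labels = [0, 1, 0, 1, 2, 2]

by_class = partition_by_class(items, labels)
random_parts = partition_randomly(items, 3)
```

In both variants every data object lands in exactly one partition; only the rule that assigns it differs.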

With respect to claim 4, the combination of Baker, Li, Coelho and Honda teaches wherein a respective kernel model comprises one or more of: a Gaussian kernel; a universal kernel in a Reproducing Kernel Hilbert Space; a linear kernel; a kernel mapping; and a kernel with a corresponding closed-form mathematical expression (Li, systems and methods are related to automatically composing universal filters. Some kernel methods such as Gaussian process (GP) or reproducing kernel Hilbert space (RKHS) [par. 0019-0021]).
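For readers unfamiliar with the kernel types listed in claim 4, the following sketch shows two kernels with closed-form expressions: a Gaussian kernel, k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), and a linear kernel, k(x, y) = x · y. The function names and the bandwidth parameter `sigma` are assumptions introduced here; neither the application nor Li is the source of this code.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def linear_kernel(x, y):
    """Linear kernel: the ordinary dot product of x and y."""
    return sum(a * b for a, b in zip(x, y))


# Identical points give the Gaussian kernel its maximum value of 1.0.
same = gaussian_kernel([0.0, 0.0], [0.0, 0.0])
# Points one unit apart with sigma = 1 give exp(-0.5).
near = gaussian_kernel([1.0, 0.0], [0.0, 0.0])
dot = linear_kernel([1, 2], [3, 4])
```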

With respect to claim 5, the combination of Baker, Li, Coelho and Honda teaches wherein the classification model comprises one or more of: a linear classifier; a logistic regression classifier; and a multiple-class classifier (Li, kernel adaptive filtering (KAF) removes the limitations of the linear model to provide a general nonlinear solution. NICE and KLMS are two of the kernel methods that create a unifying framework for classification and regression. NICE-KLMS can also be considered a multiple filtering algorithm [par. 0019-0021]).

With respect to claim 7, the combination of Baker, Li, Coelho and Honda teaches wherein the first neural network encoder, a respective second neural network encoder trained based on a respective partition (Baker, a step of training an ensemble member on a training subset will cause the ensemble member to be trained in a separate direction compared to other ensemble members [par. 0020, 0120]), a respective kernel model generated for the respective partition (Li, NICE and KLMS algorithms may be used to partition training samples into clusters, where each respective partition may be associated with a respective model such as Gaussian model [par. 0074]), and a classification model comprise a combined neural network kernel model which is based on parameters (Li, NICE may be provided a parameter that relates to the kernel bandwidth when combined with other models such as QKLMS or Gaussian. Hence, NICE and KLMS algorithms may incorporate each other to leverage filter parameters [par. 0034, 0071, 0074]).  

With respect to claim 8, the combination of Baker, Li, Coelho and Honda teaches further comprising: determining a forward iteration, wherein an input of the combined neural network kernel model comprises the training set of data objects and data objects in the respective partition (Li, NICE and KLMS partition training samples into clusters that have an additional parameter with respect to the Gaussian kernel width. One cluster filter is used per input sample, and the selection process takes roughly three distance comparisons with the respective centroids [par. 0020, 0074]); and defining a back propagation iteration, wherein known labels of the training set of data objects enable the combined neural network kernel model to change one or more parameters to ensure that the classification of the training set of data objects is consistent with the known labels (Baker, the system may measure the sensitivity of the system to a change in an input variable for a training item by measuring the amount of change in a designated differentiable function of the vector of the output values. The system may back propagate partial derivatives in the back propagation computation to generate the primary objective [par. 0004, 0022, 0033, 0034]).

With respect to claim 9, the combination of Baker, Li, Coelho and Honda teaches wherein the testing data object is modified based on an adversarial technique (Baker, the computer system is trained to generate ensembles to be robust against adversarial attacks, such that the subsets selected from the base ensembles have diverse responses to adversarial attacks [par. 0003, 0016, 0053 and FIG. 3]).  

With respect to claim 10, it is a computer system claim that corresponds to the computer method of claim 1. Therefore, it is rejected for the same reasons as set forth for claim 1 above.

With respect to claim 11, it is a computer system claim that corresponds to the computer method of claim 2. Therefore, it is rejected for the same reasons as set forth for claim 2 above.

With respect to claim 12, it is a computer system claim that corresponds to the computer method of claim 3. Therefore, it is rejected for the same reasons as set forth for claim 3 above.

With respect to claim 13, it is a computer system claim that corresponds to the computer method of claim 4. Therefore, it is rejected for the same reasons as set forth for claim 4 above.

With respect to claim 15, it is a computer system claim that corresponds to the computer method of claim 7. Therefore, it is rejected for the same reasons as set forth for claim 7 above.

With respect to claim 16, it is a computer system claim that corresponds to the computer method of claim 8. Therefore, it is rejected for the same reasons as set forth for claim 8 above.

With respect to claim 17, it is a computer system claim that corresponds to the computer method of claim 9. Therefore, it is rejected for the same reasons as set forth for claim 9 above.

With respect to claim 18, it is a non-transitory computer-readable medium claim that corresponds to the computer method of claim 1. Therefore, it is rejected for the same reasons as set forth for claim 1 above.

With respect to claim 19, it is a non-transitory computer-readable medium claim that corresponds to the computer method of claim 2. Therefore, it is rejected for the same reasons as set forth for claim 2 above.

With respect to claim 20, it is a non-transitory computer-readable medium claim that corresponds to the computer method of claim 7. Therefore, it is rejected for the same reasons as set forth for claim 7 above.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Baker et al (US 20200410090 A1) hereafter Baker, in view of Li et al (US 20190068171 A1) hereafter Li, further in view of Coelho et al (US 20210051322 A1) hereafter Coelho, further in view of Honda et al (US 12038802 B2) hereafter Honda, as applied to claim 1 above, and further in view of He et al (US 20220100624 A1) hereafter He.

With respect to claim 6, the combination of Baker, Li, Coelho and Honda does not teach wherein the classification model comprises a softmax classification layer.
In the same field of endeavor, He teaches wherein the classification model comprises a softmax classification layer (the convolutional neural network comprises convolutional layer, pooling layer and softmax layer. The softmax layer is used for multi-label classification of an original output signal [par. 0011, 0046]).  
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the concept of identifying and estimating complex analog circuit failure that includes a usage of a softmax layer as suggested by He into the combination of Baker, Li, Coelho and Honda because all of the systems address the process of training neural network classifiers to improve the performance of the kernel filtering framework, such as classification and regression. Doing so would be desirable because the combination of Baker, Li, Coelho and Honda would be more efficient by having a multi-label classification such as the softmax layer to divide the training data set into multiple labels or classes randomly to estimate a state of a degraded system (He, [par. 0011, 0015, 0046]).
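As background on the softmax classification layer recited in claim 6, the sketch below shows the standard softmax function, which normalizes a vector of scores into non-negative values that sum to one. It is a generic illustration with assumed inputs, not code from the application or from He; subtracting the maximum score before exponentiating is a common numerical-stability step added here.

```python
import math


def softmax(logits):
    """Map raw scores to probabilities that sum to one."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes; the largest score gets the largest probability.
probs = softmax([1.0, 2.0, 3.0])
predicted_class = max(range(len(probs)), key=probs.__getitem__)
uniform = softmax([0.0, 0.0])
```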

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Baker et al (US 20200410090 A1) hereafter Baker, in view of Li et al (US 20190068171 A1) hereafter Li, further in view of Coelho et al (US 20210051322 A1) hereafter Coelho, further in view of Honda et al (US 12038802 B2) hereafter Honda, as applied to claim 10 above, and further in view of He et al (US 20220100624 A1) hereafter He.

With respect to claim 14, the combination of Baker, Li, Coelho and Honda teaches wherein the classification model comprises one or more of: a linear classifier; a logistic regression classifier; a multiple-class classifier (Li, kernel adaptive filtering (KAF) removes the limitations of the linear model to provide a general nonlinear solution. NICE and KLMS are two of the kernel methods that create a unifying framework for classification and regression. NICE-KLMS can also be considered a multiple filtering algorithm [par. 0019-0021]).
However, the combination of Baker, Li, Coelho and Honda does not mention a softmax classification layer.
In the same field of endeavor, He teaches a softmax classification layer (the convolutional neural network comprises convolutional layer, pooling layer and softmax layer. The softmax layer is used for multi-label classification of an original output signal [par. 0011, 0046]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the concept of identifying and estimating complex analog circuit failure that includes a usage of a softmax layer as suggested by He into the combination of Baker, Li, Coelho and Honda because all of the systems address the process of training neural network classifiers to improve the performance of the kernel filtering framework, such as classification and regression. Doing so would be desirable because the combination of Baker, Li, Coelho and Honda would be more efficient by having a multi-label classification such as the softmax layer to divide the training data set into multiple labels or classes randomly to estimate a state of a degraded system (He, [par. 0011, 0015, 0046]).


Response to Arguments
	The examiner respectfully acknowledges the applicant’s amendments to claims 1, 10 and 18. 
	Applicant’s arguments filed on 12/29/2025 regarding the rejections of claims 1-20 under 35 USC 103 have been fully considered but are moot in view of the new grounds of rejection (see rejection above).




Conclusion
Applicant’s amendment necessitated the new grounds of rejection presented in this Office Action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
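As a rough illustration of the date arithmetic in the preceding paragraph, the sketch below adds three and six calendar months to the mailing date listed in the prosecution timeline for the current final rejection (Apr 23, 2026). The helper `add_months` is a hypothetical name introduced here, and the sketch assumes plain calendar-month addition; it does not model weekend or holiday rollovers, extension fees, or the advisory-action exception described above.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return d shifted forward by whole calendar months, clamping to month end."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    # Clamp the day for shorter target months (e.g. Nov 30 + 3 months).
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


mailing = date(2026, 4, 23)        # mailing date taken from the timeline entry
shortened = add_months(mailing, 3)  # end of the three-month shortened period
outer = add_months(mailing, 6)      # six-month outer limit
```

Under these assumptions the shortened period ends on Jul 23, 2026 and the six-month limit falls on Oct 23, 2026.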
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Quoc Phung whose telephone number is (703) 756 1330. The examiner can normally be reached on Monday through Friday from 9am to 5pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch, can be reached on 571-272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Q.L.P./Examiner, Art Unit 2143   
/JENNIFER N WELCH/Supervisory Patent Examiner, Art Unit 2143


Prosecution Timeline


Jul 11, 2025: Examiner Interview Summary

Jul 11, 2025: Request for Continued Examination

Jul 17, 2025: Response after Non-Final Action

Oct 01, 2025: Non-Final Rejection mailed (§103)

Dec 29, 2025: Response Filed

Jan 16, 2026: Applicant Interview (Telephonic)

Jan 21, 2026: Examiner Interview Summary

Apr 23, 2026: Final Rejection mailed (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/167,842, Patent 12632721: ACTOR ENSEMBLE FOR CONTINUOUS CONTROL (5y 3m to grant; granted May 19, 2026)

17/337,002, Patent 12554998: DATA ANALYTICS FOR MORE-INFORMED REPAIR OF A MECHANICAL OR ELECTROMECHANICAL SYSTEM (4y 8m to grant; granted Feb 17, 2026)

18/845,007, Patent 12415528: COMPLEX NETWORK COGNITION-BASED FEDERATED REINFORCEMENT LEARNING END-TO-END AUTONOMOUS DRIVING CONTROL SYSTEM, METHOD, AND VEHICULAR DEVICE (1y 0m to grant; granted Sep 16, 2025)

17/358,167, Patent 12353983: AN INFERENCE DEVICE AND METHOD FOR REDUCING THE MEMORY USAGE IN A WEIGHT MATRIX (4y 0m to grant; granted Jul 08, 2025)

Study what changed to get past this examiner. Based on 4 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6

Grant Probability: 46%

With Interview (+93.3%): 99%

Median Time to Grant: 4y 2m (~0m remaining)

PTA Risk: High

Based on 26 resolved cases by this examiner. Grant probability derived from career allowance rate.