Last updated: May 29, 2026
Application No. 18/109,710
MACHINE LEARNING SYSTEMS AND METHODS FOR CLASSIFICATION

Final Rejection: §101, §102, §103
Filed: Feb 14, 2023
Priority: Oct 12, 2022 (provisional 63/379,239, +1 more)
Examiner: ILES, TYLER EDWARD
Art Unit: 2122
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Samsung Display Co., Ltd.
OA Round: 2 (Final)
This examiner grants 60% of cases after interview (+50.0% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 5 resolved cases, 2023–2026
Examiner Intelligence

ILES, TYLER EDWARD
Career Allowance Rate: grants 60% of resolved cases (3 granted / 5 resolved), +5.0% vs TC avg
Interview Lift: +50.0% (resolved cases with interview vs. without)
Typical timeline: 3y 8m average prosecution
9 currently pending
Statute-Specific Performance

§101: 13.2% (-26.8% vs TC avg)
§103: 84.2% (+44.2% vs TC avg)
§102: 2.6% (-37.4% vs TC avg)
Based on career data from 5 resolved cases
Office Action

§101 §102 §103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is in response to an amendment filed on February 16th, 2026. Claims 1-20 are pending in the current application.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



	Claim(s) 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, under Step 1 of the Subject Matter Eligibility Test of Products and Processes, the claim is directed towards a system, which is considered a machine, one of the four statutory categories.
Next, under a Step 2A Prong 1 analysis, the claim recites:
calculate reference Shapley values for features of a data sample based on a first classification model
predict Shapley values for the features of the data sample based on the reference Shapley values and a distillation loss
predict a class label for the data sample based on the predicted Shapley values and a cross-entropy loss in comparison with a ground truth class label for the data sample
As drafted, these are processes that, under the broadest reasonable interpretation, fall under the “mental processes” grouping of abstract ideas, as well as the “mathematical concepts” grouping of abstract ideas.
Therefore, we have to examine the claim under Step 2A prong 2, which considers the additional elements within the claim. The claim’s additional elements are:
one or more processors and memory
a first classification model
train a second classification model through multi-task distillation
The limitations, as drafted, are interpreted to be mere instructions to apply the abstract idea, as it instructs to use processors, memory and trained models to perform the abstract idea. (See MPEP 2106.05(f)) Therefore, these additional elements do not integrate the abstract idea into a practical application. The claim is directed to an abstract idea.
Under a Step 2B analysis, the claim’s additional elements do not amount to significantly more than the judicial exception, as explained above in Step 2A prong 2. Therefore, the claim is ineligible.

Regarding claim 9, under Step 1 of the Subject Matter Eligibility Test of Products and Processes, the claim is directed towards a method, which is considered a process, one of the four statutory categories.
Next, under a Step 2A Prong 1 analysis, the claim recites:
calculating reference Shapley values for features of a data sample based on a first classification model
predict Shapley values for the features of the data sample based on the reference Shapley values and a distillation loss
predict a class label for the data sample based on the predicted Shapley values and a cross-entropy loss in comparison with a ground truth class label for the data sample.
As drafted, these are processes that, under the broadest reasonable interpretation, fall under the “mental processes” grouping of abstract ideas, as well as the “mathematical concepts” grouping of abstract ideas.
Therefore, we have to examine the claim under Step 2A prong 2, which considers the additional elements within the claim. The claim’s additional elements are:
one or more processors
a first classification model
training a second classification model through multi-task distillation
The limitations, as drafted, are interpreted to be mere instructions to apply the abstract idea, as it instructs to use processors, memory and trained models to perform the abstract idea. (See MPEP 2106.05(f)) Therefore, these additional elements do not integrate the abstract idea into a practical application. The claim is directed to an abstract idea.
Under a Step 2B analysis, the claim’s additional elements do not amount to significantly more than the judicial exception, as explained above in Step 2A prong 2. Therefore, the claim is ineligible.

Regarding claim 17, under Step 1 of the Subject Matter Eligibility Test of Products and Processes, the claim is directed towards a machine, which is one of the four statutory categories.
Next, under a Step 2A Prong 1 analysis, the claim recites:
calculate reference Shapley values based on the predictions by the first machine layer perceptron during the first training
predict Shapley values for features of the first masked data based on the reference Shapley values and a distillation loss
predict a class label for the first data sample based on the predicted Shapley values and a cross-entropy loss in comparison with a ground truth class label for the first data sample.
As drafted, these are processes that, under the broadest reasonable interpretation, fall under the “mental processes” grouping of abstract ideas, as well as the “mathematical concepts” grouping of abstract ideas.
Therefore, we have to examine the claim under Step 2A prong 2, which considers the additional elements within the claim. The claim’s additional elements are:
a first machine layer perceptron to be trained… to output predictions on first masked data of a first data sample during a first training 
a first guideline generator to be trained
a first classification model
a second classification model to be trained… through multi-task distillation during the first training
one or more processors
The limitations, as drafted, are interpreted to be mere instructions to apply the abstract idea, as it instructs to use processors and trained models to perform the abstract idea and generate prediction outputs. (See MPEP 2106.05(f)) Therefore, these additional elements do not integrate the abstract idea into a practical application. The claim is directed to an abstract idea.
Under a Step 2B analysis, the claim’s additional elements do not amount to significantly more than the judicial exception, as explained above in Step 2A prong 2. Therefore, the claim is ineligible.


	Regarding claims 2 and 10, the claims recite to “generate masked data by substituting a subset of original feature values in the data sample with background values; and train a multilayer perceptron to output predictions on the masked data based on the first classification model, wherein the reference Shapley values are calculated based on the predictions output by the multilayer perceptron.” The limitations, as drafted, are interpreted to be mere instruction to apply a judicial exception, as it instructs to generate masked data by substituting a subset of original features, and to train a perceptron to output predictions. (See MPEP 2106.05(f)) Therefore, the claims are rejected on the same basis as claims 1 and 9.

	Regarding claims 3 and 11, the claims recite “the first classification model is a decision-tree based model; and the multilayer perceptron is trained on the masked data according to the decision-tree based model and a loss function.” The “first classification model is a decision-tree based model;” merely indicates the field of use and technological environment and “generally links” a decision-tree based model to the judicial exception, (See MPEP 2106.05(h)) and “the multilayer perceptron is trained on the masked data according to the decision-tree based model and a loss function” is interpreted to be mere instructions to apply an exception, as it instructs how to train the perceptron. (See MPEP 2106.05(f)) Therefore, the claims are rejected on the same basis as claims 2 and 10.

	Regarding claims 4 and 12, the claims recite “estimate the Shapley values for the features of the masked data; and compare the estimated Shapley values with the reference Shapley values according to the distillation loss.” The limitations, as drafted, are interpreted to be, under the broadest reasonable interpretation, “mental processes”, which is a grouping of abstract ideas. Therefore, the claims are rejected on the same basis as claims 2 and 10.

	Regarding claims 5 and 13, the claims recite “estimating, by the one or more processors, the class label for the data sample based on the estimated Shapley values; and comparing, by the one or more processors, the estimated class label with the ground truth class label according to a cross-entropy loss.” The limitations, as drafted, are interpreted to be, under the broadest reasonable interpretation, “mental processes”, which is a grouping of abstract ideas. Therefore, the claims are rejected on the same basis as claims 1 and 9.

	Regarding claims 6 and 14, the claims recite “the second classification model comprises a plurality of fully connected layers with a linear activation function to be trained to predict the Shapley values based on the reference Shapley values and the distillation loss.” The limitation, as drafted, merely indicates the field of use, or technological environment, and “generally links” fully connected layers with a linear activation function, to the second classification model. (See MPEP 2106.05(h)) Therefore, the claims are rejected on the same basis as claims 1 and 9.

	Regarding claims 7 and 15, the claims recite “the second classification model comprises at least one fully connected layer to be trained to predict the class label based on the predicted Shapley values and a cross-entropy loss.” The limitation, as drafted, merely indicates the field of use, or technological environment, and “generally links” at least one fully connected layer to the second classification model. (See MPEP 2106.05(h)) Therefore, the claims are rejected on the same basis as claims 1 and 9.

	Regarding claims 8 and 16, the claims recite “outputting the predicted Shapley values as explanations for the class label prediction.” The limitation, as drafted, is considered to be insignificant extra-solution activity, (See MPEP 2106.05(g)) as well as well-understood, routine, and conventional, as it is considered to be receiving or transmitting data over a network. (See MPEP 2106.05(d)) Therefore, the claims are rejected on the same basis as claims 1 and 9.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claim(s) 1, 5-9, and 13-16 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Quan Zheng et al. (Herein referred to as Zheng) (Shap-CAM: Visual Explanations for Convolutional Neural Networks based on Shapley Value)

Regarding claim 1, Zheng teaches a classification system comprising: one or more processors and memory (While Zheng never explicitly mentions a processor and memory, those components are implicitly required to run the method of Zheng.) calculate reference Shapley values for features of a data sample based on a first classification model (“The Shapley value for player i defined above can be interpreted as the average marginal contribution of player i to all possible coalitions S that can be formed without it.”, pg. 7, second paragraph; Also see Equation 3 on pg. 7) (The Shapley values are calculated for a function as detailed with the equation on page 7, the function being based on data samples which are based on classification) and train a second classification model through multi-task distillation to: predict Shapley values for the features of the data sample based on the reference Shapley values and a distillation loss (“the Shapley Value of the pixel (i,j) represents its marginal contribution to the class confidence. Thus, we define the Shapley Value as the saliency map of our Shap-CAM… We train a WRN-40-2 teacher network (2.2 M parameters) on the CIFAR-10 dataset. In order to train a student WRN-16-2 network (0.7 M parameters), we introduce a modified loss Lstu, which is a weighted combination of the standard cross entropy loss LCE and an interpretability loss”, pg. 8, top paragraph; pg. 13, under “4.5 Learning from Explanations: Knowledge Distillation”; See also Equation 9 on pg. 13) (With regards to Equation 9 on pg. 13, the student model acts as our second classification model, which is trained to predict Shapley values (corresponding to Lc) using reference Shapley values (corresponding to the Lc in the teacher model) and a loss (LCE)) and predict a class label for the data sample based on the predicted Shapley values and a cross-entropy loss in comparison with a ground truth class label for the data sample. (“we introduce a modified loss Lstu, which is a weighted combination of the standard cross entropy loss LCE and an interpretability loss Linterp where the first term represents the cross entropy loss and the second term represents the interpretability loss Linterp. In the above equations, I indicates the input image and c stands for the corresponding output class label.”, pg. 13, under “4.5 Learning from Explanations: Knowledge Distillation”; See also Equation 9 on pg. 13) (The student model predicts a class label based on the training, using the predicted Shapley values (Lc), and a cross-entropy loss (LCE) in comparison with a ground truth class label. (Which corresponds to the weights.))
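
For illustration only, the quoted reading of the Shapley value as the average marginal contribution of player i over all coalitions that can be formed without it can be sketched in Python. This is a minimal sketch of the standard game-theoretic definition and is not a reproduction of Zheng's Equation 3 as typeset; the function name `shapley_values`, the toy game `toy_value`, and the player names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: weighted average marginal contribution per player."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1, all formed without player i
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi[i] = total
    return phi


def toy_value(coalition):
    # Hypothetical additive game: each player contributes a fixed amount.
    contributions = {"a": 1.0, "b": 2.0, "c": 3.0}
    return sum(contributions[p] for p in coalition)


phi = shapley_values(["a", "b", "c"], toy_value)
```

In an additive game such as this one, each player's Shapley value equals its own contribution, and the values sum to the value of the full coalition.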

Regarding claim 9, Zheng teaches A method for classifying data, comprising: calculating, by one or more processors, reference Shapley values for features of a data sample based on a first classification model; (“For a given player i, its Shapley value can be computed as: [See Equation 3 on page 7] The Shapley value for player i defined above can be interpreted as the average marginal contribution of player i to all possible coalitions S that can be formed without it.”, pg. 7, second paragraph; Also see Equation 3 on pg. 7) (The Shapley values are calculated for a function as detailed with the equation on page 7, the function being based on data samples which are based on classification) and training, by the one or more processors, a second classification model through multi-task distillation to: predict Shapley values for the features of the data sample based on the reference Shapley values and a distillation loss (“the Shapley Value of the pixel (i,j) represents its marginal contribution to the class confidence. Thus, we define the Shapley Value as the saliency map of our Shap-CAM… We train a WRN-40-2 teacher network (2.2 M parameters) on the CIFAR-10 dataset. In order to train a student WRN-16-2 network (0.7 M parameters), we introduce a modified loss Lstu, which is a weighted combination of the standard cross entropy loss LCE and an interpretability loss”, pg. 8, top paragraph; pg. 13, under “4.5 Learning from Explanations: Knowledge Distillation”; See also Equation 9 on pg. 13) (With regards to Equation 9 on pg. 13, the student model acts as our second classification model, which is trained to predict Shapley values defined as the saliency map, using reference Shapley values (corresponding to the Lc in the teacher model) and a loss (LCE)) and predict a class label for the data sample based on the predicted Shapley values and a cross-entropy loss in comparison with a ground truth class label for the data sample. (“we introduce a modified loss Lstu, which is a weighted combination of the standard cross entropy loss LCE and an interpretability loss Linterp where the first term represents the cross entropy loss and the second term represents the interpretability loss Linterp. In the above equations, I indicates the input image and c stands for the corresponding output class label.”, pg. 13, under “4.5 Learning from Explanations: Knowledge Distillation”; See also Equation 9 on pg. 13) (The student model predicts a class label based on the training, using the predicted Shapley values (Lc), and a cross-entropy loss (LCE) in comparison with a ground truth class label. (Which corresponds to the weights.))
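
For illustration only, a student loss that is "a weighted combination of the standard cross entropy loss LCE and an interpretability loss Linterp" can be sketched as follows. The exact form of Zheng's Equation 9 is not reproduced in this action, so several details are assumptions: the weight `lam`, the use of a mean squared difference between student and teacher explanations as the interpretability term, and all function names.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer ground truth label."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def student_loss(student_logits, label, student_expl, teacher_expl, lam=0.5):
    """Assumed form: L_stu = L_CE + lam * L_interp."""
    l_ce = cross_entropy(student_logits, label)
    # Assumed interpretability term: mean squared difference between the
    # student's explanation and the teacher's (reference) explanation.
    l_interp = sum(
        (s - t) ** 2 for s, t in zip(student_expl, teacher_expl)
    ) / len(teacher_expl)
    return l_ce + lam * l_interp
```

When the student's explanation matches the teacher's exactly, the second term vanishes and only the cross-entropy term against the ground truth label remains.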

Regarding claim 5 and 13, Zheng teaches the classification system of claim 1, wherein to train the second classification model to predict the class label, the instructions further cause the one or more processors to: estimate the class label for the data sample based on the estimated Shapley values; (In reference to Equation 9 on pg. 13, the student model (second classification model) estimates a class label (c) based on a Shapley value. (Lc)) and compare the estimated class label with the ground truth class label according to a cross-entropy loss. (“In order to train a student WRN-16-2 network (0.7 M parameters), we introduce a modified loss Lstu, which is a weighted combination of the standard cross entropy loss LCE and an interpretability loss Linterp”, pg. 13, under “4.5 Learning from Explanations: Knowledge Distillation”) 

Regarding claim 6 and 14, Zheng teaches the classification system of claim 1, wherein the second classification model comprises a plurality of fully connected layers with a linear activation function to be trained to predict the Shapley values based on the reference Shapley values and the distillation loss. (“The CAM explanation regards the importance of each channel as the weight of fully connected layer connecting the global average pooling and the output probability distribution. However, an obvious limitation of CAM is the requirements of a GAP penultimate layer and retraining of an additional fully connected layer.”, pg. 4, second paragraph; See also Fig. 1 on pg. 2 and Equation 9 on pg. 13)

Regarding claim 7 and 15, Zheng teaches the classification system of claim 1, wherein the second classification model comprises at least one fully connected layer to be trained to predict the class label based on the predicted Shapley values and a cross-entropy loss. (“The CAM explanation regards the importance of each channel as the weight of fully connected layer connecting the global average pooling and the output probability distribution.” pg. 4, second paragraph; See also Fig. 1 on pg. 2 and Equation 9 on pg. 13)

Regarding claim 8 and 16, Zheng teaches the classification system of claim 1, wherein the instructions further cause the one or more processors to output the predicted Shapley values as explanations for the class label prediction. (“I indicates the input image and c stands for the corresponding output class label. Lc is the explanations given by a certain interpreter.”, pg. 13 and 14)


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 2-4, 10-12, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng in view of Neil Jethani et al. (Herein referred to as Jethani.) (FASTSHAP: REAL-TIME SHAPLEY VALUE ESTIMATION)

Regarding claims 2 and 10, Zheng teaches the classification system of claim 1, wherein to calculate the reference Shapley values, the instructions further cause the one or more processors to: generate masked data by substituting a subset of original feature values in the data sample with background values (“The original input is masked by pointwise multiplication with the saliency maps to observe the score change on the target class.”, “combine the rectified convolutional feature maps and the gradients via backpropagation to compute the saliency map which represents where the model has to look to make the particular decision”, pg. 11, under “4.3 Faithfulness Evaluation via Image Recognition“; Fig. 1, on pg. 2) (The rectified feature maps and gradients are used to mask the data) 
However, Zheng does not explicitly teach to train a multilayer perceptron to output predictions on the masked data based on the first classification model, wherein the reference Shapley values are calculated based on the predictions output by the multilayer perceptron. 
Jethani teaches to train a multilayer perceptron to output predictions on the masked data based on the first classification model, wherein the reference Shapley values are calculated based on the predictions output by the multilayer perceptron. (“Here, we describe a default value function that is useful for explaining predictions from a classification model… the surrogate model psurr(y | m(x, s); β) takes as input a vector of masked features m(x, s), where the masking function m replaces features xi such that si = 0 with a [mask] value that is not in the support of X... we use the Shapley kernel to put more weight on subsets likely to be encountered when training FastSHAP.”, pg. 4, paragraphs 2, 4 and 6) (The surrogate model comprises fully connected layers as shown in the following quotation: (“The FastSHAP model φfast(x, y; θ) and the surrogate model psurr(y | m(x, s); β) are implemented using neural networks that consist of 2-3 fully connected layers”, pg. 16, under “D.1 Models”) The model is further trained to output predictions using a Shapley kernel for calculated reference Shapley values. The input to the model is masked data, which can be configured to be based on a first classification model, such as the one in Zheng)
Therefore, it would have been considered obvious to one of ordinary skill in the art, prior to the current application's filing date, to combine the system and models of Zheng, with the tree model of Jethani. One would be motivated to combine the two teachings, prior to the filing date of the current application, as FastSHAP generates accurate explanations with an orders-of-magnitude speedup as disclosed in Jethani. (“we compare FastSHAP to existing estimation approaches and find that it generates accurate explanations with an orders-of-magnitude speedup.”, pg. 1, Abstract)
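
For illustration only, the masking step discussed above (substituting a subset of original feature values with background values, or in Jethani's wording replacing features xi such that si = 0 with a mask value) can be sketched as follows. This is a minimal sketch; the function name `mask_features` and the per-feature `background` list are hypothetical. A single shared mask value corresponds to a `background` list that repeats the same entry.

```python
def mask_features(x, s, background):
    """Keep x[i] where s[i] == 1; substitute background[i] where s[i] == 0."""
    if not (len(x) == len(s) == len(background)):
        raise ValueError("x, s and background must have the same length")
    return [xi if si == 1 else bi for xi, si, bi in zip(x, s, background)]


# Hypothetical sample: the second feature is masked out.
masked = mask_features([5.0, 7.0, 9.0], [1, 0, 1], [0.0, 0.0, 0.0])
```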

Regarding claims 3 and 11, Zheng teaches the classification system of claim 2, but does not explicitly teach that the first classification model is a decision-tree based model; and the multilayer perceptron is trained on the masked data according to the decision-tree based model and a loss function.
Jethani teaches a first classification model that is a decision-tree based model (“We use either neural networks or tree-based models…”, pg. 5, under “Implementation details”) and the multilayer perceptron is trained on the masked data according to the decision-tree based model and a loss function. (“Separate from the original model f(x; η), the surrogate model psurr(y | m(x, s); β) takes as input a vector of masked features m(x, s), where the masking function m replaces features xi such that si = 0 with a [mask] value that is not in the support of X. Similar to prior work (Frye et al., 2020; Jethani et al., 2021), the parameters β are learned by minimizing the following loss function:”, pg. 4, fourth paragraph)
Therefore, it would have been considered obvious to one of ordinary skill in the art, prior to the current application's filing date, to combine the system of Zheng, with the tree model of Jethani. One would be motivated to combine the two teachings, prior to the filing date of the current application, as deep tree-based models have high representation capacity, they can provide many-to-many mappings, and they can be trained by stochastic gradient descent, as disclosed in Jethani. (“We use either neural networks or tree-based models… deep neural networks are ideal for FastSHAP because they have high representation capacity, they can provide many-to-many mappings, and they can be trained by stochastic gradient descent”, pg. 5, under “Implementation details”)

Regarding claims 4 and 12, Zheng, as modified by Jethani, teaches the classification system of claim 2, wherein to train the second classification model to predict the Shapley values, the instructions further cause the one or more processors to: estimate the Shapley values for the features of the masked data; (“we use sampling in this paper to estimate the Shapley value and any semi values.”, pg. 8, under “3.3 Estimation of Shapley Value” (Zheng)) and compare the estimated Shapley values with the reference Shapley values according to the distillation loss. (In reference to equation 9 of Zheng on pg. 13, the student’s estimation of Shapley values is compared to the teacher’s Shapley values according to a loss function.)
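
For illustration only, estimating Shapley values by sampling can be sketched with permutation sampling, a common Monte Carlo approach. Zheng's specific sampling scheme is not described in this action, so the scheme below is an assumption, as are the names `sample_shapley` and `and_game`, the seed, and the number of permutations.

```python
import random


def sample_shapley(players, value, num_permutations=2000, seed=0):
    """Monte Carlo estimate: average marginal contribution over random orderings."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev = value(frozenset(coalition))
        for p in order:
            # Marginal contribution of p when it joins the players before it.
            coalition.add(p)
            cur = value(frozenset(coalition))
            totals[p] += cur - prev
            prev = cur
    return {p: t / num_permutations for p, t in totals.items()}


def and_game(coalition):
    # Hypothetical game: pays 1 only when both "a" and "b" are present.
    return 1.0 if {"a", "b"} <= coalition else 0.0


estimates = sample_shapley(["a", "b", "c"], and_game)
```

In this toy game the exact Shapley values are 0.5 for "a", 0.5 for "b", and 0 for "c", so the estimates for "a" and "b" should land close to 0.5 while "c" receives exactly 0.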

Regarding claim 17, Zheng teaches a classification system comprising: a first guideline generator to be trained to calculate reference Shapley values based on the predictions by the first machine layer perceptron during the first training; (“The Shapley value for player i defined above can be interpreted as the average marginal contribution of player i to all possible coalitions S that can be formed without it.”, pg. 7, second paragraph; Also see Equation 3 on pg. 7) (The Shapley values are calculated as detailed with the equation on page 7, the function being part of a teacher classification model, which acts as our guideline generator.) and a second classification model to be trained through multi-task distillation during the first training to: predict Shapley values for features of the first masked data based on the reference Shapley values and a distillation loss; (“the Shapley Value of the pixel (i,j) represents its marginal contribution to the class confidence. Thus, we define the Shapley Value as the saliency map of our Shap-CAM… We train a WRN-40-2 teacher network (2.2 M parameters) on the CIFAR-10 dataset. In order to train a student WRN-16-2 network (0.7 M parameters), we introduce a modified loss Lstu, which is a weighted combination of the standard cross entropy loss LCE and an interpretability loss”, pg. 8, top paragraph; pg. 13, under “4.5 Learning from Explanations: Knowledge Distillation”; See also Equation 9 on pg. 13) (With regards to Equation 9 on pg. 13, the student model acts as our second classification model, which is trained to predict Shapley values defined as the saliency map, using reference Shapley values (corresponding to the Lc in the teacher model) and a loss (LCE)) and predict a class label for the first data sample based on the predicted Shapley values and a ground truth class label for the first data sample. (“In the above equations, I indicates the input image and c stands for the corresponding output class label.”, pg. 13, under “4.5 Learning from Explanations: Knowledge Distillation”; See also Equation 9 on pg. 13) (The student model predicts a class label based on the training, using the predicted Shapley values (Lc), and ground truth class label.)
	However, Zheng does not explicitly teach a first machine layer perceptron to be trained to output predictions on first masked data of a first data sample during a first training based on a first classification model.
	Jethani teaches a first machine layer perceptron to be trained to output predictions on first masked data of a first data sample during a first training based on a first classification model; (“FastSHAP has the flexibility to work with any value function vx,y(s). Here, we describe a default value function that is useful for explaining predictions from a classification model… Separate from the original model f(x; η), the surrogate model psurr(y | m(x, s); β) takes as input a vector of masked features m(x, s), where the masking function m replaces features xi such that si = 0 with a [mask] value that is not in the support of X.”, pg. 4, second and fourth paragraphs) (The first classification model would correspond to the surrogate model, with the original model acting as our perceptron. The models and predictions of Jethani would integrate well into the method of Zheng, giving the guideline generator predictions to train on.)
	Therefore, it would have been considered obvious to one of ordinary skill in the art, prior to the current application's filing date, to combine the system of Zheng, with the original and surrogate models of Jethani. One could have simply combined the two prior art elements, according to known methods, and the combination would yield predictable results.

Regarding claim 18, Zheng, as modified by Jethani, teaches the classification system of claim 17, wherein the second classification model comprises: a plurality of fully connected layers with a linear activation function to be trained to predict the Shapley values based on the reference Shapley values and the distillation loss; (“The CAM explanation regards the importance of each channel as the weight of fully connected layer connecting the global average pooling and the output probability distribution. However, an obvious limitation of CAM is the requirements of a GAP penultimate layer and retraining of an additional fully connected layer.”, pg. 4, second paragraph; See also Fig. 1 on pg. 2 and Equation 9 on pg. 13 (Zheng)) and at least one fully connected layer to be trained to predict the class label based on the predicted Shapley values and a cross-entropy loss. (“The CAM explanation regards the importance of each channel as the weight of fully connected layer connecting the global average pooling and the output probability distribution.” pg. 4, second paragraph; See also Fig. 1 on pg. 2 and Equation 9 on pg. 13 (Zheng))


Response to Arguments
Applicant's arguments filed on February 12th, 2026 have been fully considered but they are not persuasive. The applicant’s amendment overcomes the previous 112 rejection and so the rejection has been withdrawn. The applicant argues in substance:

Argument 1: The features of the amended claims 1, 9, and 17 provide an improvement to computer functionality, and as such are not directed to an abstract idea.
The examiner respectfully disagrees. The claims’ features appear to be directed toward the improvement of an abstract idea, and merely use computer processing components (i.e., one or more processors) and machine learning models to perform what is interpreted, under the broadest reasonable interpretation, to be a grouping of abstract ideas, as explained above.

Argument 2: Claims 1, 9, and 17 are not directed to a mental process, as the human mind would not be able to perform the features of the claims.
The examiner respectfully disagrees. According to MPEP 2106.04(a)(2), a mental process can still be recited if a human could perform the steps with or without a physical aid, or if the mental process requires a computer. The limitations “calculate reference Shapley values for features of a data sample based on a first classification model”, “predict Shapley values for the features of the data sample based on the reference Shapley values and a distillation loss”, and “predict a class label for the data sample based on the predicted Shapley values and a cross-entropy loss in comparison with a ground truth class label for the data sample” are all interpreted to be mental processes performed with the aid of a generic computer as a tool. Therefore, the rejection is maintained.

Argument 3: Zheng does not teach predicting Shapley values for the features of the data sample based on the reference Shapley values and a distillation loss, and Jethani does not remedy its deficiencies.
The examiner respectfully disagrees. On pg. 8 of Zheng, the disclosure explains how the Shapley value is defined as a saliency map, which corresponds to the reference Shapley values. Section “3.3 Estimation of Shapley Value” then explains how a machine learning model is used, with the saliency map as input, to estimate the Shapley values, which is interpreted, under the broadest reasonable interpretation, to be a prediction of a Shapley value. Shap-CAM uses knowledge distillation, which utilizes a student-teacher model, with the loss function associated with the model corresponding to a distillation loss. Because the model outputs the estimation, it would use the loss, as well as the saliency map based on the reference Shapley values, to estimate Shapley values. Therefore, the examiner asserts that Zheng does teach the prediction of Shapley values, and the rejection is maintained.
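As background for the term "Shapley value" used in the arguments above, the following sketch computes exact Shapley values from the standard game-theoretic definition by enumerating every coalition of features. It is offered only as an illustration under stated assumptions: the function name `shapley_values`, the toy additive value function, and the three-feature example are hypothetical and are not taken from Zheng, Jethani, or the application.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(n: int, value: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values for n features under a coalition value function."""
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Toy additive value function: each feature contributes a fixed amount,
# so each Shapley value should match that feature's own contribution.
contributions = [1.0, 2.0, 3.0]


def toy_value(s: FrozenSet[int]) -> float:
    return sum(contributions[j] for j in s)


print(shapley_values(3, toy_value))
```

The enumeration visits 2^(n-1) coalitions per feature, which is why practical systems estimate or predict these values rather than compute them exactly.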

Conclusion
Claims 19 and 20 are not rejected under prior art, but are rejected under 35 U.S.C. 101.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Tyler E Iles whose telephone number is (571)272-5442. The examiner can normally be reached 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/T.E.I./ Patent Examiner, Art Unit 2122
/KAKALI CHAKI/ Supervisory Patent Examiner, Art Unit 2122
Prosecution Timeline

Feb 14, 2023: Application Filed
Nov 24, 2025: Non-Final Rejection mailed (§101, §102, §103)
Feb 12, 2026: Response Filed
Apr 22, 2026: Final Rejection mailed (§101, §102, §103) (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/559,396
Patent 12619883
SYSTEMS AND METHODS FOR DETERMINING TIME-SERIES FEATURE IMPORTANCE OF A MODEL
4y 4m to grant; granted May 05, 2026
Study what changed to get past this examiner. Based on the 1 most recent grant.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 60%
With Interview (+50.0%): 99%
Median Time to Grant: 3y 8m (~4m remaining)
PTA Risk: Moderate
Based on 5 resolved cases by this examiner. Grant probability derived from career allowance rate.