DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/24/2026 has been entered.
Response to Arguments
Applicant's arguments filed 01/26/2026 have been fully considered but they are not fully persuasive.
Regarding the 101 rejections, applicant's arguments and amendments to the independent claims are persuasive and overcome the previous 101 rejections. Specifically, amended claim 1 now integrates the judicial exception into a practical application by providing a technical improvement to image classification, text classification, or data mining in a production system, leveraging Shapley values to determine whether a model is experiencing model drift. See pp. 10-11 of "Remarks": "However, claim 1 clearly integrates any such abstract idea into a practical application in the field of computer technology, and more particularly in AI and machine learning systems, in a manner that provides an improvement in computer technology. For example, illustrative embodiments of the claimed arrangement provide a particular solution to a particular problem in AI and machine learning systems, namely, the interpretation problem associated with conventional drift detection approaches, as described in the specification of the present application at page 4, lines 3-24: As mentioned above, the performance of a machine learning model can deteriorate over time as an environment changes (e.g., due to changes in user behavior and/or sensor drift). This phenomenon is referred to as model drift. When a model drift occurs, output results of image classification, text classification, or data mining using this model are not accurate enough. Therefore, the model needs to be monitored to detect whether the model has a drift. Traditional model drift detection typically relies on ground truth, original feature space, and a distribution output score. However, such model monitoring is not easy to interpret. Embodiments of the present disclosure provide a solution for detecting a model drift.
According to various embodiments of the present disclosure, training data in a training data set is converted into an input vector represented by Shapley values. The decision tree model has been trained for performing at least one of image classification, text classification, or data mining. The training data set is clustered on the basis of such input vector, so as to obtain a plurality of data clusters. In response to receiving a first input, the first input is converted to a first input vector represented by Shapley values. A drift degree of the decision tree model is detected on the basis of the first input vector and the plurality of data clusters. According to embodiments described herein, training data and a new input for a model are converted into forms represented by Shapley values, so that it can be determined whether the relationship among input features of the new input changes in comparison with the relationship among input features of training data that has been used. In this way, the drift degree of a model can be detected, thus avoiding inaccurate or even wrong model predictions using a drifted model, and ensuring that results of image classification, text classification, or data mining are more accurate. In addition, model monitoring is also easier to interpret." Applicant’s proposed amendments and corresponding arguments that the claimed invention provides a technical improvement to the field of machine learning are persuasive. Therefore, the 101 rejections are overcome.
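For context, the drift-detection flow described in the passage quoted above — cluster the Shapley-value vectors of the training data, compute a new input's distances to the clusters, normalize those distances, and flag drift when their entropy exceeds a threshold — can be sketched as follows. This is an illustrative reconstruction only, not applicant's implementation; the Euclidean distance is an assumption, while the ratio-to-total normalization follows the language of dependent claim 4.

```python
import numpy as np

def drift_entropy(x_shap, centroids):
    """Entropy-based drift score for one input's Shapley-value vector.

    x_shap    : (d,) Shapley-value vector of the new input
    centroids : (k, d) centroids of the training-data clusters
    """
    # Distances from the input vector to each cluster centroid
    # (Euclidean distance is an assumption; the claims recite only "distances").
    dists = np.linalg.norm(centroids - x_shap, axis=1)
    # Normalize: ratio of each distance to the total distance (cf. claim 4).
    p = dists / dists.sum()
    # Shannon entropy of the normalized distance vector.
    return float(-np.sum(p * np.log(p + 1e-12)))

def has_drift(x_shap, centroids, threshold):
    # Drift is flagged when the entropy exceeds the threshold (cf. claim 1).
    return drift_entropy(x_shap, centroids) > threshold
```

An input near one cluster yields a skewed distance ratio and lower entropy; an input roughly equidistant from all clusters — one that no training cluster explains — yields near-uniform ratios and higher entropy, signaling drift.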
Regarding the 103 rejections, applicant's arguments with respect to the prior art rejections have been fully considered but are moot because they are directed to the newly amended combinations of limitations. Please see below for the new grounds of rejection, necessitated by amendment.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-4, 7-8, 10-11, 14-15, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Bhide, et al., US Pre-Grant Publication 2021/0279607A1 (“Bhide”) in view of Shanbhag Non-Patent Literature “Unified Shapley Framework to Explain Prediction Drift” (“Shanbhag”) and further in view of Mopur, et al., US Pre-Grant Publication 2021/0365478A1 (“Mopur”), Aggarwal, et al., US Pre-Grant Publication 2019/0392250A1 (“Aggarwal”), and Dasu, et al., Non-Patent Literature “An Information-Theoretic Approach to Detecting Changes in Multi-Dimensional Data Streams” (“Dasu”).
Regarding claim 1 and analogous claims 8 and 15, Bhide discloses:
A method, comprising: training a decision tree model, utilizing a plurality of instances of training data in a training data set, for performing at least one of image classification, text classification, or data mining in a production environment; (Bhide, ⁋31, “the trained model may take data (e.g., one or more of text, images, etc.) as input, and may output one or more labels for the input data, one or more classifications for the input data, etc [utilizing a plurality of instances of training data in a training data set, for performing at least one of image classification, text classification, or data mining]. In another embodiment, after the trained model is verified, the trained model may be implemented by inputting production data into the trained model [in a production environment;].”, and Bhide, ⁋43, “In another embodiment, the decision tree [A method, comprising: training a decision tree model,] may analyze the data associated with the accuracy drift and may output the feature space and specific subset of the input data causing the accuracy drift.”).
deploying the trained decision tree model in the production environment; (Bhide, ⁋31, “In another embodiment, after the trained model is verified, the trained model may be implemented by inputting production data into the trained model [deploying the trained decision tree model in the production environment;].”).
…in a processor-based machine learning system,… (Bhide, abstract, “A computer-implemented method according to one embodiment includes identifying an occurrence of accuracy drift by a trained model […in a processor-based machine learning system,…]”).
…a plurality of dimensions of the input vector indicating a plurality of input features of the decision tree model, (Bhide, ⁋43, “In another embodiment, the decision tree [of the decision tree model,] may analyze the data associated with the accuracy drift and may output the feature space and specific subset of the input data causing the accuracy drift. For example, the feature space of the data may include one or more specific features (e.g., aspects, categories, etc.) of the data […a plurality of dimensions of the input vector indicating a plurality of input features]. In another example, exemplary features may include age, occupation, physical location, temperature, etc.”).
receiving in the production environment a first input for processing by the decision tree model in the processor-based machine learning system; (Bhide, ⁋33, “In another embodiment, trained model may take the production data as input and may produce one or more instances of output (e.g., one or more labels, etc.) based on the input [receiving in the production environment a first input for processing by the decision tree model in the processor-based machine learning system;].”).
While Bhide teaches determining concept drift using decision trees, Bhide does not explicitly teach:
converting…each of the instances of the training data in the training data set into a corresponding instance of an input vector represented by Shapley values,
clustering…on the basis of the input vector of each of the instances of the training data, the training data set, so as to obtain a plurality of data clusters;
in response to receiving the first input…converting the first input into a first input vector represented by Shapley values;
detecting…a drift degree of the…model on the basis of the first input vector and the plurality of data clusters.
and controlling…based on the detected drift degree, utilization of predictions generated by the…model;
wherein the detecting a drift degree of the…model on the basis of the first input vector and the plurality of data clusters comprises: calculating distances between the first input vector and the plurality of data clusters, so as to obtain distance vectors;
normalizing the distance vectors;
calculating an entropy of the first input on the basis of the normalized distance vectors and in response to the entropy being greater than a threshold, determining that the…model has a drift.
Shanbhag teaches:
converting…each of the instances of the training data in the training data set into a corresponding instance of an input vector represented by Shapley values, (Shanbhag, pg. 5 col. 1, “In GroupShapley [into a corresponding instance of an input vector represented by Shapley values,], we explain the drift between the G ◦ F output of the explicand and the baseline. The number of players is equal to the number of groups times the number of features. The number of groups is the number of sub-divisions across rows. If the whole sample is one group, the features are the only players. If we have a row as its own group, we end up with number of rows × number of features groups to which we attribute the payout. To be precise, we are attributing the drift score to each group in the explicand, where a group is a cross section consisting of at least one row and at most all rows, and at least one feature or at most n features [converting…each of the instances of the training data in the training data set].”).
in response to receiving the first input…converting the first input into a first input vector represented by Shapley values; (Shanbhag, pg. 1 col. 2, “Here, we adapt the Shapley framework, in the context of machine learning, for the following task: given two data samples [in response to receiving the first input…] of the same shape, and a function D which computes some metric of distributional difference on the predictions made on the given datasets by a model F, attribute the output of D to each point of the target dataset, and to each feature [converting the first input into a first input vector represented by Shapley values;].”).
Bhide and Shanbhag are both in the same field of endeavor (i.e. concept drift). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Bhide and Shanbhag to teach the above limitation(s). The motivation for doing so is that using Shapley values gives context on which features contribute to model drift (cf. Shanbhag, pg. 1 col. 2, “A systematic method is thus needed for studying prediction drift and attributing it to a) the features of the model and b) the individual data points that constitute the distributional samples that are compared” and Shanbhag pg. 2 col. 1, “Establishing an axiomatic framework for calculating and explaining prediction drift using Shapley values and IG”).
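As background on the cited framework: a Shapley value is a player's marginal contribution averaged over all orderings of the players. The following brute-force computation over a toy cooperative game is purely illustrative — it is not Shanbhag's GroupShapley algorithm, and the game below is a hypothetical additive example.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging each player's marginal
    contribution over all orderings (exponential; toy sizes only)."""
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            # Marginal contribution of p in this ordering.
            phi[p] += value(frozenset(coalition)) - before
    return {p: phi[p] / len(perms) for p in phi}
```

For an additive game (each player contributes a fixed weight), the Shapley value of each player recovers exactly its weight, which is the sanity check usually applied to such implementations.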
While Bhide in view of Shanbhag teaches determining model drift using decision trees and Shapley values, the combination does not explicitly teach:
clustering…on the basis of the input vector of each of the instances of the training data, the training data set, so as to obtain a plurality of data clusters;
detecting…a drift degree of the…model on the basis of the first input vector and the plurality of data clusters.
and controlling…based on the detected drift degree, utilization of predictions generated by the…model;
wherein the detecting a drift degree of the…model on the basis of the first input vector and the plurality of data clusters comprises: calculating distances between the first input vector and the plurality of data clusters, so as to obtain distance vectors;
normalizing the distance vectors;
calculating an entropy of the first input on the basis of the normalized distance vectors and in response to the entropy being greater than a threshold, determining that the…model has a drift.
Mopur teaches:
clustering…on the basis of the input vector of each of the instances of the training data, the training data set, so as to obtain a plurality of data clusters;
(Mopur, ⁋15, “processes the input data to obtain a plurality of representative points [on the basis of the input vector of each of the instances of the training data, the training data set,]. Further, the input data may be reduced based on baseline reference data computed at the cloud server. The plurality of representative points may be clustered to generate a plurality of clusters. Each cluster of the plurality of clusters may comprise at least one representative point from the plurality of representative points [clustering…so as to obtain a plurality of data clusters;].”).
detecting…a drift degree of the…model on the basis of the first input vector and the plurality of data clusters. (Mopur, ⁋23, “Further, data drift [detecting…a drift degree of the…model] may be identified from the input data, based on changes in densities of the plurality of clusters over a predefined period of time [on the basis of the first input vector and the plurality of data clusters.]. The data drift and the outlier cluster may constitute the relevant information.”).
and controlling…based on the detected drift degree, utilization of predictions generated by the…model; (Mopur, ⁋18, “Further, utilization of only the information related to the data drift [and controlling…based on the detected drift degree,] and the outliers enables updating of the data model in a short time period. Thereupon, updated data model and updated baseline reference data may be provided to the edge system i.e. the local computer. The updated data model would deliver accurate [utilization of predictions generated by the…model;] temperature values related to the HVAC system.”).
wherein the detecting a drift degree of the…model on the basis of the first input vector and the plurality of data clusters comprises: calculating distances between the first input vector and the plurality of data clusters, so as to obtain distance vectors; (Mopur, ⁋35, “In one embodiment, a first data clustering technique, for example K-means clustering, may be used for clustering the plurality of representative points. K-means clustering is a centroid-based algorithm used for grouping the plurality of representative points into K clusters. K-means clustering is an iterative clustering technique where similarity within the plurality of representative points is derived by the closeness of a representative point to a centroid of a cluster [wherein the detecting a drift degree of the…model on the basis of the first input vector and the plurality of data clusters comprises: calculating distances between the first input vector and the plurality of data clusters, so as to obtain distance vectors;].”).
Bhide, in view of Shanbhag, and Mopur are both in the same field of endeavor (i.e. concept drift). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Bhide, in view of Shanbhag, and Mopur to teach the above limitation(s). The motivation for doing so is that clustering data points improves detection of anomalous data points because outlier clusters can be identified (cf. Mopur, see ⁋16).
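The centroid-based K-means clustering that Mopur describes can be illustrated with a minimal sketch. This is a generic textbook K-means, not Mopur's edge/cloud procedure; the fixed iteration count and Euclidean metric are illustrative choices.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Minimal K-means: returns (centroids, labels)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct points (fancy indexing copies).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels
```

The resulting centroids are exactly the per-cluster reference points against which a new input vector's distances would be measured in the drift-detection step.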
While the combination of Bhide, Shanbhag, and Mopur teaches the use of distance vectors and detecting model drift using decision trees, the combination does not explicitly teach:
normalizing the distance vectors;
calculating an entropy of the first input on the basis of the normalized distance vectors and in response to the entropy being greater than a threshold, determining that the…model has a drift.
Aggarwal teaches normalizing the distance vectors; (Aggarwal, ⁋43, “The foregoing is a ratio of a sum of the intra-cluster distances to a sum of the inter-cluster distances with a normalization factor [normalizing the distance vectors;] of (n-1).”).
Bhide, in view of Shanbhag and Mopur, and Aggarwal are both in the same field of endeavor (i.e. data clustering). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Bhide, in view of Shanbhag and Mopur, and Aggarwal to teach the above limitation(s). The motivation for doing so is that normalizing distances in clustering improves the analysis of clusters (cf. Aggarwal, see ⁋43).
While the combination of Bhide, Shanbhag, Mopur, and Aggarwal teaches the use of normalized distance vectors and detecting model drift using decision trees, the combination does not explicitly teach:
calculating an entropy of the first input on the basis of the…distance vectors and in response to the entropy being greater than a threshold, determining that the…model has a drift.
Dasu teaches:
calculating an entropy of the first input on the basis of the…distance vectors (Dasu, pg. 3, “The above tests attempt to capture a notion of distance between two distributions. A measure that is one of the most general ways of representing this distance is the relative entropy [calculating an entropy of the first input on the basis of the…distance vectors] from information theory, also known as the Kullback-Leibler (or KL) distance.”).
and in response to the entropy being greater than a threshold, determining that the…model has a drift. (Dasu, abstract, “We use relative entropy, also called the Kullback-Leibler distance, to measure the difference between two given distributions [determining that the…model has a drift.]. The KL-distance is known to be related to the optimal error in determining whether the two distributions are the same and draws on fundamental results in hypothesis testing. The KL-distance also generalizes traditional distance measures in statistics, and has invariance properties that make it ideally suited for comparing distributions. Our scheme is general; it is nonparametric and requires no assumptions on the underlying distributions. It employs a statistical inference procedure based on the theory of bootstrapping, which allows us to determine whether our measurements are statistically significant; statistically significant KL-distance measurements are interpreted as an entropy greater than a threshold (i.e., in response to the entropy being greater than a threshold).”).
Bhide, in view of Shanbhag, Mopur, and Aggarwal, and Dasu are both in the same field of endeavor (i.e. data drift). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Bhide, in view of Shanbhag, Mopur, and Aggarwal, and Dasu to teach the above limitation(s). The motivation for doing so is that the use of Kullback-Leibler distance improves drift detection by adding contextual information to data distribution changes (cf. Dasu, pg. 3, “Using the KL-distance allows us not only to measure the distance between distributions, but attribute a meaning to this value. Further, an information-theoretic distance can be defined independent of the inherent dimensionality of the data, and is even independent of the spatial nature of the data, when one invokes the theory of types. Thus, we can isolate the definition of change from the data representation itself, cleanly separating the computational aspects of the problem from the distance estimation itself.”).
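Dasu's relative-entropy measure can be illustrated directly. The sketch below is a generic discrete KL-divergence computation, not Dasu's bootstrap significance procedure.

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler distance D(p || q) between two discrete
    distributions given as probability vectors summing to 1."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

D(p || q) is zero exactly when the distributions agree and grows as they diverge, which is why a statistically significant KL distance can be read as "entropy greater than a threshold" in the mapping above.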
Regarding claim 7 and analogous claim 14, Bhide in view of Shanbhag, Mopur, Aggarwal, and Dasu teaches the method according to claim 1. Bhide also teaches the decision tree model as seen in claim 1.
Shanbhag further teaches acquiring, from the training data, input data used for inputting of the decision tree model and output data used for outputting of the decision tree model; and calculating a contribution value of each input feature in the input data to the output data. (Shanbhag, pg. 1 col. 2, “Here, we adapt the Shapley framework, in the context of machine learning, for the following task: given two data samples [acquiring, from the training data, input data used for inputting of the decision tree model] of the same shape, and a function D which computes some metric of distributional difference on the predictions [and output data used for outputting of the decision tree model;] made on the given datasets by a model F, attribute the output of D to each point of the target dataset, and to each feature [and calculating a contribution value of each input feature in the input data to the output data.].”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Shanbhag with the teachings of Bhide, Mopur, Aggarwal, and Dasu for the same reasons disclosed in claim 1.
Regarding claim 3 and analogous claims 10 and 17, Bhide in view of Shanbhag, Mopur, Aggarwal, and Dasu teaches the method according to claim 1.
Mopur further teaches:
calculating a centroid of each of the plurality of data clusters; (Mopur, ⁋35, “K-means clustering is a centroid-based [calculating a centroid of each of the plurality of data clusters;] algorithm used for grouping the plurality of representative points into K clusters.”).
and calculating distances between the first input vector and the centroids of the plurality of data clusters. (Mopur, ⁋35, “K-means clustering is an iterative clustering technique where similarity within the plurality of representative points is derived by the closeness of a representative point to a centroid of a cluster [and calculating distances between the first input vector and the centroids of the plurality of data clusters.].”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mopur with the teachings of Bhide, Shanbhag, Aggarwal, and Dasu for the same reasons disclosed in claim 1.
Regarding claim 4 and analogous claims 11 and 18, Bhide in view of Shanbhag, Mopur, Aggarwal, and Dasu teaches the method according to claim 1.
Aggarwal further teaches calculating a ratio of each distance in the distance vectors to a total distance. (Aggarwal, ⁋43, “Sn=(n-1) Σfor each topic i D ni/Σfor each pair of topics i,j D ni, nj The foregoing is a ratio of a sum of the intra-cluster distances to a sum of the inter-cluster distances [calculating a ratio of each distance in the distance vectors to a total distance.] with a normalization factor of (n-1). A low sum of intra-cluster distances implies compact topics containing very similar meaning words while a high sum of inter-cluster distance implies the topics are well separated and distinct from each other.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Aggarwal with the teachings of Bhide, Shanbhag, Mopur, and Dasu for the same reasons disclosed in claim 1.
Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bhide, et al., US Pre-Grant Publication 2021/0279607A1 (“Bhide”) in view of Shanbhag Non-Patent Literature “Unified Shapley Framework to Explain Prediction Drift” (“Shanbhag”) and further in view of Mopur, et al., US Pre-Grant Publication 2021/0365478A1 (“Mopur”), Aggarwal, et al., US Pre-Grant Publication 2019/0392250A1 (“Aggarwal”), Dasu, et al., Non-Patent Literature “An Information-Theoretic Approach to Detecting Changes in Multi-Dimensional Data Streams” (“Dasu”), and Bischof, et al., Non-Patent Literature “MDL Principle for Robust Vector Quantisation” (“Bischof”).
Regarding claim 5 and analogous claims 12 and 19, Bhide in view of Shanbhag, Mopur, Aggarwal, and Dasu teaches the method according to claim 1.
Mopur further teaches clustering all training data in the training data set into a plurality of data clusters using a K-means clustering algorithm, (Mopur, ⁋35, “In one embodiment, a first data clustering technique, for example K-means clustering [clustering all training data in the training data set into a plurality of data clusters using a K-means clustering algorithm,], may be used for clustering the plurality of representative points.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Mopur with the teachings of Bhide, Shanbhag, Aggarwal, and Dasu for the same reasons disclosed in claim 1.
While the combination of Bhide, Shanbhag, Mopur, Aggarwal, and Dasu teaches the use of K-means clustering, the combination does not explicitly teach:
wherein the number of the plurality of data clusters is greater than the number of the plurality of input features.
Bischof teaches wherein the number of the plurality of data clusters is greater than the number of the plurality of input features. (Bischof, pg. 61 col. 1, “Our approach differs from these methods; namely, we use the MDL principle as a pruning criterion in an integral scheme and to identify outliers: We start with an overly complex network, and while training the network, we gradually reduce the number of redundant reference vectors; starting with an overly complex network of redundant reference vectors is interpreted as having a number of clusters greater than the number of input features as there are redundant clusters (i.e. wherein the number of the plurality of data clusters is greater than the number of the plurality of input features.), arriving at a network which balances the error versus the number of reference vectors”).
Bhide, in view of Shanbhag, Mopur, Aggarwal, and Dasu, and Bischof are both in the same field of endeavor (i.e. data clustering). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Bhide, in view of Shanbhag, Mopur, Aggarwal, and Dasu, and Bischof to teach the above limitation(s). The motivation for doing so is that starting with a large number of clusters and pruning down the clusters improves the search for the optimal cluster amount (cf. Bischof, pg. 61 col. 1, “In this paper, we address the problem of finding the optimal number of reference vectors for vector quantisation (also in the presence of outliers) from the point of view of the Minimum Description Length (MDL) principle…Our approach differs from these methods; namely, we use the MDL principle as a pruning criterion in an integral scheme and to identify outliers: We start with an overly complex network, and while training the network, we gradually reduce the number of redundant reference vectors, arriving at a network which balances the error versus the number of reference vectors. By combining the reduction of complexity and the training phase, we achieve a computationally efficient procedure”).
Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Bhide, et al., US Pre-Grant Publication 2021/0279607A1 (“Bhide”) in view of Shanbhag Non-Patent Literature “Unified Shapley Framework to Explain Prediction Drift” (“Shanbhag”) and further in view of Mopur, et al., US Pre-Grant Publication 2021/0365478A1 (“Mopur”), Aggarwal, et al., US Pre-Grant Publication 2019/0392250A1 (“Aggarwal”), Dasu, et al., Non-Patent Literature “An Information-Theoretic Approach to Detecting Changes in Multi-Dimensional Data Streams” (“Dasu”), and Gaurav, Non-Patent Literature “An Introduction to Gradient Boosting Decision Trees” (“Gaurav”).
Regarding claim 6 and analogous claims 13 and 20, Bhide in view of Shanbhag, Mopur, Aggarwal, and Dasu teaches the method according to claim 1.
While the combination of Bhide, Shanbhag, Mopur, Aggarwal, and Dasu teaches the use of decision trees, the combination does not explicitly teach:
wherein the decision tree model is a Gradient Boosted Decision Tree (GBDT) model.
Gaurav teaches wherein the decision tree model is a Gradient Boosted Decision Tree (GBDT) model. (Gaurav, pg. 8, “In gradient boosting decision trees, we combine many weak learners to come up with one strong learner. The weak learners here are the individual decision trees. All the trees are connected in series and each tree tries to minimize the error of the previous tree.”).
Bhide, in view of Shanbhag, Mopur, Aggarwal, and Dasu, and Gaurav are both in the same field of endeavor (i.e. decision trees). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Bhide, in view of Shanbhag, Mopur, Aggarwal, and Dasu, and Gaurav to teach the above limitation(s). The motivation for doing so is that using a GBDT model improves decision trees by boosting weak learners (cf. Gaurav, pg. 6, “Boosting works on the principle of improving mistakes of the previous learner through the next learner. In boosting, weak learner are used which perform only slightly better than a random chance. Boosting focuses on sequentially adding up these weak learners and filtering out the observations that a learner gets correct at every step.”).
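Gaurav's description — trees connected in series, each minimizing the error of the previous tree — corresponds to the following toy sketch of gradient boosting over one-dimensional decision stumps with squared error. The learning rate, stump depth, and regression setting are illustrative choices, not limitations of the claims or of Gaurav.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split stump on a 1-D feature, minimizing squared error."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda z, t=t, lv=lv, rv=rv: np.where(z <= t, lv, rv)

def gbdt_fit(x, y, n_trees=50, lr=0.1):
    """Boosting: each stump fits the residual left by the previous stumps."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(x, y - pred)   # weak learner on current residual
        pred = pred + lr * stump(x)
        stumps.append(stump)
    return base, stumps, lr

def gbdt_predict(model, x):
    base, stumps, lr = model
    out = np.full(len(x), base)
    for stump in stumps:
        out = out + lr * stump(x)
    return out
```

Each stump is only a weak learner, but because every stump is fit to the residual of the running prediction, the ensemble's error shrinks geometrically on this toy step function — the "improving mistakes of the previous learner" principle Gaurav describes.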
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Romanowsky, et al., US20210390457A1 discloses using Shapley values for machine learning model interpretation.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS S WU whose telephone number is (571)270-0939. The examiner can normally be reached Monday - Friday 8:00 am - 4:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold can be reached at 571-431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/N.S.W./Examiner, Art Unit 2148 /MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148