Prosecution Insights
Last updated: April 19, 2026
Application No. 18/166,455

SYSTEMS AND METHODS FOR LIGHTWEIGHT MACHINE LEARNING MODELS

Non-Final OA (§101, §103)
Filed
Feb 08, 2023
Examiner
KHAN, SHAHID K
Art Unit
2146
Tech Center
2100 — Computer Architecture & Software
Assignee
Capital One Services LLC
OA Round
1 (Non-Final)
Grant Probability: 74% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 11m
Grant Probability With Interview: 90%

Examiner Intelligence

Career Allow Rate: 74%, above average (287 granted / 389 resolved; +18.8% vs TC avg)
Interview Lift: strong, +15.7% (allow rate in resolved cases with an interview vs. without)
Typical Timeline: 2y 11m avg prosecution; 31 applications currently pending
Career History: 420 total applications across all art units

Statute-Specific Performance

§101: 10.0% (-30.0% vs TC avg)
§103: 55.7% (+15.7% vs TC avg)
§102: 16.5% (-23.5% vs TC avg)
§112: 15.2% (-24.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 389 resolved cases.

Office Action

Rejections: §101, §103
DETAILED ACTION

This communication is in response to the application filed 2/8/23 in which claims 1-20 were presented for examination.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent eligible subject matter because it does not describe the system as including any hardware (e.g., processor, memory) and, therefore, may constitute software per se, which is not a statutory category.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

Claims 1, 2, 7, 8, 12, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Nagarajan (US 11,593,677 B1; published Feb. 28, 2023) and Zacharias, Jan, et al. "Designing a feature selection method based on explainable artificial intelligence." Electronic Markets 32.4 (2022): 2159-2184 (“Zacharias”).

Regarding claim 1, Nagarajan discloses [a] system for using explainability vectors to generate surrogate models, the system comprising: receiving a first plurality of user profiles for a first plurality of user systems, wherein each user profile in the first plurality of user profiles comprises values for a first set of features; (Nagarajan 2:63-3:10 (“In some embodiments, database 103 can store datasets or records 105 and 107 including data values or features associated with user profiles and activity profiles associated with one or more users and non-person entities.
The database 103 can be updated in real-time or near real-time when, for example, an event related to a user, groups or users or non-person entities occurs. In some embodiments the database 103 can be part of, for example, a financial institution system, merchant system, online store system, or other suitable entity capable of designing and producing complex multi-feature assets or multi-feature software objects. One or more components of the asset prediction system 100 can communicate with database 103 via, e.g., the communication bus 101 to retrieve datasets or records 105 and 107 in real-time or near real-time.”), 3:23-33 (“As further examples, datasets or records 105 and 107 can include data values or data points associated with user profiles and user activity profiles associated with one or more users, or non-person entities such as commercial entities, including merchants, industrial entities, firms and businesses, governmental organizations or other suitable non-person entities. Some examples of data included in user profiles can include demographic data such user name, user address, information related with a user social security number, user income, or other suitable user information. Some examples of data included in a user activity profile can include electronic activity associated with a user of group of users, and/or historical data between the one or more users and an entity or entities.”)) training a first machine learning model to determine resource consumption by a user system, wherein the first machine learning model is trained on input including values for the first set of features from each of the first plurality of user profiles and output including corresponding resource consumption values for a user system represented by each of the first plurality of user profiles; (Nagarajan 5:66-6:28 (“The second machine learning model can be an optimization machine learning model. 
In some instances, there could be competitive objectives between a user and an entity associated with a software object. In such a case, the second machine learning model 205 can harmonize the competitive objectives such that, the user and the entity associated with the software object receive the best possible benefits from assigning the software object to the user. Some examples of such competitive objectives or interests can include maximizing a profit for the entity associated with the asset or software object and minimizing an interest rate associated with the software object, and/or maximizing the probability that a user will accept at least one software object suggested by the asset prediction system 100 and minimizing a risk of monetary loss that may be suffered by the entity associated with the software object. In some embodiments, the second machine learning model 205 can be a trained with historical data collected from user profiles and user activity profiles of multiple users. In some instances, the user profiles and/or activity profiles can be collected from databases owned by organizations, institutions, or other suitable non-person entity that may or may not be associated or have an interest on the software object or user. 
In some embodiments, the machine learning model 205 can be, for example, an ensemble machine learning model such as a gradient boosting machine, random forest model, bootstrap aggregation model, stacked generalization model, gradient boosted regression tree model, radial basis function network model, or other suitable type of machine learning model.”)) Nagarajan does not expressly disclose: in response to training the first machine learning model, processing the first machine learning model to extract an explainability vector, wherein each entry in the explainability vector corresponds to a feature in the first set of features and is indicative of a correlation between the feature and the output of the first machine learning model; (but see Zacharias Fig. 3 (“compute SHAP values, sort features”), pg. 2167, 2nd column (“Figure 3 illustrates the artifact’s procedure: First, the user needs to specify the test set size for the XGBoost model training, the number of remaining features k, and whether she conducts test set discrimination.3 Following that, the tool trains an XGBoost model, computes SHAP values and sorts the features according to their global importance scores. Note that our artifact may implement any other ML model besides XGBoost since SHAP is a model-agnostic explainer. Based on the global importance scores, the tool eliminates all features from the dataset except the k highest ranked ones and trains a new XGBoost model based on that reduced dataset.”)) based on the explainability vector, rearranging the first set of features to generate a second set of features such that each feature in the second set of features has a correlation with the output of the first machine learning model that is above a correlation threshold; (but see Zacharias pg. 2167 2nd column (“Based on the global importance scores, the tool eliminates all features from the dataset except the k highest ranked ones and trains a new XGBoost model based on that reduced dataset.”), Fig. 
3 (“Eliminate all features except the k highest ones”)) processing the values for the first set of features from each profile in the first plurality of user profiles to generate values for the second set of features corresponding to each profile in the first plurality of user profiles; (but see Zacharias pg. 2167, 2nd column (“First, the user needs to specify the test set size for the XGBoost model training, the number of remaining features k, and whether she conducts test set discrimination.”)) training a second machine learning model to determine resource consumption by a user system, wherein the second machine learning model is trained on input including the values for the second set of features corresponding to each profile in the first plurality of user profiles and output including the corresponding resource consumption values for a user system represented by each of the first plurality of user profiles; and (but see Zacharias pg. 2167, 2nd column (“Based on the global importance scores, the tool eliminates all features from the dataset except the k highest ranked ones and trains a new XGBoost model based on that reduced dataset.”)) determining that outputs from the first machine learning model and the second machine learning model differ by less than a prediction threshold (but see Zacharias pg. 2167, 2nd column (“The user can inspect the different SHAP plots of the new XGBoost model as well as performance metrics of the current and the previous models. Figure 4 shows the output of that process.”) (As shown in figure 4 reproduced herein, the output indicates the performance metrics of the full model and the reduced model in terms of AUC, AUPC, Accuracy, True Positive Rate, and True Negative Rate metrics. Based on this comparison, the user can decide whether to continue the iterative process or stop further feature selection.) 
[media_image1.png: Figure 4 of Zacharias, comparing performance metrics of the full and reduced models; greyscale]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagarajan to incorporate the teachings of Zacharias to use Shapley Additive Explanations (SHAP) methods to iteratively select features of the machine learning model taught by Nagarajan, at least because doing so would provide local explanations by computing the contribution of each feature to any given prediction. See Zacharias pg. 2161 2nd col. (“In that sense, SHAP provides local explanations by computing the contribution of each feature to any given prediction, the so-called SHAP value. This is best explained considering the following example: we assume an AI system predicting the risk of credit default (between 0 and 1) using several different features. For each of the features of a given borrower, we can compute a SHAP value indicating how that particular feature has driven the prediction.”).

Regarding claim 2, Nagarajan discloses [a] method, the method comprising: receiving a first machine learning model trained to determine resource consumption by a user system, wherein the first machine learning model is trained on input including values for a first set of features from each of a first plurality of user profiles and output including corresponding resource consumption values for a user system represented by each of the first plurality of user profiles; (Nagarajan 2:63-3:10 (“In some embodiments, database 103 can store datasets or records 105 and 107 including data values or features associated with user profiles and activity profiles associated with one or more users and non-person entities. The database 103 can be updated in real-time or near real-time when, for example, an event related to a user, groups or users or non-person entities occurs.
In some embodiments the database 103 can be part of, for example, a financial institution system, merchant system, online store system, or other suitable entity capable of designing and producing complex multi-feature assets or multi-feature software objects. One or more components of the asset prediction system 100 can communicate with database 103 via, e.g., the communication bus 101 to retrieve datasets or records 105 and 107 in real-time or near real-time.”), 3:23-33 (“As further examples, datasets or records 105 and 107 can include data values or data points associated with user profiles and user activity profiles associated with one or more users, or non-person entities such as commercial entities, including merchants, industrial entities, firms and businesses, governmental organizations or other suitable non-person entities. Some examples of data included in user profiles can include demographic data such user name, user address, information related with a user social security number, user income, or other suitable user information. Some examples of data included in a user activity profile can include electronic activity associated with a user of group of users, and/or historical data between the one or more users and an entity or entities.”), (Nagarajan 5:66-6:28 (“The second machine learning model can be an optimization machine learning model. In some instances, there could be competitive objectives between a user and an entity associated with a software object. In such a case, the second machine learning model 205 can harmonize the competitive objectives such that, the user and the entity associated with the software object receive the best possible benefits from assigning the software object to the user. 
Some examples of such competitive objectives or interests can include maximizing a profit for the entity associated with the asset or software object and minimizing an interest rate associated with the software object, and/or maximizing the probability that a user will accept at least one software object suggested by the asset prediction system 100 and minimizing a risk of monetary loss that may be suffered by the entity associated with the software object. In some embodiments, the second machine learning model 205 can be a trained with historical data collected from user profiles and user activity profiles of multiple users. In some instances, the user profiles and/or activity profiles can be collected from databases owned by organizations, institutions, or other suitable non-person entity that may or may not be associated or have an interest on the software object or user. In some embodiments, the machine learning model 205 can be, for example, an ensemble machine learning model such as a gradient boosting machine, random forest model, bootstrap aggregation model, stacked generalization model, gradient boosted regression tree model, radial basis function network model, or other suitable type of machine learning model.”)). Nagarajan does not expressly disclose: processing the first machine learning model to extract an explainability vector, wherein each entry in the explainability vector corresponds to a feature in the first set of features and is indicative of a correlation between the feature and the output of the first machine learning model; (but see Zacharias Fig. 3 (“compute SHAP values, sort features”), pg. 
2167, 2nd column (“Figure 3 illustrates the artifact’s procedure: First, the user needs to specify the test set size for the XGBoost model training, the number of remaining features k, and whether she conducts test set discrimination.3 Following that, the tool trains an XGBoost model, computes SHAP values and sorts the features according to their global importance scores. Note that our artifact may implement any other ML model besides XGBoost since SHAP is a model-agnostic explainer. Based on the global importance scores, the tool eliminates all features from the dataset except the k highest ranked ones and trains a new XGBoost model based on that reduced dataset.”)) based on the explainability vector, rearranging the first set of features to generate a second set of features such that each feature in the second set of features has a correlation with the output of the first machine learning model that is above a correlation threshold; (but see Zacharias pg. 2167 2nd column (“Based on the global importance scores, the tool eliminates all features from the dataset except the k highest ranked ones and trains a new XGBoost model based on that reduced dataset.”), Fig. 3 (“Eliminate all features except the k highest ones”)) processing the values for the first set of features from each profile in the first plurality of user profiles to generate values for the second set of features corresponding to each profile in the first plurality of user profiles; and (but see Zacharias pg. 
2167, 2nd column (“First, the user needs to specify the test set size for the XGBoost model training, the number of remaining features k, and whether she conducts test set discrimination.”)) training a second machine learning model to determine resource consumption by a user system, wherein the second machine learning model is trained on input including the values for the second set of features corresponding to each profile in the first plurality of user profiles and output including the corresponding resource consumption values for a user system represented by each of the first plurality of user profiles (but see Zacharias pg. 2167, 2nd column (“Based on the global importance scores, the tool eliminates all features from the dataset except the k highest ranked ones and trains a new XGBoost model based on that reduced dataset.”), see Zacharias pg. 2167, 2nd column (“The user can inspect the different SHAP plots of the new XGBoost model as well as performance metrics of the current and the previous models. Figure 4 shows the output of that process.”) (As shown in figure 4 reproduced herein, the output indicates the performance metrics of the full model and the reduced model in terms of AUC, AUPC, Accuracy, True Positive Rate, and True Negative Rate metrics. Based on this comparison, the user can decide whether to continue the iterative process or stop further feature selection.).

[media_image1.png: Figure 4 of Zacharias, comparing performance metrics of the full and reduced models; greyscale]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagarajan to incorporate the teachings of Zacharias to use Shapley Additive Explanations (SHAP) methods to iteratively select features of the machine learning model taught by Nagarajan, at least because doing so would provide local explanations by computing the contribution of each feature to any given prediction. See Zacharias pg. 2161 2nd col.
(“In that sense, SHAP provides local explanations by computing the contribution of each feature to any given prediction, the so-called SHAP value. This is best explained considering the following example: we assume an AI system predicting the risk of credit default (between 0 and 1) using several different features. For each of the features of a given borrower, we can compute a SHAP value indicating how that particular feature has driven the prediction.”). Claim 12 is a computer readable medium (CRM) claim corresponding to claim 2 and, therefore, is similarly rejected. Regarding claim 7, Nagarajan, in view of Zacharias, discloses the invention of claim 2 as discussed above. Nagarajan further discloses wherein: the first machine learning model is defined by a set of parameters comprising a matrix of weights for a multivariate regression algorithm; and (Nagarajan 6:15-28 (“In some embodiments, the second machine learning model 205 can be a trained with historical data collected from user profiles and user activity profiles of multiple users. In some instances, the user profiles and/or activity profiles can be collected from databases owned by organizations, institutions, or other suitable non-person entity that may or may not be associated or have an interest on the software object or user. In some embodiments, the machine learning model 205 can be, for example, an ensemble machine learning model such as a gradient boosting machine, random forest model, bootstrap aggregation model, stacked generalization model, gradient boosted regression tree model, radial basis function network model, or other suitable type of machine learning model.”)). Nagarajan does not expressly disclose the explainability vector is extracted from the set of parameters using the Shapley Additive Explanation method (but see Zacharias pg. 
2167 2nd column (“Figure 3 illustrates the artifact’s procedure: First, the user needs to specify the test set size for the XGBoost model training, the number of remaining features k, and whether she conducts test set discrimination.3 Following that, the tool trains an XGBoost model, computes SHAP values and sorts the features according to their global importance scores. Note that our artifact may implement any other ML model besides XGBoost since SHAP is a model-agnostic explainer.”)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagarajan to incorporate the teachings of Zacharias to use SHAP as the feature selection method, at least because it provides “a local feature attribution method that conveys in-depth information, including holistic feature importances, local feature importances, and interaction effects.” Zacharias pg. 2167 1st column. Claim 17 is a CRM claim corresponding to claim 7 and, therefore, is similarly rejected. Regarding claim 8, Nagarajan, in view of Zacharias, discloses the invention of claim 2 as discussed above. Nagarajan further discloses wherein: the first machine learning model is defined by a set of parameters comprising a matrix of weights for a supervised classifier algorithm; and (Nagarajan 6:15-28 (“In some embodiments, the second machine learning model 205 can be a trained with historical data collected from user profiles and user activity profiles of multiple users. In some instances, the user profiles and/or activity profiles can be collected from databases owned by organizations, institutions, or other suitable non-person entity that may or may not be associated or have an interest on the software object or user. 
In some embodiments, the machine learning model 205 can be, for example, an ensemble machine learning model such as a gradient boosting machine, random forest model, bootstrap aggregation model, stacked generalization model, gradient boosted regression tree model, radial basis function network model, or other suitable type of machine learning model.”)). Nagarajan does not expressly disclose the explainability vector is extracted from the set of parameters using the Local Interpretable Model-agnostic Explanations method (but see Zacharias pg. 2161 2nd column (“Model-agnostic explainability refers to techniques that are applicable to any kind of model (Ribeiro et al., 2016). One example is Local Interpretable Model-Agnostic Explanations (LIME, Ribeiro et al., 2016), which locally approximates a black-box model with an intrinsically interpretable one. In contrast, model-specific explanations are only applicable to specific model types. A well-known example is the embedded feature importance function of tree-based models (Du et al., 2019).”)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagarajan to incorporate the teachings of Zacharias to use LIME as the feature selection method, at least because it “locally approximates a black-box model with an intrinsically interpretable one.” Zacharias pg. 2161 2nd column. Claim 18 is a CRM claim corresponding to claim 8 and, therefore, is similarly rejected. Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Nagarajan and Zacharias as applied to claims 2 and 12 above, and further in view of Peeling (US 12,430,482 B1; published Sep. 30, 2025). Regarding claim 3, Nagarajan, in view of Zacharias, discloses the invention of claim 2 as discussed above. 
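The feature-selection loop the claim 1 and claim 2 mappings attribute to Zacharias (train a model, score each feature's global importance, keep only the highest-scoring features, retrain a lighter model, and confirm the two models' outputs agree) can be sketched in a few lines. The sketch below is purely illustrative and is not the examiner's or the applicant's implementation: ordinary least squares and mean absolute per-feature contribution stand in for Nagarajan's ensemble models and Zacharias's SHAP global importance scores, and the synthetic data, variable names, and thresholds are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "user profiles": 200 profiles x 6 features; only the first
# two features actually drive the "resource consumption" target.
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.05 * rng.normal(size=200)

def fit_linear(features, target):
    """Ordinary least squares via lstsq (stand-in for a trained model)."""
    w, *_ = np.linalg.lstsq(features, target, rcond=None)
    return w

# Train the first model on the full feature set.
w_full = fit_linear(X, y)

# Derive an "explainability vector": mean absolute per-feature
# contribution |w_j * x_j|, a rough stand-in for SHAP global importance.
explainability = np.mean(np.abs(w_full * X), axis=0)

# Keep only features whose importance clears a threshold
# (the claimed "correlation threshold" analogue).
importance_threshold = 0.5
keep = explainability > importance_threshold

# Retrain a second, lighter model on the reduced feature set.
w_small = fit_linear(X[:, keep], y)

# Check that the two models' outputs differ by less than a
# prediction threshold, as in the final limitation of claim 1.
pred_full = X @ w_full
pred_small = X[:, keep] @ w_small
max_gap = float(np.max(np.abs(pred_full - pred_small)))
prediction_threshold = 0.25
print(int(keep.sum()), bool(max_gap < prediction_threshold))
```

The final comparison mirrors the claimed surrogate-model acceptance test: the reduced model is kept only if its outputs track the full model's within the prediction threshold.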
Nagarajan and Zacharias do not expressly disclose wherein rearranging the first set of features to generate the second set of features comprises applying feature engineering using a multi-relational decision tree learning algorithm on the first set of features (but see Peeling 9:1-18 (“A data input option can include a selection of a feature for inclusion (or exclusion) as an input into the design. In some instances, the feature can be a portion of the input dataset. In various instances, a feature can be generated from a portion of the input dataset. The disclosed embodiments are not limited to any particular feature engineering method. In some embodiments, a feature can be generated using a dimensionality reduction technique (e.g., independent component analysis, latent semantic analysis, principal component analysis, an autoencoder, or another suitable method). In various embodiments, the features can be generated automatically (e.g., using multi-relational decision tree learning, deep feature synthesis, or other suitable techniques) or at least partially manually (e.g., coded by a data scientist, domain expert, or other user). In various embodiments, the option to select the feature for inclusion or exclusion can arise from the selection of other design options.”)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified Nagarajan to incorporate the teachings of Peeling to use multi-relational decision tree learning to generate features, at least because the features can be generated automatically. Claim 13 is a CRM claim corresponding to claim 3 and, therefore, is similarly rejected. Claims 4, 5, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Nagarajan and Zacharias as applied to claims 2 and 12 above, and further in view of Xiong, Qingsong, et al. 
"Machine learning-driven seismic failure mode identification of reinforced concrete shear walls based on PCA feature extraction." Structures. Vol. 44. Elsevier, 2022 (“Xiong”).

Regarding claim 4, Nagarajan, in view of Zacharias, discloses the invention of claim 2 as discussed above.

Nagarajan and Zacharias do not expressly disclose wherein rearranging the first set of features to generate the second set of features comprises: normalizing the explainability vector into a standard-deviation space to produce a processed vector; generating a covariance matrix based on the processed vector; computing a set of eigenvectors for the covariance matrix; selecting a measure of coverage and selecting a subset of eigenvectors from the set of eigenvectors based on the measure of coverage; and determining the second set of features corresponding to the subset of eigenvectors (but see Xiong, Section 4.1 Principal Component Analysis (“Principal Component Analysis (PCA) is a typical strategy in exploratory data analysis, feature extraction and dimension reduction [33]. Given a dataset of multivariate observations, the goal is to reduce dimensionality and increase interpretability of raw data but meanwhile minimize information loss. Thus, calculating principal components (PCs) and utilizing them to present a change of basis on the data, a smaller set of variables with less redundancy can be obtained, which explain observed signals as a linear combination of orthogonal principal components [34]. Fig. 5 shows the basic principle and schematic procedure of PCA. Given a dataset X with n variables in PCA, the first principal component PC1 (i.e., retains the maximum variance) can be calculated as a linear combination: [media_image2.png: Equation (2), expressing PC1 as a linear combination of the n variables] where w1 corresponds to an eigenvector of the covariance matrix: [media_image3.png: Equation (3), the eigenvector equation of the covariance matrix] and the elements of the eigenvector ω1j are known as loadings.
The score plot (coordinate value on hyper-plane) and loading plots (level of explained variance) are two major indicators used to illustrate outcome of PCA [34]. Specifically, score plot diagrams the scores of the second PC versus that of the main PC which can be utilized to survey information structure and identify bunches, anomalies, and patterns. Loading plot diagrams the coefficients of every variable for the primary PC versus the subsequent one, ranging from − 1 to 1. Loadings near − 1 or 1 demonstrate that the variable emphatically impacts the PC. Besides, by computing eigen-decomposition of covariance matrix of the data matrix, the retaining extent of information can be indicate referring to the cumulative percentage of variance of each PC.”); see also Section 4.2 Feature Extraction (“Fig. 6 shows the eigenvalue and cumulative variance of PCA results. It could be seen that PC1, PC2 and PC3 only accounted for 68.04 % of the variation. To ascertain a good approximation of original dataset, at least the first six PCs needed to be taken into consideration since they accounted of 91.18 % [coverage] together of the variation in the data.”)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified Nagarajan to incorporate the teachings of Xiong to select a subset of the explainability feature using principal component analysis techniques, at least because doing so would reduce dimensionality and increase interpretability of raw data but meanwhile minimize information loss. Claim 14 is a CRM claim corresponding to claim 4 and, therefore, is similarly rejected. Regarding claim 5, Nagarajan, in view of Zacharias, discloses the invention of claim 2 as discussed above. 
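The eigenvector-and-coverage procedure recited in claim 4 (normalize into standard-deviation space, build a covariance matrix, eigendecompose, and keep the smallest set of components satisfying a coverage measure) follows the standard PCA recipe described in the Xiong citation. A minimal numpy sketch, with synthetic data and a 90% coverage target chosen purely for illustration (Xiong's own example uses 91.18%):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 300 observations of 5 features driven by 2 latent factors.
latent = rng.normal(size=(300, 2))
mix = np.array([[1.0, 0.5, 0.0, 2.0, 0.3],
                [0.0, 1.5, 1.0, 0.2, 0.1]])
X = latent @ mix + 0.1 * rng.normal(size=(300, 5))

# Normalize into "standard-deviation space" (z-scores).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance matrix of the normalized data (= correlation matrix of X).
C = np.cov(Z, rowvar=False)

# Eigendecomposition; sort eigenpairs by descending eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Coverage: cumulative share of explained variance; keep the smallest
# set of principal components reaching the coverage target.
coverage = np.cumsum(eigvals) / eigvals.sum()
target = 0.90
k = int(np.searchsorted(coverage, target) + 1)
print(k, round(float(coverage[k - 1]), 3))
```

In the claimed method the retained eigenvectors would then determine the second set of features (via their loadings); the sketch stops at selecting how many components the coverage measure retains.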
Nagarajan and Zacharias do not expressly disclose wherein rearranging the first set of features to generate the second set of features comprises: generating a correlation matrix based on the explainability vector; computing a set of eigenvectors for the correlation matrix; determining a threshold value using a distribution of the set of eigenvectors; and using a maximum-likelihood estimator model to extract the second set of features from the correlation matrix, wherein the maximum-likelihood estimator model takes the threshold value as an input (but see Xiong, Section 4.1 Principal Component Analysis (“Principal Component Analysis (PCA) is a typical strategy in exploratory data analysis, feature extraction and dimension reduction [33]. Given a dataset of multivariate observations, the goal is to reduce dimensionality and increase interpretability of raw data but meanwhile minimize information loss. Thus, calculating principal components (PCs) and utilizing them to present a change of basis on the data, a smaller set of variables with less redundancy can be obtained, which explain observed signals as a linear combination of orthogonal principal components [34]. Fig. 5 shows the basic principle and schematic procedure of PCA. Given a dataset X with n variables in PCA, the first principal component PC1 (i.e., retains the maximum variance) can be calculated as a linear combination: [media_image2.png: Equation (2), expressing PC1 as a linear combination of the n variables] where w1 corresponds to an eigenvector of the covariance matrix: [media_image3.png: Equation (3), the eigenvector equation of the covariance matrix] and the elements of the eigenvector ω1j are known as loadings. The score plot (coordinate value on hyper-plane) and loading plots (level of explained variance) are two major indicators used to illustrate outcome of PCA [34].
Specifically, score plot diagrams the scores of the second PC versus that of the main PC which can be utilized to survey information structure and identify bunches, anomalies, and patterns. Loading plot diagrams the coefficients of every variable for the primary PC versus the subsequent one, ranging from − 1 to 1. Loadings near − 1 or 1 demonstrate that the variable emphatically impacts the PC. Besides, by computing eigen-decomposition of covariance matrix of the data matrix, the retaining extent of information can be indicate referring to the cumulative percentage of variance of each PC.”); see also Section 4.2 Feature Extraction (“Fig. 6 shows the eigenvalue and cumulative variance of PCA results. It could be seen that PC1, PC2 and PC3 only accounted for 68.04 % of the variation. To ascertain a good approximation of original dataset, at least the first six PCs needed to be taken into consideration since they accounted of 91.18 % [coverage] together of the variation in the data.”) [Note: mean-centering, as in claim 4, is unnecessary if performing a principal component analysis on a correlation matrix, as the data are already centered after calculating correlations.]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified Nagarajan to incorporate the teachings of Xiong to select a subset of the explainability features using principal component analysis techniques, at least because doing so would reduce dimensionality and increase interpretability of raw data but meanwhile minimize information loss. Claim 15 is a CRM claim corresponding to claim 5 and, therefore, is similarly rejected. Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Nagarajan and Zacharias as applied to claims 2 and 12 above, and further in view of Callot (US 12,265,446 B1; published Apr. 1, 2025). 
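The correlation-matrix sequence recited in claim 5 can be illustrated with a minimal sketch. Everything here is hypothetical: the data are synthetic, the eigenvalue cutoff uses the mean-eigenvalue (Kaiser-style) rule as a stand-in for the claimed distribution-derived threshold, and the claimed maximum-likelihood estimator step is not modeled:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical explainability vectors: one row per profile, one column per feature.
E = rng.normal(size=(500, 6))
E[:, 3] = 0.9 * E[:, 0] + 0.1 * rng.normal(size=500)  # induce a correlation

# Step 1: correlation matrix over the explainability features.
corr = np.corrcoef(E, rowvar=False)

# Step 2: eigenvalues/eigenvectors of the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(corr)

# Step 3: threshold from the eigenvalue distribution (here: the mean, which
# for a correlation matrix equals 1 -- the Kaiser rule).
threshold = eigvals.mean()
keep = eigvals > threshold

# Step 4: project onto the retained eigenvectors to get the reduced feature set.
reduced = E @ eigvecs[:, keep]
print(round(threshold, 3), int(keep.sum()), reduced.shape)
```

Note that, as the examiner's bracketed remark observes, no separate mean-centering step appears: computing correlations already standardizes each column.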
Regarding claim 6, Nagarajan, in view of Zacharias, discloses the invention of claim 2 as discussed above. Nagarajan and Zacharias do not expressly disclose wherein rearranging the first set of features to generate the second set of features comprises: training a third machine learning model to determine resource consumption by a user system, wherein the third machine learning model is trained on input including values for a third set of features from each of the first plurality of user profiles and output including corresponding resource consumption values for a user system represented by each of the first plurality of user profiles; generating a covariance matrix representing correlations between a set of parameters defining the first machine learning model and a set of parameters defining the third machine learning model; generating a combined set of parameters based on the set of parameters defining the first machine learning model, the set of parameters defining the third machine learning model, and the covariance matrix; and extracting the explainability vector from the combined set of parameters, wherein each entry in the explainability vector corresponds to a feature describing the first plurality of user profiles (but see Callot 12:58-13:19 (“FIG. 6 illustrates example machine learning based parameter selection experiments which may be conducted to generate anomaly detection plans, according to at least some embodiments. In an instance 601 of multi-factor anomaly detection, at least two time series are taken into consideration: metric time series MTS1 and MTS2. Respective per-metric forecasting models are trained for the two time series and executed at a probabilistic forecasting engine 620, resulting in MTS1 forecast distributions 651 (for various points of time) and MTS2 forecast distributions 652. 
Covariance matrices 630 may be computed for the values of MTS1 and MTS2 over some time periods, and provided as input to anomaly detection engine 640 along with the MTS1 forecast distributions 651, MTS2 forecast distributions 652, and observed (post-prediction) values of MTS1 and MTS2 in the depicted embodiment. The anomaly detection engine 640 may use a current version of an anomaly detection plan to combine the information provided to it as input, and generate up to three types of anomaly response actions based on the analysis of the input. MTS1 alarms/actions 660 may be initiated if the post-prediction MTS1 values satisfy a single-metric anomaly score threshold. MTS2 alarms/actions 662 may be initiated if the post-prediction MTS2 values satisfy a single-metric anomaly score threshold, and aggregated score based alarms/actions 664 may be initiated if the combination of the anomaly score contributions with respect to MTS1 and MTS2 satisfy a different threshold. Depending on the anomaly detection plan, in some embodiments only alarms/actions triggered by aggregated scores may be initiated.”)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further modified Nagarajan to incorporate the teachings of Callot to train additional resource/credit usage models for multiple users and combine the parameters of these models using a covariance matrix of the parameters, at least because doing so would identify combinations of parameters to be employed in the instances of multi-factor anomaly detection. Claim 16 is a CRM claim corresponding to claim 6 and, therefore, is similarly rejected. Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Nagarajan and Zacharias as applied to claims 2 and 12 above, and further in view of Lundberg, Scott “An introduction to explainable AI with Shapley values” (Oct. 14, 2022) (“Lundberg”). 
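Callot's multi-factor aggregation can be approximated in a short sketch that combines two metric time series through their covariance matrix. The Mahalanobis-distance score and all numbers below are assumptions for illustration, not Callot's actual anomaly detection plan:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two correlated metric time series (stand-ins for MTS1 and MTS2).
mts = rng.multivariate_normal([10.0, 5.0], [[4.0, 1.5], [1.5, 1.0]], size=1000)

mean = mts.mean(axis=0)
cov = np.cov(mts, rowvar=False)        # covariance matrix over the window
cov_inv = np.linalg.inv(cov)

def aggregated_score(obs):
    """Mahalanobis distance of an observed (MTS1, MTS2) pair from the mean."""
    d = obs - mean
    return float(np.sqrt(d @ cov_inv @ d))

normal_obs = np.array([10.5, 5.2])
anomalous_obs = np.array([18.0, 1.0])  # jointly inconsistent with the covariance
print(aggregated_score(normal_obs), aggregated_score(anomalous_obs))
```

The point of the covariance term is visible here: an observation can look unremarkable per metric yet score high jointly, which is what an aggregated-score alarm would catch.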
Regarding claim 9, Nagarajan, in view of Zacharias, discloses the invention of claim 2 as discussed above. Although Nagarajan and Zacharias teach using Shapley methods to explain XGBoost models, they do not expressly disclose wherein: the first machine learning model is defined by a set of parameters comprising a vector of coefficients for a generalized additive model; and the explainability vector is extracted from the vector of coefficients in the generalized additive model (but see Lundberg “Explaining an additive regression model” (“The reason the partial dependence plots of linear models have such a close connection to SHAP values is because each feature in the model is handled independently of every other feature (the effects are just added together). We can keep this additive nature while relaxing the linear requirement of straight lines. This results in the well-known class of generalized additive models (GAMs). While there are many ways to train these types of models (like setting an XGBoost model to depth-1), we will use InterpretMLs explainable boosting machines that are specifically designed for this.”)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagarajan further to incorporate the teachings of Lundberg to apply the Shapley technique to extract an explainability vector from a GAM model, at least because each feature in the model is handled independently of every other feature. Claim 19 is a CRM claim corresponding to claim 9 and, therefore, is similarly rejected. Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Nagarajan and Zacharias as applied to claims 2 and 12 above, and further in view of Vijaykeerthy (US 2021/0012156 A1; published Jan. 14, 2021). Regarding claim 10, Nagarajan, in view of Zacharias, discloses the invention of claim 2 as discussed above. 
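Lundberg's observation that a GAM handles each feature independently gives its Shapley values a closed form: a feature's attribution is its shape-function value minus that function's average over the background data. The model and data below are hypothetical, chosen only to demonstrate that property:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 3))          # synthetic background data

# A hypothetical GAM: the prediction is a sum of per-feature shape functions.
shape_fns = [lambda v: 2.0 * v, lambda v: v ** 2, lambda v: np.sin(v)]

def gam_predict(rows):
    return sum(f(rows[:, i]) for i, f in enumerate(shape_fns))

# For an additive model, the Shapley value of feature i at point x is exactly
# f_i(x_i) minus the average of f_i over the background data.
baselines = np.array([f(X[:, i]).mean() for i, f in enumerate(shape_fns)])

def shapley_values(x):
    return np.array([f(x[i]) for i, f in enumerate(shape_fns)]) - baselines

x = np.array([1.0, -2.0, 0.5])
phi = shapley_values(x)
# Local accuracy: base value plus attributions reproduces the prediction.
print(phi, baselines.sum() + phi.sum(), gam_predict(x[None, :])[0])
```

No coalition enumeration is needed precisely because the effects "are just added together", which is the independence Lundberg relies on.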
Nagarajan and Zacharias do not expressly disclose wherein: the first machine learning model is defined by a set of parameters comprising a matrix of weights for a convolutional neural network algorithm; and (but see Vijaykeerthy ¶ 19 (“The active selection module 104 obtains a dataset 102 comprising a set of training examples (e.g., a set of images) that are provided to the model 108 as part of a training process. In the example shown in FIG. 1, the model 108 includes a plurality of layers 110-1, 110-2, 110-3 . . . 110-N (referred to collectively as layers 110). The layers 110 may include for example, an input layer, an output layer, and one or more hidden layers (e.g., convolutional layer(s), pooling layer(s), ReLU (Rectified Linear Unit) layer(s), fully connected layer(s), etc.). Although the description below generally refers to the model as being a neural network model for classifying images, it is to be appreciated that the teachings herein are generally applicable to other machine learning models, such as machine learning models for classification, clustering, regression, ranking, etc.”)) the explainability vector is extracted from the set of parameters using the Gradient Class Activation Mapping method (but see Vijaykeerthy ¶ 20 (“During training, the model explanation engine 112 computes machine explanations based on the parameter configuration of the model 108 at the current iteration of training. The active selection module 104 obtains the explanations and identifies a subset of training examples from the dataset 102 for which to seek additional supervision. Annotation module 106 generates user explanations for the identified training examples. For example, the annotation module 106 may output each of the identified training examples (e.g., the training image) to a user (e.g., via a graphical user interface), and the user may then provide a user explanation by annotating the training example. 
As an example, the user explanation may include one or more bounding boxes or a segmentation map. The user explanations may then be incorporated in future training iterations when training the model 108 as described in more detail herein.”), ¶ 29 (“In one or more example embodiments, the feedback is incorporated using trainable explanations. As an example, assume explanations are obtained via a Grad-CAM technique, then one way to incorporate the feedback is as follows: [0030] Let f_{l,k} be the activation map for the l-th filter on the k-th, the gradient of the classification loss for each class c with respect to the activation maps, these are then passed to a Global Activation Pooling layer to obtain neuron importance weights w_{l,k}^c. [0031] w^c is then used as a kernel for 2D Convolutional Layer (2D Conv) over the activation maps, followed by ReLU layer, which helps backpropagate the error in the explanations and in turn refine the explanations from the model.”)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagarajan to incorporate the teachings of Vijaykeerthy to use gradient class activation maps to extract explanation vectors, at least because doing so would enable refining the explanations from the model. Claim 20 is a CRM claim corresponding to claim 10 and, therefore, is similarly rejected. Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Nagarajan and Zacharias as applied to claims 2 and 12 above, and further in view of Gueret (US 2022/0180225 A1; published Jun. 9, 2022). Regarding claim 11, Nagarajan, in view of Zacharias, discloses the invention of claim 2 as discussed above. 
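The Grad-CAM recipe quoted from Vijaykeerthy (gradients pooled into neuron-importance weights, a weighted combination of activation maps, then ReLU) can be sketched with plain arrays. The activations and gradients below are random stand-ins rather than outputs of a real network:

```python
import numpy as np

rng = np.random.default_rng(4)
# Stand-ins for one image's last-conv-layer outputs: 8 activation maps of
# size 7x7, plus the gradient of the class score w.r.t. each activation map.
activations = rng.random(size=(8, 7, 7))
gradients = rng.normal(size=(8, 7, 7))

# Neuron-importance weights: global average pooling of the gradients.
weights = gradients.mean(axis=(1, 2))                  # shape (8,)

# Weighted combination of activation maps, then ReLU to keep only the
# evidence that increases the class score.
cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)

# Normalize to [0, 1] for display as a heatmap over the input.
if cam.max() > 0:
    cam = cam / cam.max()
print(cam.shape, float(cam.min()), float(cam.max()))
```

Flattening the resulting map would yield the kind of explanation vector the rejection maps onto the claim.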
Nagarajan and Zacharias do not expressly disclose wherein: the first machine learning model is defined by a set of parameters comprising a hyperplane matrix for a support vector machine algorithm; and (but see Gueret ¶ 82 (“As shown by reference number 220, the machine learning system may train a machine learning model using the set of observations and using one or more machine learning algorithms, such as a regression algorithm, a decision tree algorithm, a neural network algorithm, a k-nearest neighbor algorithm, a support vector machine algorithm, or the like. After training, the machine learning system may store the machine learning model as a trained machine learning model 225 to be used to analyze new observations.”)) the explainability vector is extracted from the set of parameters using the counterfactual explanation method (but see Gueret ¶ 2 (“In some implementations, a method includes receiving, by a device, first data associated with a first unit of a group of units, second data associated with a second unit of the group of units, and target data; obtaining, by the device and based on a qualification model, a first counterfactual explanation associated with the first data not satisfying a qualification threshold of the qualification model, and a second counterfactual explanation associated with the second data not satisfying the qualification threshold, wherein the first counterfactual explanation and the second counterfactual explanation are associated with a first feature identified in the first data and the second data; determining, by the device, an impact score associated with the first feature based on the target data, the first counterfactual explanation, and the second counterfactual explanation; determining, by the device, that the impact score does not satisfy an impact threshold; generating, by the device and based on the impact score not satisfying the impact threshold, one or more revised counterfactual explanation constraints of the 
qualification model; obtaining, by the device and based on the one or more revised counterfactual explanation constraints of the qualification model, a first revised counterfactual explanation and a second revised counterfactual explanation; determining, by the device, a revised impact score based on the target data, the first revised counterfactual explanation, and the second revised counterfactual explanation; determining, by the device, that the revised impact score satisfies the impact threshold; and performing, by the device and based on determining that the revised impact score satisfies the impact threshold, an action associated with the second feature and the group of units.”)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nagarajan to incorporate the teachings of Gueret to employ counterfactual explanation methods to determine an explanation of a support vector model, at least because doing so would provide a local explanation for an output of the model. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Dalli et al. (US 2022/0012591 A1; published Jan. 13, 2022) METHOD FOR DETECTING AND MITIGATING BIAS AND WEAKNESS IN ARTIFICIAL INTELLIGENCE TRAINING DATA AND MODELS Vale, Daniel, Ali El-Sharif, and Muhammed Ali. "Explainable artificial intelligence (XAI) post-hoc explainability methods: Risks and limitations in non-discrimination law." AI and Ethics 2.4 (2022): 815-826. Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHID KHAN whose telephone number is (571)270-0419. The examiner can normally be reached M-F, 9-5 EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. 
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at (571)272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /SHAHID K KHAN/Primary Examiner, Art Unit 2146
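For reference, the counterfactual-explanation idea the examiner maps onto Gueret for claim 11 has a closed form in the linear-SVM case: the smallest perturbation that flips the decision moves the input along the hyperplane normal. The weights, bias, and input below are hypothetical, not taken from any cited reference:

```python
import numpy as np

# Hypothetical linear SVM decision function f(x) = w.x + b.
w = np.array([1.5, -2.0, 0.5])
b = -0.25

def decision(x):
    return float(w @ x + b)

def counterfactual(x, margin=1e-3):
    """Smallest L2 perturbation of x that flips the sign of the decision.

    For a linear model the nearest point across the hyperplane lies along w:
    x' = x - (f(x) + sign(f(x)) * margin) * w / ||w||^2.
    """
    f = decision(x)
    step = (f + np.sign(f) * margin) / (w @ w)
    return x - step * w

x = np.array([2.0, 0.5, 1.0])
x_cf = counterfactual(x)
print(decision(x), decision(x_cf))   # the two decisions have opposite signs
```

The difference x_cf - x is itself a local explanation: it names exactly which features must change, and by how much, to cross the qualification threshold.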

Prosecution Timeline

Feb 08, 2023 — Application Filed
Jan 24, 2026 — Non-Final Rejection (§101, §103)
Apr 15, 2026 — Applicant Interview (Telephonic)
Apr 16, 2026 — Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591768: DEEP LEARNING ACCELERATION WITH MIXED PRECISION (granted Mar 31, 2026; 2y 5m to grant)
Patent 12579516: System and Method for Organizing and Designing Comment (granted Mar 17, 2026; 2y 5m to grant)
Patent 12566813: SYSTEMS AND METHODS FOR RENDERING INTERACTIVE WEB PAGES (granted Mar 03, 2026; 2y 5m to grant)
Patent 12547298: Display Method and Electronic Device (granted Feb 10, 2026; 2y 5m to grant)
Patent 12530916: MULTIMODAL MULTITASK MACHINE LEARNING SYSTEM FOR DOCUMENT INTELLIGENCE TASKS (granted Jan 20, 2026; 2y 5m to grant)
Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 74%
With Interview (+15.7%): 90%
Median Time to Grant: 2y 11m
PTA Risk: Low
Based on 389 resolved cases by this examiner. Grant probability derived from career allow rate.
