Prosecution Insights
Last updated: April 19, 2026
Application No. 18/315,476

SYSTEMS AND METHODS FOR USING HASH TABLES FOR GENERATING TEXTUAL PREDICTION EXPLANATIONS

Non-Final OA: §101, §103, §112
Filed: May 10, 2023
Examiner: CAMPOS, ALFREDO
Art Unit: 2129
Tech Center: 2100 — Computer Architecture & Software
Assignee: Capital One Services LLC
OA Round: 1 (Non-Final)
Grant Probability: 83% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 9m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 83% (5 granted / 6 resolved), +28.3% vs TC avg, above average
Interview Lift: +33.3% allowance rate among resolved cases with an interview, a strong lift
Typical Timeline: 3y 9m average prosecution; 26 applications currently pending
Career History: 32 total applications across all art units
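
A minimal sketch of how the two headline figures above (career allow rate and interview lift) can be derived from resolved-case records. The ResolvedCase fields and the six-case breakdown are illustrative assumptions chosen to be consistent with the "5 granted / 6 resolved" and "+33.3%" figures shown, not the analytics provider's actual schema or data.

```python
# Sketch of the examiner metrics above; record fields are hypothetical.
from dataclasses import dataclass

@dataclass
class ResolvedCase:
    granted: bool        # True if the application issued as a patent
    had_interview: bool  # True if an examiner interview was held

def allow_rate(cases: list[ResolvedCase]) -> float:
    """Career allow rate: granted / resolved."""
    return sum(c.granted for c in cases) / len(cases)

def interview_lift(cases: list[ResolvedCase]) -> float:
    """Allow-rate difference between cases with and without an interview.
    Assumes both partitions are non-empty, as in the example below."""
    with_iv = [c for c in cases if c.had_interview]
    without_iv = [c for c in cases if not c.had_interview]
    return allow_rate(with_iv) - allow_rate(without_iv)

# Hypothetical breakdown: 6 resolved cases, 5 granted, 3 grants after interviews.
cases = [
    ResolvedCase(granted=True, had_interview=True),
    ResolvedCase(granted=True, had_interview=True),
    ResolvedCase(granted=True, had_interview=True),
    ResolvedCase(granted=True, had_interview=False),
    ResolvedCase(granted=True, had_interview=False),
    ResolvedCase(granted=False, had_interview=False),
]
print(f"Allow rate: {allow_rate(cases):.1%}")           # 83.3%
print(f"Interview lift: {interview_lift(cases):+.1%}")  # +33.3%
```

Under this split, the with-interview allow rate is 3/3 and the without-interview rate is 2/3, giving the +33.3-point lift reported above.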

Statute-Specific Performance

§101: 33.3% (-6.7% vs TC avg)
§103: 42.8% (+2.8% vs TC avg)
§102: 3.9% (-36.1% vs TC avg)
§112: 20.0% (-20.0% vs TC avg)
Tech Center averages are estimates. Based on career data from 6 resolved cases.
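
For clarity on the table above: each "vs TC avg" delta is the examiner's per-statute rate minus the Tech Center average estimate. A short sketch; the TC averages here are back-solved from the table's deltas (each comes out to roughly 40%) and are assumptions, not independently sourced figures.

```python
# "vs TC avg" delta = examiner's per-statute rate - TC average estimate.
# Examiner rates are taken from the table; TC averages are back-solved
# from the published deltas and are assumed, not sourced.
examiner_rates = {"101": 0.333, "103": 0.428, "102": 0.039, "112": 0.200}
tc_avg_rates   = {"101": 0.400, "103": 0.400, "102": 0.400, "112": 0.400}

for statute, rate in examiner_rates.items():
    delta = rate - tc_avg_rates[statute]
    print(f"§{statute}: {rate:.1%} ({delta:+.1%} vs TC avg)")
# §101: 33.3% (-6.7% vs TC avg) ... matching the table above.
```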

Office Action

§101 §103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claims 1, 3, 4, 13, and 14 are objected to because of the following informalities:

Claim 1 recites the limitation "a hash table" in lines 1 and 19 and "a textual prediction explanation" in lines 1 and 25. The second recitations should be "the hash table" (line 19) and "the textual prediction explanation" (line 25).

Claim 2 recites the limitation "a textual prediction explanation" in lines 1 and 29. The second recitation, in line 29, should be "the textual prediction explanation".

Claim 3 (and analogous claim 13) recites the limitation "a hash table" in line 1, and claim 2 recites "a hash table" in line 14. Because claim 3 depends on claim 2, the recitation in line 12 should be "the hash table".

Claim 4 (and analogous claim 14) recites the limitation "a hash table" in line 7, and claim 3 recites "a hash table" in line 1. Because claim 4 depends on claims 2 and 3, the recitation in line 7 should be "the hash table".

Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 4 (and analogous claim 14) is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

The term "highest" in claim 4 is a relative term which renders the claim indefinite. The term "highest" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The limitation "generating a hash table such that each hash value corresponds to a category with a highest correlation between the hash value and the category" (i.e., generating a table whose hash values correspond to the categories with which they are most highly correlated) does not allow one of ordinary skill in the art to determine the requisite level of correlation.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite mental processes and mathematical concepts, as detailed below. The subject matter eligibility test for products and processes is described below for claim 1 in view of the dependent claims.

Regarding claim 1:

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? No. Claim 1 recites a system, but the claim fails to recite the hardware necessary to execute the instructions. Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.
The claim does not fall within at least one of the four categories of patent-eligible subject matter because it does not describe the system as including any hardware, such as a processor and memory; it therefore may constitute software per se, which is not a statutory category.

Examiner Note: The Alice/Mayo analysis below is provided in the interest of compact prosecution.

Step 2A Prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes. The claim recites the following:

"using values for the first set of features from the first plurality of user profiles and the corresponding plurality of resource availability values, training a first machine learning model to determine resource availability for a user system," - The limitation recites a mental process of determining resource availability for a user system (see MPEP 2106.04(a)(2)(III)).

"using the explainability vector, selecting from the first set of features a subset of features having corresponding values in the explainability vector above a threshold;" - The limitation recites a mental process of selecting from the first set of features a subset of features above a threshold (see MPEP 2106.04(a)(2)(III)).

"generating a set of categories based on the subset of features, wherein each category in the set of categories corresponds to one or more textual prediction explanations for the output of the first machine learning model;" - The limitation recites a mental process of generating categories from the subset of features (see MPEP 2106.04(a)(2)(III)).

"generating a hash table including the set of categories, wherein the hash table is indexable using a hash value generated based on values for the subset of features for a user system;" - The limitation recites a mathematical process of generating a hash table with hash values (see MPEP 2106.04(a)(2)(I)).

Step 2A Prong 2: Does the claim recite additional elements that integrate the judicial exception into a particular application? No. The claim includes the following additional element(s):

"A system for using a hash table for generating a textual prediction explanation for an executed instruction, the system comprising: receiving, for a first plurality of user systems, a first plurality of user profiles and a corresponding plurality of resource availability values, wherein each user profile includes values for a first set of features;" - The additional elements fall under "apply it" as using a generic computer to use a hash table to generate textual prediction explanations; see Mere Instructions to Apply an Exception (MPEP 2106.05(f)). The additional elements also fall under insignificant extra-solution activity as mere data gathering by receiving data from user systems; see MPEP 2106.05(g).

"using values for the first set of features from the first plurality of user profiles and the corresponding plurality of resource availability values, training a first machine learning model to determine resource availability for a user system," - The additional elements fall under "apply it" as using a generic computer to train a first machine learning model to determine resource availability for a user system; see Mere Instructions to Apply an Exception (MPEP 2106.05(f)).
"processing the first machine learning model to extract an explainability vector, wherein each entry in the explainability vector corresponds to a feature in the first set of features and is indicative of a correlation between the feature and the output of the first machine learning model;" - The additional elements fall under "apply it" as using a generic computer to extract the explainability vector (see MPEP 2106.05(f)).

"for a user profile processed using the first machine learning model to generate a corresponding resource availability value, transmitting to a user system corresponding to the user profile a notification comprising a textual prediction explanation retrieved from the hash table using a hash value generated based on values of the subset of features from the user profile." - The additional elements fall under "apply it" as using a generic computer to use a machine learning model to generate a resource availability value and transmit a notification containing the textual prediction explanation retrieved via the hash table (see MPEP 2106.05(f)).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As an ordered whole, the claim is directed to collecting information to generate an explanation to a user. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of generating, receiving, and transmitting fall under using a generic computer to apply an exception and mere data gathering. The claim does not improve the functioning of a computer, transform an article into a different state or thing, or apply the judicial exception with a particular machine, so the claim is not patent eligible.

Regarding claim 2:

Step 1: Is the claim to a process, machine, manufacture, or composition of matter? Yes. Claim 2 recites a method, and a method falls within one of the four categories of eligible subject matter.

Step 2A Prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes. The claim recites the following:

"using the explainability vector, selecting from the first set of features a subset of features having corresponding values in the explainability vector above a threshold;" - The limitation recites a mental process of selecting from the first set of features a subset of features above a threshold (see MPEP 2106.04(a)(2)(III)).

"generating a set of categories based on the subset of features, wherein each category in the set of categories corresponds to one or more textual prediction explanations for the output of the first machine learning model;" - The limitation recites a mental process of generating categories from the subset of features (see MPEP 2106.04(a)(2)(III)).

"generating a hash table including the set of categories, wherein the hash table is indexable using a hash value generated based on values for the subset of features for a user system;" - The limitation recites a mathematical process of generating a hash table with hash values (see MPEP 2106.04(a)(2)(I)).

Step 2A Prong 2: Does the claim recite additional elements that integrate the judicial exception into a particular application?
No. The claim includes the following additional element(s):

"A method for generating a textual prediction explanation for an executed instruction, the method comprising: receiving, for a first plurality of user systems, a first plurality of user profiles and a corresponding plurality of resource availability values, wherein each user profile includes values for a first set of features;" - The additional elements fall under "apply it" as using a generic computer to generate textual prediction explanations; see Mere Instructions to Apply an Exception (MPEP 2106.05(f)). The additional elements also fall under insignificant extra-solution activity as mere data gathering by receiving data from user systems; see MPEP 2106.05(g).

"processing a first machine learning model to extract an explainability vector, wherein the first machine learning model receives as input values for the first set of features and generates as output a corresponding resource availability value;" - The additional elements fall under "apply it" as using a generic computer to extract the explainability vector (see MPEP 2106.05(f)).

"and for a user profile processed using the first machine learning model to generate a corresponding resource availability value, transmitting to a user system corresponding to the user profile a notification comprising a textual prediction explanation retrieved from the hash table using a hash value generated based on values of the subset of features from the user profile." - The additional elements fall under "apply it" as using a generic computer to use a machine learning model to generate a resource availability value and transmit a notification containing the textual prediction explanation retrieved via the hash table (see MPEP 2106.05(f)).

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception? No. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As an ordered whole, the claim is directed to collecting information to generate an explanation to a user. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of generating, receiving, and transmitting fall under using a generic computer to apply an exception and mere data gathering. The method does not improve the functioning of a computer, transform an article into a different state or thing, or apply the judicial exception with a particular machine, so the claim is not patent eligible.

Regarding claim 5:

Step 2A Prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes. The claim recites the following:

"selecting an attribution technique based on the first set of parameters and the first plurality of user profiles; and" - The limitation recites a mental process of selecting an attribution technique based on the first set of parameters (see MPEP 2106.04(a)(2)(III)).

Step 2A Prong 2, Step 2B: The additional element(s):

"The method of claim 2, wherein processing the first machine learning model to extract the explainability vector comprises: retrieving a first set of parameters for the first machine learning model;" - The additional elements fall under insignificant extra-solution activity as mere data gathering by retrieving data from user systems; see MPEP 2106.05(g).
"applying the attribution technique to the first set of parameters to generate the explainability vector corresponding to the first set of features." - The additional elements fall under "apply it" as using a generic computer to apply the attribution technique to the first set of parameters to generate the explainability vector (see MPEP 2106.05(f)).

Regarding claim 6:

Step 2A Prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes. The claim recites the following:

"calculating a threshold for removing features of the explainability vector;" - The limitation recites a mathematical process of calculating a threshold for removing a feature (see MPEP 2106.04(a)(2)(I)).

"applying a mathematical transformation to the explainability vector such that values corresponding to the one or more features are adjusted" - The limitation recites a mathematical process of applying a mathematical transformation (see MPEP 2106.04(a)(2)(I)).

Step 2A Prong 2, Step 2B: The additional element(s):

"The method of claim 5, wherein selecting a subset of features from the first set of features further comprises: receiving a user request specifying that one or more features be removed from consideration or that impact of the one or more features be reduced;" - The additional elements fall under "apply it" as using a generic computer to receive a user request to remove a feature from consideration or reduce the impact of a feature (see MPEP 2106.05(f)).

Regarding claim 7:

Step 2A Prong 2, Step 2B: The additional element(s):

"The method of claim 5, wherein: the first machine learning model is defined by a set of parameters comprising a matrix of weights for a multivariate regression algorithm; and the attribution technique applied to the set of parameters defining the first machine learning model is a Shapley Additive Explanation method." - The additional elements fall under insignificant extra-solution activity as making the attribution method a Shapley Additive Explanation method; see MPEP 2106.05(g).

Regarding claim 8:

Step 2A Prong 2, Step 2B: The additional element(s):

"The method of claim 5, wherein: the first machine learning model is defined by a set of parameters comprising a matrix of weights for a supervised classifier algorithm; and the attribution technique applied to the set of parameters defining the first machine learning model is a Local Interpretable Model-agnostic Explanations method." - The additional elements fall under insignificant extra-solution activity as making the attribution method a Local Interpretable Model-agnostic Explanations method; see MPEP 2106.05(g).

Regarding claim 9:

Step 2A Prong 2, Step 2B: The additional element(s):

"The method of claim 5, wherein: the first machine learning model is defined by a set of parameters comprising a matrix of weights for a convolutional neural network algorithm; and the attribution technique applied to the set of parameters defining the first machine learning model is a Gradient Class Activation Mapping method." - The additional elements fall under insignificant extra-solution activity as making the attribution method a Gradient Class Activation Mapping method; see MPEP 2106.05(g).
Regarding claim 10:

Step 2A Prong 2, Step 2B: The additional element(s):

"The method of claim 5, wherein: the first machine learning model is defined by a set of parameters comprising a hyperplane matrix for a support vector machine algorithm; and the attribution technique applied to the set of parameters defining the first machine learning model is a counterfactual explanation method." - The additional elements fall under insignificant extra-solution activity as making the attribution method a counterfactual explanation method; see MPEP 2106.05(g).

Claims 11 and 15-20 recite a computer-readable medium product and are analogous to the methods of claims 2 and 5-10. Therefore, the rejections of claims 2 and 5-10 above apply to claims 11 and 15-20.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Merrill et al. (US 2019/0378210 A1) ("Merrill") in view of Nagarajan et al. (US 11,593,677 B1) ("Nagarajan").

Regarding claim 1, Merrill teaches a system for using a hash table for generating a textual prediction explanation for an executed instruction, the system comprising (Merrill para 0079, In some embodiments, S270 includes generating explanation information for a score of a single test data point relative to the reference population by using at least one decomposition generated by the method 200, as described herein (e.g., Adverse Action information, as described herein) (S271). Para 0177, "A Unified Approach for Differentiable Models." Para 0178, In some embodiments, to decompose predictions for differentiable models, once the test and reference data points are defined within a dataset, which may include single or multiple test data points and single or multiple reference data points, as enumerated earlier, decomposition techniques that support such models are used. Para 0181 lines 1-14, S271 can include: the model evaluation system generating model score explanation information for the evaluation input data set (test data point selected at S220) based on at least one decomposition generated by the method 200 (e.g., at S230, S240, S250). In some embodiments, the model evaluation system generating model score explanation information for the evaluation input data set based on the decomposition includes: generating a lookup key by using the decomposition for the ensemble model score, and performing a data lookup by using the lookup key to retrieve the model score explanation information for the evaluation input data set. In some embodiments, the lookup is a database lookup. In some embodiments, the lookup is a hash table lookup [A system for using a hash table]. Para 0199, S271 can include functions generating explanation information for a test data point (e.g., a test data point representing a credit applicant).
The explanation information can include score explanation information that provides information that can be used to explain how a score was generated for the test data point… S271 can include using the decomposition of the test data point relative to the reference population (or observation that represents the reference population) to generate explanation information that explains how the model generated the score for the test data point. This explanation information can be used to generate an Adverse Action notice, as described herein. S271 can include providing the generated model score explanation information for the test data point to an external system (e.g., an operator device, a user device, an applicant device, a modeling system, and the like) [generating a textual prediction explanation for an executed instruction])): processing the first machine learning model to extract an explainability vector (Merrill para 0030, In some embodiments, the model evaluation and explanation system (e.g., 120 of FIGS. 1A and 1B) uses a non-differentiable model decomposition module (e.g., 121) to decompose scores generated by a model by computing at least one SHAP (SHapley Additive exPlanation) value [processing the first machine learning model]. In some embodiments, decomposing scores includes: for each feature of a test data point, generating a difference value, the difference value for the test data point relative to a corresponding reference data point, the difference value being the decomposition value for the feature. In some embodiments, generating a difference value for a feature includes: computing a SHAP value (as described herein) of the non-differentiable model for the test data point and computing a SHAP value of the non-differentiable model for the corresponding reference data point, and subtracting the SHAP value for the reference data point from the SHAP value for the test data point to produce the difference value for the feature. In some embodiments, the score decomposition functions as explanation information for the model that explains the score for the test data point (generated by the model, or ensemble) in terms of the score for the reference data point (also generated by the model, or ensemble). In some embodiments, the decomposition is used to generate explanation information for the model. In some embodiments, these decompositions are generated for plural pairs of test data points and corresponding reference data points, and the decompositions are used to explain the model. In this manner, SHAP attributions for a single test data point are transformed into reference-based attributions to the test data point in terms of the reference data point. Para 0086, In some embodiments, each decomposition generated by the method 200 (e.g., at S230, S240, S250) is a vector of decomposition values d, for each feature used by the respective model. In some embodiments, each non-differentiable decomposition value (generated at S230) is a difference between a SHAP (SHapley Additive exPlanation) value for a test data point (e.g., a test data point representing a credit applicant, a test data point at a first point in time, etc.) of the test population and a SHAP value for a reference data point of the reference population (e.g., a reference data point representing an accepted credit applicant, a reference data point at a second point in time) [to extract an explainability vector].),
wherein each entry in the explainability vector corresponds to a feature in the first set of features and is indicative of a correlation between the feature and the output of the first machine learning model (Merrill para 0042, In some embodiments, the non-differentiable model decomposition module estimates E[f(x)|xs] by executing machine-executable instructions that implement the procedure (Procedure 1) shown in FIG. 6, wherein v is a vector of node values, which takes the value internal for internal nodes; the vectors a and b represent the left and right node indexes for each internal node; the vector t contains thresholds for each internal node, and d is a vector of indexes of the features used for splitting in internal nodes; the vector r represents the cover of each node (e.g., how many data samples fall in that sub-tree); the weight w measures what proportion of the training samples matching the conditioning set S fall into each leaf. In some embodiments, each decomposition generated by the method 200 is a vector of decomposition values d, for each feature [wherein each entry in the explainability vector corresponds to a feature in the first set of features]. Para 0131, In some embodiments, the model evaluation system uses a decomposition generated for a model score (e.g., at one or more of S230, S240, S250) to generate feature importance information and provide the generated feature importance information to the operator device 171. Feature importance is the application wherein a feature's importance is quantified with respect to a model. A feature may have significant or marginal impact, and it could hurt or harm how a model will score. Features may be colinear and interact, and any feature importance application takes into account interactions and colinearities. The present disclosure describes such a method [and is indicative of a correlation between the feature and the output of the first machine learning model]); [generating a set of categories based on the subset of features,] wherein each category in the set of categories corresponds to one or more textual prediction explanations for the output of the first machine learning model (Merrill para 0080, In some embodiments, S270 includes generating explanation information for a plurality of test data points (representative of a test population, e.g., a protected class) relative to the reference population (e.g., fairness information, Disparate Impact information, as described herein) by using at least one decomposition generated by the method 200, as described herein (S272) [wherein each category in the set of categories]. Para 0190, S272 can include identifying features having decomposition values (in the generated decompositions) above a threshold. In some embodiments, the method 200 includes providing the identified features to an operator device (e.g., 171) via a network. In some embodiments, the method 200 includes displaying the identified features on a display device of an operator device (e.g., 171).
In other embodiments, the method 200 includes displaying natural language explanations generated based on the decomposition described above [corresponds to one or more textual prediction explanations for the output of the first machine learning model]); generating a hash table including the set of categories, wherein the hash table is indexable using a hash value generated based on values for the subset of features for a user system (Merrill para 0124, In some embodiments, S230 includes: for each input data set (reference data point) of the reference population, using the non-differentiable model decomposition module to, for each input data set (reference data point) of the reference population, generate a decomposition of the evaluation input data set (x) (test data point) relative to the reference data point; … In some embodiments, features with categorical values are encoded as numerics using a suitable method such as one-hot encoding or another mapping specified by the modeler. Para 0181, S271 can include: the model evaluation system generating model score explanation information for the evaluation input data set (test data point selected at S220) based on at least one decomposition generated by the method 200 (e.g., at S230, S240, S250). In some embodiments, the model evaluation system generating model score explanation information for the evaluation input data set based on the decomposition includes: generating a lookup key by using the decomposition for the ensemble model score, and performing a data lookup by using the lookup key to retrieve the model score explanation information for the evaluation input data set. In some embodiments, the lookup is a database lookup. In some embodiments, the lookup is a hash table lookup [generating a hash table including the set of categories]. Para 0199 lines 20-30, S271 can include using the decomposition of the test data point relative to the reference population (or observation that represents the reference population) to generate explanation information that explains how the model generated the score for the test data point. This explanation information can be used to generate an Adverse Action notice, as described herein. S271 can include providing the generated model score explanation information for the test data point to an external system (e.g., an operator device, a user device, an applicant device, a modeling system, and the like). (Examiner Note: Hash tables are indexable by hash values, as at S271.)); transmitting to a user system corresponding to the user profile a notification comprising a textual prediction explanation retrieved from the hash table using a hash value generated based on values of the subset of features from the user profile (Merrill para 0132, In some embodiments, the model evaluation system uses a decomposition generated for a model score to generate adverse action information (e.g., at S271) (as described herein) and provide the generated adverse action information to the operator device 171 [transmitting to a user system corresponding to the user profile a notification comprising a textual prediction explanation]. Para 0181, S271 can include: the model evaluation system generating model score explanation information for the evaluation input data set (test data point selected at S220) based on at least one decomposition generated by the method 200 (e.g., at S230, S240, S250).
In some embodiments, the model evaluation system generating model score explanation information for the evaluation input data set based on the decomposition includes: generating a lookup key by using the decomposition for the ensemble model score, and performing a data lookup by using the lookup key to retrieve the model score explanation information for the evaluation input data set [retrieved from the hash table using a hash value generated based on values of the subset of features from the user profile]. Para 0198, "Adverse Action." Para 0199, S271 can include functions generating explanation information for a test data point (e.g., a test data point representing a credit applicant). The explanation information can include score explanation information that provides information that can be used to explain how a score was generated for the test data point).

Merrill does not explicitly teach receiving, for a first plurality of user systems, a first plurality of user profiles and a corresponding plurality of resource availability values, wherein each user profile includes values for a first set of features; using values for the first set of features from the first plurality of user profiles and the corresponding plurality of resource availability values, training a first machine learning model to determine resource availability for a user system, wherein the first machine learning model receives as input values for the first set of features and generates as output a corresponding resource availability value; or generating a set of categories based on the subset of features, [wherein each category in the set of categories corresponds to one or more textual prediction explanations for the output of the first machine learning model].

However, Nagarajan teaches receiving, for a first plurality of user systems, a first plurality of user profiles and a corresponding plurality of resource availability values, wherein each user profile includes values for a first set of features (Nagarajan Col 7 lines 5-12, FIG. 3 illustrates a swim lane diagram with examples of communications between a client computing device, an asset prediction system, and a user activity profile server, in accordance with one or more embodiments of the present disclosure. In some instances, a user operating client computing device 303 can send a request with personal identifiable or demographic data 307 of the user to the asset prediction system 100. Col 8, Some examples of the user data that can be inputted to the asset prediction system 100 can include first name 501, last name 503, street address 505, zip code 507, city 509, state 511, identifiable information of a user social security number, for example, the last four digits of the user social security number 513, user total annual income 515, and user nontaxable income 517. In some instances, after a user has entered the information shown on graphical user interface 500, the user can press the button 519 to view preapproved and optimized assets or software objects. It is noted that the asset prediction system 100 can produce the pre-approved and optimized asset or software object in real-time or near real-time. Col 9 lines 34-45, FIG. 7 depicts a block diagram of an example of the computer-based system 700, in accordance with one or more embodiments of the present disclosure.
However, not all these components may be required to practice one or more embodiments, and variations in the arrangement and type of the components may be made without departing from the spirit or scope of various embodiments of the present disclosure. In some embodiments, the computing devices and/or the examples of computing components of the computer-based system 700 may be configured to manage large numbers of members and/or concurrent transactions or electronic activities, as detailed herein. Col 9 lines 52-65, In some embodiments, referring to FIG. 7, members 701, 703, and 705 (e.g., clients) of the computer-based system 700 may include virtually any computing device capable of receiving and sending a message over a network (e.g., cloud network), such as network 707, to and from another computing device, such as a server implementing the asset prediction system 100, an activity profile server 303, and the like. In some embodiments, a server can implement the asset prediction system 100 discussed above with reference to FIG. 1. In some embodiments, the asset prediction system 100 can be part of a financial institution system, merchant system, online store system, or other suitable entity capable of registering historical data associated with a user, group of users, and/or non-person entity [receiving, for a first plurality of user systems, a first plurality of user profiles and a corresponding plurality of resource availability values]. Col 10 lines 1-4, member devices 701, 703, and 705 can be used to submit (for example by user 711) personal identifiable information to the asset prediction system 100 and select an optimized software object [wherein each user profile includes values for a first set of features]); using values for the first set of features from the first plurality of user profiles and the corresponding plurality of resource availability values, training a first machine learning model to determine resource availability for a user system, wherein the first machine learning model receives as input values for the first set of features and generates as output a corresponding resource availability value (Nagarajan Col 4 lines 24-47, FIG. 3. Such information can include user name, user address, information related with a user social security number, user income, or other suitable user information. In response, the user activity profile server 305 sends a user activity profile associated with the user profile of a user to the asset prediction system 100. Such a user activity profile or user profile can include raw historical variables values associated with the user or non-person entity. In some instances, the asset prediction system 100 sends the activity profile to the first machine learning model 203 [using values for the first set of features from the first plurality of user profiles and the corresponding plurality of resource availability values]. In some embodiments, the first machine learning model can be a categorization machine learning model that can predict multiple features of a user profile. In some embodiments, the first machine learning model 203 or categorization machine learning model can be a trained categorization or instance-based machine learning model such as a k-nearest neighbor machine learning model, learning vector quantization machine learning model, self-organization map machine learning model, locally learning machine learning model, or another suitable categorization or instance-based machine learning model.
In some embodiments, the categorization machine learning model or instance-based machine learning model can be trained, for example, based on historical data of multiple users included in multiple user profiles and/or activity profiles [wherein the first machine learning model receives as input values for the first set of features and generates as output a corresponding resource availability value]. Col 5 line 13, For instance, when the classification machine learning model is implemented as a k-nearest neighbor machine learning model, the model can be trained with labeled feature vectors. The labeled feature vectors can be implemented as data structures that store values associated with activity data of multiple users, for example, data collected from multiple user profiles and/or user activity profiles. In some embodiments, the labeled feature vectors can include values associated with, for example, a payment history of a user, a balance-to-limit ratio of a user credit card, a length of time a user financial account has been opened, a user activity of a financial credit account, and other suitable data associated with a user profile or user activity profile. In some embodiments, a classification machine learning model can receive a user activity profile and classify the user activity profile by assigning one or more classes to the user activity profile. In some embodiments, such classes can correspond to one or more features of a software object. In some embodiments, the features of the software objects can include a credit line pre-approved by a financial institution, an annual percentage rate, a type of rewards program (e.g., airline miles, gift cards, and/or points corresponding to a monetary amount), a pre-approved membership program offered by an organization or institution, a pre-approved health insurance program, and other suitable features [training a first machine learning model to determine resource availability for a user system]); generating a set of categories based on the subset of features, [wherein each category in the set of categories corresponds to one or more textual prediction explanations for the output of the first machine learning model] (Nagarajan Col 4 lines 49-62, In some embodiments, the categorization machine learning model can be implemented as a k-nearest neighbor machine learning model that predicts the classification of a first aspect of the user profile. For instance, the output of a k-nearest neighbor machine learning model can be a class membership. In some instances, a user profile or user activity profile represented as data points can be classified by a plurality of votes of the neighboring data points; the activity profile or user profile can then be assigned to a class that has more commonalities among k-nearest neighbors. In some embodiments, the commonalities among k-nearest neighbors can be calculated, for example, by a distance metric such as an Euclidian distance, a Hemming distance, or other suitable type of distance metric [generating a set of categories based on the subset of features]); and for a user profile processed using the first machine learning model to generate a corresponding resource availability value (Col 8 lines 18-24, FIG. 4 is a flow chart illustrative of examples of computations executed by an asset prediction system, in accordance with one or more embodiments of the present disclosure. In some instances, the asset prediction system 100 receives a user activity profile at 401.
At 403, a categorization machine learning model can use the user activity profile to produce a first aspect of a user profile [for a user profile processed using the first machine learning model].)

Merrill and Nagarajan are considered analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Merrill to incorporate the teachings of Nagarajan to communicate with a plurality of user systems, provide resource availability values, and categorize users based on features. Doing so would allow the system to classify a user and provide them with a software object that includes credit line approvals from a financial institution and other resources (In some embodiments, a classification machine learning model can receive a user activity profile and classify the user activity profile by assigning one or more classes to the user activity profile. In some embodiments, such classes can correspond to one or more features of a software object. In some embodiments, the features of the software objects can include a credit line pre-approved by a financial institution, e.g., airline miles, gift cards, and/or points corresponding to a monetary amount, a pre-approved membership program offered by an organization or institution, a pre-approved health insurance program, and other suitable features.).

Regarding claim 2 and analogous claim 11, Merrill teaches a method for generating a textual prediction explanation for an executed instruction, the method comprising (Merrill para 0079, In some embodiments, S270 includes generating explanation information for a score of a single test data point relative to the reference population by using at least one decomposition generated by the method 200, as described herein (e.g., Adverse Action information, as described herein) (S271). Para 0199, S271 can include functions generating explanation information for a test data point (e.g., a test data point representing a credit applicant). The explanation information can include score explanation information that provides information that can be used to explain how a score was generated for the test data point… S271 can include using the decomposition of the test data point relative to the reference population (or observation that represents the reference population) to generate explanation information that explains how the model generated the score for the test data point. This explanation information can be used to generate an Adverse Action notice, as described herein. S271 can include providing the generated model score explanation information for the test data point to an external system (e.g., an operator device, a user device, an applicant device, a modeling system, and the like) [generating a textual prediction explanation for an executed instruction])): processing a first machine learning model to extract an explainability vector (Merrill para 0030, In some embodiments, the model evaluation and explanation system (e.g., 120 of FIGS. 1A and 1B) uses a non-differentiable model decomposition module (e.g., 121) to decompose scores generated by a model by computing at least one SHAP (SHapley Additive exPlanation) value [processing the first machine learning model].
In some embodiments, decomposing scores includes: for each feature of a test data point, generating a difference value, the difference value for the test data point relative to a corresponding reference data point, the difference value being the decomposition value for the feature. In some embodiments, generating a difference value for a feature includes: computing a SHAP value (as described herein) of the non-differentiable model for the test data point and computing a SHAP value of the non-differentiable model for the corresponding reference data point, and subtracting the SHAP value for the reference data point from the SHAP value for the test data point to produce the difference value for the feature. In some embodiments, the score decomposition functions as explanation information for the model that explains the score for the test data point (generated by the model, or ensemble) in terms of the score for the reference data point (also generated by the model, or ensemble). In some embodiments, the decomposition is used to generate explanation information for the model. In some embodiments, these decompositions are generated for plural pairs of test data points and corresponding reference data points, and the decompositions are used to explain the model. In this manner, SHAP attributions for a single test data point are transformed into reference-based attributions to the test data point in terms of the reference data point. Para 0086, In some embodiments, each decomposition generated by the method 200 (e.g., at S230, S240, S250) is a vector of decomposition values d, for each feature used by the respective model. In some embodiments, each non-differentiable decomposition value (generated at S230) is a difference between a SHAP (SHapley Additive exPlanation) value for a test data point (e.g., a test data point representing a credit applicant, a test data point at a first point in time, etc.) of the test population and a SHAP value for a reference data point of the reference population (e.g., a reference data point representing an accepted credit applicant, a reference data point at a second point in time) [to extract an explainability vector]), using the explainability vector, selecting from the first set of features a subset of features having corresponding values in the explainability vector above a threshold (Merrill para 0086, In some embodiments, each decomposition generated by the method 200 (e.g., at S230, S240, S250) is a vector of decomposition values d, for each feature used by the respective model. In some embodiments, each non-differentiable decomposition value (generated at S230) is a difference between a SHAP (SHapley Additive exPlanation) value for a test data point (e.g., a test data point representing a credit applicant, a test data point at a first point in time, etc.) of the test population and a SHAP value for a reference data point of the reference population (e.g., a reference data point representing an accepted credit applicant, a reference data point at a second point in time). Para 0189, After performing decompositions for the protected class population relative to the reference population (e.g., at one or more of S230, S240, S250), the decomposition(s) are used at S272 to generate disparate impact information.
Para 0190, S272 can include identifying features having decomposition values [using the explainability vector] (in the generated decompositions) above a threshold. In some embodiments, the method 200 includes providing the identified features to an operator device (e.g., 171) via a network [selecting from the first set of features a subset of features having corresponding values in the explainability vector above a threshold].); [generating a set of categories based on the subset of features,] wherein each category in the set of categories corresponds to one or more textual prediction explanations for the output of the first machine learning model (Merrill para 0080, In some embodiments, S270 includes generating explanation information for a plurality of test data points (representative of a test population, e.g., a protected class) relative to the reference population (e.g., fairness information, Disparate Impact information, as described herein) by using at least one decomposition generated by the method 200, as described herein (S272) [wherein each category in the set of categories]. Para 0190, S272 can include identifying features having decomposition values (in the generated decompositions) above a threshold. In some embodiments, the method 200 includes providing the identified features to an operator device (e.g., 171) via a network. In some embodiments, the method 200 includes displaying the identified features on a display device of an operator device (e.g., 171). In other embodiments, the method 200 includes displaying natural language explanations generated based on the decomposition described above [corresponds to one or more textual prediction explanations for the output of the first machine learning model]); generating a hash table including the set of categories, wherein the hash table is indexable using a hash value generated based on values for the subset of features for a user system (Merrill para 0124, In some embodiments, S230 includes: for each input data set (reference data point) of the reference population, using the non-differentiable model decomposition module to, for each input data set (reference data point) of the reference population, generate a decomposition of the evaluation input data set (x) (test data point) relative to the reference data point; … In some embodiments, features with categorical values are encoded as numerics using a suitable method such as one-hot encoding or another mapping specified by the modeler. Para 0181, S271 can include: the model evaluation system generating model score explanation information for the evaluation input data set (test data point selected at S220) based on at least one decomposition generated by the method 200 (e.g., at S230, S240, S250). In some embodiments, the model evaluation system generating model score explanation information for the evaluation input data set based on the decomposition includes: generating a lookup key by using the decomposition for the ensemble model score, and performing a data lookup by using the lookup key to retrieve the model score explanation information for the evaluation input data set. In some embodiments, the lookup is a database lookup. In some embodiments, the lookup is a hash table lookup [generating a hash table including the set of categories].
Para 0199 lines 20-30, S271 can include using the decomposition of the test data point relative to the reference population (or observation that represents the reference population) to generate explanation information that explains how the model generated the score for the test data point. This explanation information can be used to generate an Adverse Action notice, as described herein. S271 can include providing the generated model score explanation information for the test data point to an external system (e.g., an operator device, a user device, an applicant device, a modeling system, and the like). (Examiner Note: Hash tables are indexable by hash values, as at S271, and the explanation includes categories in the lookup.)); transmitting to a user system corresponding to the user profile a notification comprising a textual prediction explanation retrieved from the hash table using a hash value generated based on values of the subset of features from the user profile (Merrill para 0132, In some embodiments, the model evaluation system uses a decomposition generated for a model score to generate adverse action information (e.g., at S271) (as described herein) and provide the generated adverse action information to the operator device 171 [transmitting to a user system corresponding to the user profile a notification comprising a textual prediction explanation]. Para 0181, S271 can include: the model evaluation system generating model score explanation information for the evaluation input data set (test data point selected at S220) based on at least one decomposition generated by the method 200 (e.g., at S230, S240, S250). In some embodiments, the model evaluation system generating model score explanation information for the evaluation input data set based on the decomposition includes: generating a lookup key by using the decomposition for the ensemble model score, and performing a data lookup by using the lookup key to retrieve the model score explanation information for the evaluation input data set [retrieved from the hash table using a hash value generated based on values of the subset of features from the user profile]. Para 0198, "Adverse Action." Para 0199, S271 can include functions generating explanation information for a test data point (e.g., a test data point representing a credit applicant). The explanation information can include score explanation information that provides information that can be used to explain how a score was generated for the test data point.)

However, Merrill does not explicitly teach receiving, for a first plurality of user systems, a first plurality of user profiles and a corresponding plurality of resource availability values, wherein each user profile includes values for a first set of features; wherein the first machine learning model receives as input values for the first set of features and generates as output a corresponding resource availability value; generating a set of categories based on the subset of features; or, for a user profile processed using the first machine learning model, generating a corresponding resource availability value.

However, Nagarajan teaches receiving, for a first plurality of user systems, a first plurality of user profiles and a corresponding plurality of resource availability values, wherein each user profile includes values for a first set of features
(Nagarajan Col 7 lines 5-12, FIG. 3 illustrates a swim lane diagram with examples of communications between a client computing device, an asset prediction system, and a user activity profile server, in accordance with one or more embodiments of the present disclosure. In some instances, a user operating client computing device 303 can send a request with personal identifiable or demographic data 307 of the user to the asset prediction system 100. Col 8, Some examples of the user data that can be inputted to the asset prediction system 100 can include first name 501, last name 503, street address 505, zip code 507, city 509, state 511, identifiable information of a user social security number, for example, the last four digits of the user social security number 513, user total annual income 515, and user nontaxable income 517. In some instances, after a user has entered the information shown on graphical user interface 500, the user can press the button 519 to view preapproved and optimized assets or software objects. It is noted that the asset prediction system 100 can produce the pre-approved and optimized asset or software object in real-time or near real-time. Col 9 lines 34-45, FIG. 7 depicts a block diagram of an example of the computer-based system 700, in accordance with one or more embodiments of the present disclosure. However, not all these components may be required to practice one or more embodiments, and variations in the arrangement and type of the components may be made without departing from the spirit or scope of various embodiments of the present disclosure. In some embodiments, the computing devices and/or the examples of computing components of the computer-based system 700 may be configured to manage large numbers of members and/or concurrent transactions or electronic activities, as detailed herein. Col 9 lines 52-65, In some embodiments, referring to FIG. 7, members 701, 703, and 705 (e.g., clients) of the computer-based system 700 may include virtually any computing device capable of receiving and sending a message over a network (e.g., cloud network), such as network 707, to and from another computing device, such as a server implementing the asset prediction system 100, an activity profile server 303, and the like. In some embodiments, a server can implement the asset prediction system 100 discussed above with reference to FIG. 1. In some embodiments, the asset prediction system 100 can be part of a financial institution system, merchant system, online store system, or other suitable entity capable of registering historical data associated with a user, group of users, and/or non-person entity [receiving, for a first plurality of user systems, a first plurality of user profiles and a corresponding plurality of resource availability values]. Col 10 lines 1-4, member devices 701, 703, and 705 can be used to submit (for example by user 711) personal identifiable information to the asset prediction system 100 and select an optimized software object [wherein each user profile includes values for a first set of features]); wherein the first machine learning model receives as input values for the first set of features and generates as output a corresponding resource availability value (Nagarajan Col 4 lines 24-47, FIG. 3. Such information can include user name, user address, information related with a user social security number, user income, or other suitable user information.
In response, the user activity profile server 305 sends a user activity profile associated with the user profile of a user to the asset prediction system 100. Such a user activity profile or user profile can include raw historical variables values associated with the user or non-person entity. In some instances, the asset prediction system 100 sends the activity profile to the first machine learning model 203. In some embodiments, the first machine learning model can be a categorization machine learning model that can predict multiple features of a user profile. In some embodiments, the first machine learning model 203 or categorization machine learning model can be a trained categorization or instance based machine learning model such as a k-nearest neighbor machine learning model, learning vector quantization machine learning model, self-organization map machine learning model, locally learning machine learning model, or another suitable categorization or instance based machine learning model. In some embodiments, the categorization machine learning model or instance based machine learning model can be trained, for example, based on historical data of multiple users included in multiple user profiles and/or activity profiles [, wherein the first machine learning model receives as input values for the first set of features and generates as output a corresponding resource availability value]); generating a set of categories based on the subset of features (Nagarajan Col 4, line 49-62, In some embodiments, the categorization machine learning model can be implemented as a k-nearest neighbor machine learning model that predicts the classification of a first aspect of the user profile. For instance, the output of a k-nearest neighbor machine learning model can be a class membership. In some instances, a user profile or user 55 activity profile represented as data points can be classified by a plurality of votes of the neighboring data points, the activity profile or user profile can be then assigned to a class that has more commonalities among k-nearest neighbors. In some embodiments, the commonalities among k-nearest neighbors can be calculated, for example, by a distance metric such as an Euclidian distance, a Hemming distance, or other suitable type of distance metric [generating a set of categories based on the subset of features].), for a user profile processed using the first machine learning model to generate a corresponding resource availability value (Nagarajan Col 8 line18-24, FIG. 4 is a flow chart illustrative of examples of computations executed by an asset prediction system, in accordance with one or more embodiments of the present disclosure. In some instances, the asset prediction system 100 receives a user activity profile at 401. At 403, a categorization machine learning model can use the user activity profile to produce a first aspect of a user profile [for a user profile processed using the first machine learning model]. Col 8 line 38-43, Such software objects can be optimized with respect to at least one competitive objective or interest between a user associated with the user profile and an entity associated with the software object. The asset prediction system 100 can then send a signal 409 to for example client computing device 303 discussed with reference to FIG. 3. Such a signal can include information for the user regarding more than one optimized software object developed specifically for the user. 
The asset prediction system 100 can then send a signal 409 to for example client computing device 303 discussed with reference to FIG. 3. Such a signal can include information for the user regarding more than one optimized software object developed specifically for the user. Col 61-67, In some instances, after a user has entered the information shown on graphical user interface 500, the user can press the button 519 to view preapproved and optimized assets or software objects. It is noted, that the asset prediction system 100 can produce the pre-approved and optimized asset or software object in real-time or near real-time [to generate a corresponding resource availability value].) Merrill and Nagarajan are considered to be analogous to the claim invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filling date of the claimed invention to have modified Merrill to incorporate the teachings of Nagarajan to communicate with a plurality of user system and provided with resource availability value and categorizing users based on features. Doing so would allow the system to classify a user and provide them with a software object that includes credit line approvals from a financial intuition to and other resources (In some embodiments, a classification machine learning model can receive a user activity profile and classify the user activity profile by assigning one or more classes to the user activity profile. In some embodiments, such classes can correspond to one or more features of a software object. In some embodiments, the features of the software objects can include a credit line pre-approved by a financial institution, e.g. 20 airline miles, gift cards, and/or points corresponding to a monetary amount, a pre-approved membership program offered by an organization or institution, a pre-approved health insurance program and other suitable features or features.). Regarding claim 12, Merrill and Nagarajan teach the method of claim 5 and analogous 15. Merrill and Nagarajan are combined in the same rationale as in claim 2 and analogous 11. Claim(s) 3 and analogues 13 are rejected under 35 U.S.C. 103 as being unpatentable over Merrill in view of Nagarajan and further in view of Peetermans, Emile. "Analysis of entity resolution techniques in academic and health data." (2022) (“Peetermans”). Regarding claim 3 and analogous claim 13, Merrill and Nagarajan teach the method of claim 2 and analogous 11. Merrill and Nagarajan are combined in the same rationale as in claim 2 and analogous 11. 
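To make the claim 2 mechanism above concrete, here is a minimal sketch of a hash-table lookup of a textual prediction explanation keyed on a hash value derived from a subset of profile features. All names, feature values, and the bucket count are hypothetical; this is not Merrill's or the applicant's implementation.

```python
# Hypothetical sketch of the claim 2 mechanism: a textual prediction
# explanation retrieved from a hash table, keyed on a hash value derived
# from a subset of user-profile features.
import hashlib

EXPLANATIONS = {}  # hash value -> textual prediction explanation

def hash_features(profile: dict, feature_subset: list, buckets: int = 1024) -> int:
    """Derive a stable hash value from the selected feature values."""
    key = "|".join(f"{name}={profile[name]}" for name in sorted(feature_subset))
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % buckets

def register_explanation(profile, feature_subset, text):
    EXPLANATIONS[hash_features(profile, feature_subset)] = text

def explain(profile, feature_subset):
    return EXPLANATIONS.get(hash_features(profile, feature_subset),
                            "No explanation on file for this feature pattern.")

profile = {"income": 52_000, "utilization": 0.83, "tenure_months": 7}
register_explanation(profile, ["income", "utilization"],
                     "High revolving utilization reduced the predicted score.")
print(explain(profile, ["income", "utilization"]))
```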
Claim(s) 3 and analogous 13 are rejected under 35 U.S.C. 103 as being unpatentable over Merrill in view of Nagarajan and further in view of Peetermans, Emile, "Analysis of entity resolution techniques in academic and health data" (2022) ("Peetermans"). Regarding claim 3 and analogous claim 13, Merrill and Nagarajan teach the method of claim 2 and analogous 11. Merrill and Nagarajan are combined in the same rationale as in claim 2 and analogous 11.

Merrill does not explicitly teach wherein generating a hash table including the set of categories comprises: generating a transformation algorithm which encodes the subset of features into signatures in a real-valued vector space; using the transformation algorithm to encode feature values of the first plurality of user profiles into a plurality of signatures in the real-valued vector space; performing random permutations on the plurality of signatures to determine a plurality of approximate signatures; generating measures of similarity between approximate signatures for user profiles in the first plurality of user profiles; calculating a threshold for similarity in the real-valued vector space; using a clustering algorithm to identify groups of user profiles with measures of similarity for each pair of user profiles within the groups of user profiles exceeding the threshold for similarity; and assigning each user profile a hash value based on a group of user profiles closest to the user profile.

However, Peetermans teaches wherein generating a hash table including the set of categories comprises: generating a transformation algorithm which encodes the subset of features into signatures in a real-valued vector space; using the transformation algorithm to encode feature values of the first plurality of user profiles into a plurality of signatures in the real-valued vector space (Peetermans page 7, 2.3 Matching, Fig. 2.3, Locality Sensitive Hashing (LSH), an algorithmic technique that hashes similar input items into the same "buckets" with high probability, is another technique incorporating global information often used by blocking schemes [RU11]. Given a similarity function Sim : U × U → [0, 1], an LSH scheme with a set of hashing functions H has the probability Prob_{h ∈ H}[h(A) = h(B)] = Sim(A, B) for two objects A, B ∈ U [Cha02]. This characteristic allows for efficient similarity calculation without using a costly function. LSH for blocking works by first dividing every blocking key, for example, a person's first name, into a set of shingles or n-grams. The name "Karel" becomes {"kar", "are", "rel"} when n = 3. These small fixed-size parts of the string are used to create a vocabulary where each "shingle" represents an index. A string can thus be one-hot encoded into a vector with a length equal to the size of the vocabulary [using the transformation algorithm to encode feature values of the first plurality of user profiles into a plurality of signatures in the real-valued vector space]. The one-hot encoded vector is then hashed into a much smaller size while preserving similarities between pairs. These signatures are placed into b buckets in order to group possible duplicate pairs. Page 20, Figure 3.7 [generating a transformation algorithm which encodes the subset of features into signatures in a real-valued vector space]);

performing random permutations on the plurality of signatures to determine a plurality of approximate signatures; generating measures of similarity between approximate signatures for user profiles in the first plurality of user profiles (Peetermans page 7, 2.3 Matching, line 1-5, The goal of matching is to calculate the similarity between all pairs of profiles selected as candidate matches by the blocking step. This similarity score is based on a matching function. A similarity graph is created with edges identifying the probability that the connected profile nodes belong to the same entity. Matching schemes usually rely on the combination of similarity metrics calculated for each corresponding attribute pair [generating measures of similarity between approximate signatures for user profiles in the first plurality of user profiles]. Page 14, Dealing with imbalance, para 1 line 1-6, In order to address the class imbalance caused by the fact that duplicates are very rare in the dataset, over- and undersampling techniques have been explored. Imbalanced-learn provides a set of algorithms for these kinds of problems. Oversampling attempts to handle imbalances by producing more samples for the underrepresented classes. This can be done naively by randomly sampling with replacement from the currently available data. More advanced techniques like SMOTE and ADASYN generate new samples by interpolation [Cha+02; He+08] [performing random permutations on the plurality of signatures to determine a plurality of approximate signatures]);

calculating a threshold for similarity in the real-valued vector space; using a clustering algorithm to identify groups of user profiles with measures of similarity for each pair of user profiles within the groups of user profiles exceeding the threshold for similarity; and assigning each user profile a hash value based on a group of user profiles closest to the user profile (Peetermans page 7, 2.3 Matching, Locality Sensitive Hashing (LSH), an algorithmic technique that hashes similar input items into the same "buckets" with high probability, is another technique incorporating global information often used by blocking schemes [RU11]. Given a similarity function Sim : U × U → [0, 1], an LSH scheme with a set of hashing functions H has the probability Prob_{h ∈ H}[h(A) = h(B)] = Sim(A, B) for two objects A, B ∈ U [Cha02]. This characteristic allows for efficient similarity calculation without using a costly function. LSH for blocking works by first dividing every blocking key, for example, a person's first name, into a set of shingles or n-grams. The name "Karel" becomes {"kar", "are", "rel"} when n = 3. These small fixed-size parts of the string are used to create a vocabulary where each "shingle" represents an index. A string can thus be one-hot encoded into a vector with a length equal to the size of the vocabulary. The one-hot encoded vector is then hashed into a much smaller size while preserving similarities between pairs. These signatures are placed into b buckets in order to group possible duplicate pairs [assigning each user profile a hash value based on a group of user profiles closest to the user profile]. Page 8, Fig. 2.4 [calculating a threshold for similarity in the real-valued vector space]. 2.4 Clustering, The output of the matching step can be seen as a similarity graph. A node is created for every profile, where each pair of potentially matching profiles is connected by an edge depicting the similarity value or duplicate probability. A profile can have edges with multiple other profiles, indicating a possible entity consisting of three or more profiles. Clustering aims to convert this graph into a set of clusters or final entities. The most suitable clustering technique depends on the type of ER task. For record linkage, the graph is bipartite as there is a one-on-one mapping between the profiles in the two input data sources. An example of a clustering technique for these problems is called Unique Mapping Clustering, where edges are sorted in decreasing weight and processed iteratively. The profile pairs at the top of the list are considered duplicates if the similarity exceeds a threshold and none of the adjacent nodes has been matched yet [Lac+13] [using a clustering algorithm to identify groups of user profiles with measures of similarity for each pair of user profiles within the groups of user profiles exceeding the threshold for similarity]).

Merrill and Peetermans are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Merrill to incorporate the teachings of Peetermans to group profiles into similar hashes and determine similarity. Doing so would allow feature engineering and the conversion of profile vectors into similarity features (Peetermans page 3, Introduction, para 5, The same basic process for building the ER application is followed in the record linkage and deduplication chapters. To train a model capable of discerning whether two sets of information share the same object, pairs of possible duplicate profiles need to be converted into vectors of similarity features. These values indicate the similarity of specific values or combinations of values in two profiles. For example, a similarity feature vector might include the result of an edit-distance metric comparing family names of persons. When many different metrics are used and the results are placed together in these vectors, they give a general numerical representation of similarity for a particular pair of profiles. These representations can then be used in typical machine learning models when supplemented with labels indicating the pair is a duplicate or a non-duplicate. In this study, many different kinds of similarity metrics will be explored and analysed in order to find the most potent combination of features for training an ER model. To this extent, any possible source or representation of data will be considered valuable until proven otherwise. This process, called "feature engineering" from this point, is one of the most crucial parts of this project. A sufficiently accurate entity resolution model might not be realised without a sufficiently large and influential set of similarity metrics.).
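For context on the Peetermans passages above, a self-contained MinHash/LSH sketch: blocking keys are shingled, signatures are built from random permutations of the shingle vocabulary, and banded signatures are bucketed so near-duplicate keys tend to collide. The names, band sizes, and permutation count are illustrative assumptions, not values from the thesis.

```python
# Hypothetical MinHash/LSH sketch: shingle blocking keys, build signatures
# from random permutations of the shingle vocabulary, and bucket banded
# signatures. Identical keys always collide; similar keys collide with a
# probability tracking their Jaccard similarity.
import random
from collections import defaultdict

def shingles(text: str, n: int = 3) -> set:
    """'Karel' -> {'kar', 'are', 'rel'} when n = 3 (case-folded)."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

names = ["Karel", "KAREL", "Carel", "Maria"]
vocab = sorted(set().union(*(shingles(name) for name in names)))

rng = random.Random(0)
permutations = []  # each permutation of the vocabulary acts as one hash function
for _ in range(8):
    order = vocab[:]
    rng.shuffle(order)
    permutations.append({sh: rank for rank, sh in enumerate(order)})

def signature(key: str) -> tuple:
    return tuple(min(perm[s] for s in shingles(key)) for perm in permutations)

buckets = defaultdict(list)
for name in names:
    sig = signature(name)
    for band in range(4):  # 4 bands of 2 signature components each
        buckets[(band, sig[band * 2:(band + 1) * 2])].append(name)

groups = {frozenset(members) for members in buckets.values() if len(set(members)) > 1}
for group in groups:
    print("candidate duplicates:", sorted(group))
```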
Claim(s) 4 and analogous 14 are rejected under 35 U.S.C. 103 as being unpatentable over Merrill in view of Nagarajan and Peetermans, further in view of Jun Wang, Wei Liu, Sanjiv Kumar, and Shih-Fu Chang, "Learning to Hash for Indexing Big Data," arXiv:1509.05472v1 [cs.LG] 17 Sep 2015 ("Wang"). Regarding claim 4 and analogous claim 14, as best understood in view of the 112(b) rejection explained above, Merrill and Nagarajan teach the method of claim 3 and analogous 13. Merrill and Nagarajan are combined in the same rationale as in claim 2 and analogous 11. Merrill and Peetermans are combined in the same rationale as in claim 3 and analogous 13.

Peetermans further teaches retrieving the set of categories, wherein each category in the set of categories corresponds to one or more textual prediction explanations; training an associative model to correlate each hash value with a category in the set of categories (Peetermans page 7, A string can thus be one-hot encoded into a vector with a length equal to the size of the vocabulary. The one-hot encoded vector is then hashed into a much smaller size while preserving similarities between pairs. These signatures are placed into b buckets in order to group possible duplicate pairs. Page 20, Figure 3.7 [retrieving the set of categories, wherein each category in the set of categories corresponds to one or more textual prediction explanations]; page 16, Figure 3.3).

Wang teaches and generating a hash table such that each hash value corresponds to a category with a highest correlation between the hash value and the category (Wang page 15, A. Hyperplane Hashing [and generating a hash table]; page 15, A. Hyperplane Hashing, para 6, Jain et al. [117] devised two different families of randomized hash functions to attack the hyperplane hashing problem. The first one is Angle-Hyperplane Hash (AH-Hash) A, of which one instance function is … where z ∈ R^d represents an input vector, and u and v are both drawn independently from a standard d-variate Gaussian, i.e., u, v ∼ N(0, I_{d×d}). Note that h_A is a two-bit hash function which leads to the probability of collision for a hyperplane normal w and a database point x: … This probability monotonically decreases as the point-to-hyperplane angle α_{x,w} increases, ensuring angle-sensitive hashing [such that each hash value corresponds to a category with a highest correlation between the hash value and the category]).

Merrill and Wang are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Merrill to incorporate the teachings of Wang to generate a table that corresponds to a label. Doing so would allow SVM active learning and provide the most useful information for improving the model (page 16, A. Hyperplane Hashing, para 2, In SVM-based active learning [115], the well-proven sample selection strategy is to search in the unlabeled sample pool to identify the sample closest to the current hyperplane decision boundary, thus providing the most useful information for improving the learning model. When making such active learning scalable to gigantic databases, exhaustive search for the point nearest to the hyperplane is not efficient for the online sample selection requirement. Hence, novel hashing methods that can principally handle hyperplane queries are called for. A conceptual diagram using hyperplane hashing to scale up the active learning process is demonstrated in Figure 11.).
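As one concrete (and deliberately simple) reading of the "highest correlation" limitation the examiner found indefinite under 112(b), the mapping could be built by counting co-occurrences of hash values and categories over labeled data and assigning each hash value its most frequent category. The data and category names below are invented for illustration only.

```python
# Hedged sketch of a "highest correlation" hash table: each hash value is
# mapped to the category it co-occurs with most often in labeled examples.
from collections import Counter, defaultdict

observations = [  # (hash value, category) pairs from hypothetical training data
    (17, "high_utilization"), (17, "high_utilization"), (17, "thin_file"),
    (42, "thin_file"), (42, "thin_file"), (9, "recent_delinquency"),
]

cooccurrence = defaultdict(Counter)
for hash_value, category in observations:
    cooccurrence[hash_value][category] += 1

# Hash table: hash value -> most frequently co-occurring category.
hash_table = {hv: counts.most_common(1)[0][0] for hv, counts in cooccurrence.items()}
print(hash_table)  # {17: 'high_utilization', 42: 'thin_file', 9: 'recent_delinquency'}
```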
Claim(s) 5 and analogous 15 are rejected under 35 U.S.C. 103 as being unpatentable over Merrill in view of Nagarajan and further in view of Cheng et al. (US 2022/0405623 A1) ("Cheng"). Regarding claim 5 and analogous claim 15, Merrill and Nagarajan teach the method of claim 2 and analogous 11. Merrill and Nagarajan are combined in the same rationale as in claim 2 and analogous 11.

Merrill teaches wherein processing the first machine learning model to extract the explainability vector comprises: retrieving a first set of parameters for the first machine learning model (Merrill Para 0153, In some embodiments, when applied to adverse action, the model evaluation system 120 evaluates a specific denied credit applicant. In this embodiment the specific denied credit applicant comprises the test set (test data point) (selected at S220), and a representative group comprises the reference set (reference data points) (selected at S210). In some embodiments, the representative group is comprised of applicants who were "barely approved" (e.g., the bottom 10% of an approved population), according to their credit score. In other embodiments, the representative group will be comprised of those applicants who were "approved" (e.g., all of the approved population), according to their credit score. In other embodiments, the representative group will be comprised of the "top approved" applicants (e.g., the best credit applicants), according to their credit score. In some embodiments the credit score is computed by a model. In other embodiments the credit score is computed by a machine learning model [retrieving a first set of parameters for the first machine learning model]); and applying the attribution technique to the first set of parameters to generate the explainability vector corresponding to the first set of features (Merrill para 0086, In some embodiments, each decomposition generated by the method 200 (e.g., at S230, S240, S250) is a vector of decomposition values d for each feature used by the respective model [to generate the explainability vector corresponding to the first set of features]. In some embodiments, each nondifferentiable decomposition value (generated at S230) is a difference between a SHAP (SHapley Additive exPlanation) value for a test data point (e.g., a test data point representing a credit applicant, a test data point at a first point in time, etc.) of the test population and a SHAP value for a reference data point of the reference population (e.g., a reference data point representing an accepted credit applicant, a reference data point at a second point in time) [and applying the attribution technique to the first set of parameters]).

Merrill does not explicitly teach selecting an attribution technique based on the first set of parameters and the first plurality of user profiles. However, Cheng does teach selecting an attribution technique based on the first set of parameters and the first plurality of user profiles (Cheng Para 0042, The explanation engine 130 can be configured to generate different model explanation data based on the type of machine learning model specified by received input, e.g., as one or more query statements. The model explanation data can include feature attributions, which as described herein the explanation engine 130 can generate to different levels of granularity. The explanation engine 130 can generate feature attributions according to a calculated baseline score, which acts as a basis for comparing the effect different features have on a model's output. Para 0045, The explanation engine 130 can also process input data and machine learning models according to one or more model-agnostic approaches, in which the architecture of the model does not matter to the model explainability approach applied. Example approaches include permutation feature importance, partial dependence plots [based on the first set of parameters and the first plurality of user profiles], Shapley values, SHAP (Shapley Additive Explanations), KernelSHAP, TreeSHAP, and integrated gradients. The explanation engine 130 can be configured to use some approaches over others depending on whether the explanation engine 130 is generating local or global explanations. For example, the explanation engine 130 may use permutation feature importance and partial dependence plots for generating global explanations, and Shapley values, SHAP, and integrated gradients for generating both local and global explanations [selecting an attribution technique]).

Merrill and Cheng are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Merrill to incorporate the teachings of Cheng to select an attribution technique. Doing so would allow the model to provide different levels of granularity (Cheng para 0042, The explanation engine 130 can be configured to generate different model explanation data based on the type of machine learning model specified by received input, e.g., as one or more query statements. The model explanation data can include feature attributions, which as described herein the explanation engine 130 can generate to different levels of granularity. The explanation engine 130 can generate feature attributions according to a calculated baseline score, which acts as a basis for comparing the effect different features have on a model's output.).
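The Cheng passages describe choosing an attribution approach by model type and explanation scope. A minimal sketch of that kind of dispatch follows; the mapping table and type names are assumptions for illustration, not Cheng's actual engine.

```python
# Illustrative dispatch: pick an attribution technique from the model type
# and the scope (local vs. global) of the requested explanation.
ATTRIBUTION_TECHNIQUES = {
    ("tree_ensemble", "local"):  "TreeSHAP",
    ("tree_ensemble", "global"): "permutation feature importance",
    ("linear", "local"):         "Shapley values",
    ("linear", "global"):        "partial dependence plots",
    ("neural_network", "local"): "integrated gradients",
}

def select_attribution_technique(model_type: str, scope: str) -> str:
    try:
        return ATTRIBUTION_TECHNIQUES[(model_type, scope)]
    except KeyError:
        # Model-agnostic fallback usable for any architecture.
        return "KernelSHAP"

print(select_attribution_technique("tree_ensemble", "local"))  # TreeSHAP
print(select_attribution_technique("svm", "local"))            # KernelSHAP
```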
Regarding claim 6 and analogous claim 16, Merrill and Nagarajan teach the method of claim 5 and analogous 15. Merrill and Nagarajan are combined in the same rationale as in claim 5 and analogous 15.

Merrill further teaches wherein selecting a subset of features from the first set of features further comprises: receiving a user request specifying that one or more features be removed from consideration or that impact of the one or more features be reduced; calculating a threshold for removing features of the explainability vector (Merrill para 0192, In some embodiments, the method 200 includes determining whether an identified feature is a permissible feature for generating a score for the protected class, and providing information identifying each impermissible feature that is identified to an operator device (e.g., 171). In some embodiments, identified features are presented to an operator for further review before the identified feature is determined to be a permissible feature for generating a score for the protected class. In other embodiments, identified features are automatically determined based on the impact to protected class approvals and the business impact of including the variable. In some embodiments an identified feature is determined permissible based on leaving the feature out [receiving a user request specifying that one or more features be removed from consideration], retraining the model, and determining its impact on the approval rate for a protected class. In other embodiments the determination is based on an approval rate difference threshold or other tunable parameters [calculating a threshold for removing features of the explainability vector]. In some embodiments, the method 200 includes displaying partial dependence plots for identified variables, heat maps, and other visualizations on a display device of an operator device (e.g., 171).).

Cheng teaches and applying a mathematical transformation to the explainability vector such that values corresponding to the one or more features are adjusted (Cheng Para 0049, The explanation engine 130 can aggregate the local attributions for N inputs in the input data to generate a global attribution for the feature X, for example per the aggregation formula reproduced therein [and applying a mathematical transformation to the explainability vector]. Para 0118, The platform provides output predictions from trained machine learning models and feature attributions corresponding to the output prediction, according to block 540. The platform can generate feature attributions as described herein, with reference to FIGS. 1-3. At least a portion of the generated feature attributions can be stored as metadata corresponding to the model. As described in more detail with reference to FIG. 7, the platform can retrieve previously generated feature attributions and provide the feature attributions to a requesting user device. Para 0119 line 1-5, The platform determines whether it received input to retrain the machine learning model, according to diamond 550. The received input can be provided from a user device, specifying additional training data and/or the same training data selected using the one or more first query statements [such that values corresponding to the one or more features are adjusted]).
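For the claim 6 limitation, one simple "mathematical transformation" consistent with the quoted passages is to zero out or down-weight the attribution entries for user-excluded features and then renormalize. The function below is a sketch under that assumption, not code from either reference.

```python
# Hypothetical transformation: reduce or remove the attribution values of
# user-specified features, then renormalize the explainability vector.
import numpy as np

def adjust_explainability(vector: np.ndarray, feature_names: list,
                          removed: set, weight: float = 0.0) -> np.ndarray:
    adjusted = vector.astype(float).copy()
    for i, name in enumerate(feature_names):
        if name in removed:
            adjusted[i] *= weight  # weight=0 removes; 0 < weight < 1 reduces impact
    total = np.abs(adjusted).sum()
    return adjusted / total if total else adjusted  # keep magnitudes comparable

vec = np.array([0.50, 0.30, 0.20])
names = ["utilization", "zip_code", "income"]
print(adjust_explainability(vec, names, removed={"zip_code"}))
```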
Claim(s) 7 and analogous 17 are rejected under 35 U.S.C. 103 as being unpatentable over Merrill in view of Nagarajan and Cheng, and further in view of Lipovetsky, S., & Conklin, W. M. (2015), Predictor relative importance and matching regression parameters, Journal of Applied Statistics, 42(5), 1017–1031 ("Lipovetsky"). Regarding claim 7 and analogous claim 17, Merrill and Nagarajan teach the method of claim 5 and analogous 15. Merrill and Nagarajan are combined in the same rationale as in claim 2 and analogous 11. Merrill and Cheng are combined in the same rationale as in claim 5 and analogous 15.

Merrill does not explicitly teach wherein: the first machine learning model is defined by a set of parameters comprising a matrix of weights for a multivariate regression algorithm; and the attribution technique applied to the set of parameters defining the first machine learning model is a Shapley Additive Explanation method. However, Lipovetsky teaches wherein: the first machine learning model is defined by a set of parameters comprising a matrix of weights for a multivariate regression algorithm; and the attribution technique applied to the set of parameters defining the first machine learning model is a Shapley Additive Explanation method (Lipovetsky page 1019, Measures of derived importance, Let us briefly review some relationships of the OLS model. For the standardized variables, the model y_i = b_1 x_{i1} + ... + b_m x_{im} + ε_i in matrix form is y = Xb + ε (1), where X denotes the matrix of order N by m with elements x_{ij} of the ith observation (i = 1, ..., N – sample size) by the jth predictor (j = 1, ..., m – number of predictors), and y and ε are the vectors of observations of the dependent variable and of deviations from the theoretical model, respectively [the first machine learning model is defined by a set of parameters comprising a matrix of weights for a multivariate regression algorithm]. Page 1022, For numerical comparison, we use data from a real study conducted for a big pharmaceutical company about a cold sore healthcare product. The purchase interest as the dependent variable and 35 attributes as predictors were measured on a 10-point Likert scale, and data were gathered from 1023 respondents. Table 1 presents the names of attributes, together with the paired correlations of y with the x-s, the OLS model and its beta-coefficients, the net effects and their shares, and the Shapley values and their shares. We see that all correlations are positive, but because of multicollinearity more than a third of the predictors (13 out of 35) receive negative signs in the regression and have negative net effects, in spite of their evident usefulness. However, the SV net effects are all positive. S-plus software was used for estimations [and the attribution technique applied to the set of parameters defining the first machine learning model is a Shapley Additive Explanation method]).

Merrill and Lipovetsky are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Merrill to incorporate the teachings of Lipovetsky to use Shapley values for importance estimation. Doing so would adjust the model to the best data fit and keep it meaningful and interpretable (Lipovetsky Abstract line 4-12, These indices of importance are based on the orthonormal decomposition of the data matrix, and the work shows how to improve this approximation. Using predictor importance, the regression coefficients can also be adjusted to reach the best data fit and to be meaningful and interpretable. The results are compared with the robust to multicollinearity, but computationally difficult, Shapley value regression (SVR). They show that the JJ index is good for importance estimation, but the GJ index outperforms it if both predictor importance and coefficients of regression are needed; hence, this index (GJ) can be used in place of the more computationally intensive estimation by SVR. The results can be easily estimated by the considered approach that is very useful in practical regression modeling and analysis, especially for big data.).
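For the multivariate-regression setting Lipovetsky analyzes, Shapley attributions of a linear model have a well-known closed form when features are treated as independent: phi_i = b_i * (x_i - E[x_i]). A small worked sketch with synthetic numbers (the coefficients and data below are invented):

```python
# Linear-SHAP sketch: for a linear model with independent features, the
# Shapley value of feature i at instance x is b_i * (x_i - mean(x_i)).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))     # standardized predictors (synthetic)
b = np.array([0.8, -0.3, 0.5])    # fitted regression coefficients (assumed)

x = X[0]                          # instance to explain
phi = b * (x - X.mean(axis=0))    # per-feature Shapley values
print("attributions:", phi.round(3))

# Sanity check: attributions sum to prediction minus average prediction.
print(np.isclose(phi.sum(), x @ b - X.mean(axis=0) @ b))  # True
```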
Claim(s) 8 and analogous 18 are rejected under 35 U.S.C. 103 as being unpatentable over Merrill in view of Nagarajan and Cheng, and further in view of Barberis, E.; Khoso, S.; Sica, A.; Falasca, M.; Gennari, A.; Dondero, F.; Afantitis, A.; Manfredi, M., Precision Medicine Approaches with Metabolomics and Artificial Intelligence, Int. J. Mol. Sci. 2022, 23, 11269 ("Barberis"), and Branka Hadji Misheva, Joerg Osterrieder, Ali Hirsa, Onkar Kulkarni, and Stephen Fung Lin, Explainable AI in Credit Risk Management, arXiv:2103.00949v1 [q-fin.RM] 1 Mar 2021 ("Misheva"). Regarding claim 8 and analogous claim 18, Merrill and Nagarajan teach the method of claim 5 and analogous 15. Merrill and Nagarajan are combined in the same rationale as in claim 2 and analogous 11. Merrill and Cheng are combined in the same rationale as in claim 5 and analogous 15.

Merrill does not explicitly teach wherein: the first machine learning model is defined by a set of parameters comprising a matrix of weights for a supervised classifier algorithm; and the attribution technique applied to the set of parameters defining the first machine learning model is a Local Interpretable Model-agnostic Explanations method.

However, Barberis teaches the first machine learning model is defined by a set of parameters comprising a matrix of weights for a supervised classifier algorithm (Barberis Section 4, Application of Machine Learning for the Diagnosis of Diseases, Since the turn of the century, there has been a marked increase in the number of studies on metabolomics that have made use of machine learning techniques. Many studies have shown that machine learning can discriminate between healthy and disease groups as well as identify important biomarkers for use in clinical decision making in a variety of settings [42,43]. The following sections present the most recent applications of supervised machine learning for the diagnosis of diseases. Section 4.2, Support Vector Machine, Today, SVM classification is the most frequently used machine learning technique in precision medicine. SVM is a model that uses "support vectors" to construct a decision boundary (hyper-plane) in a high-dimensional feature space. Support vectors are data points that are positioned close to the hyperplane, and hence aid to optimize the hyperplane itself [50]. The objective of the hyperplane is to maximize the distance between two classes, while placing as few data points as possible on the incorrect side of the decision boundary [51,52]. For given training samples, a hyperplane is generated to maximize the distance, which can be mathematically defined by the equation reproduced therein, where W is the weight matrix, X represents the dataset, and b is a constant term [a matrix of weights for a supervised classifier algorithm]).

Merrill and Barberis are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Merrill to incorporate the teachings of Barberis to use a supervised learning method. Doing so would allow classifying non-linear data using an SVM (Barberis page 7, 4.2 Support Vector Machine, para 3, SVM can also be used to classify non-linear data through a process called the kernel trick. There are several types of kernel used for different problems, such as the polynomial kernel, Gaussian kernel, Gaussian radial basis function (RBF), Laplace RBF, sigmoid kernel, hyperbolic tangent kernel, and the linear splines kernel in one dimension. Nonetheless, the radial basis function (RBF) is the first choice among kernels, and it is also widely used for non-linear tasks in metabolomics.).

Misheva teaches and the attribution technique applied to the set of parameters defining the first machine learning model is a Local Interpretable Model-agnostic Explanations method (Misheva page 3, 2.1 LIME, Locally Interpretable Model Agnostic Explanations is a post-hoc model-agnostic explanation technique which aims to approximate any black box machine learning model with a local, interpretable model to explain each individual prediction [1]. By model agnostic explanations, the authors suggest that it can be used for explaining any classifier, irrespective of the algorithm used for predictions, as LIME is independent of the original classifier. Finally, LIME works locally, which in essence means that it is observation specific and, similarly to SHAP, will give explanations for every specific observation it has. Page 8, 4.2 LIME on SVM, According to Vapnik et al. [7], the way a support-vector classifier works is that it constructs a hyperplane or set of hyperplanes in a high-dimensional space, which can be used for classification. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training-data point of any class, since in general the larger the margin, the lower the generalization error of the classifier. An SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier [8] [a Local Interpretable Model-agnostic Explanations method]).

Merrill and Misheva are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Merrill to incorporate the teachings of Misheva to use a Locally Interpretable Model Agnostic Explanation. Doing so would allow interpreting the model and explaining each prediction (Misheva page 3, Locally Interpretable Model Agnostic Explanations is a post-hoc model-agnostic explanation technique which aims to approximate any black box machine learning model with a local, interpretable model to explain each individual prediction).
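A bare-bones LIME-style sketch in the spirit of the Misheva passage: perturb an instance, query a black-box decision function, weight samples by proximity, and read the explanation off a locally fitted weighted linear surrogate. The black box here is a stand-in function, not an SVM trained as in the reference.

```python
# LIME-style local surrogate: the coefficients of a proximity-weighted
# linear fit around x0 serve as per-feature explanations of the black box.
import numpy as np

def black_box(X):  # stand-in decision function (e.g., an SVM's score)
    return np.tanh(2.0 * X[:, 0] - X[:, 1] ** 2)

rng = np.random.default_rng(1)
x0 = np.array([0.4, -0.2])

Z = x0 + rng.normal(scale=0.3, size=(300, 2))       # local perturbations
y = black_box(Z)
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.25)   # proximity kernel

# Weighted least squares: scale rows and targets by sqrt(weight), then solve.
A = np.hstack([Z, np.ones((len(Z), 1))]) * np.sqrt(w)[:, None]
coef, *_ = np.linalg.lstsq(A, y * np.sqrt(w), rcond=None)
print("local surrogate coefficients:", coef[:2].round(3))
```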
Claim(s) 9 and analogous 19 are rejected under 35 U.S.C. 103 as being unpatentable over Merrill in view of Nagarajan and Cheng, and further in view of M. B. Muhammad and M. Yeasin, "Eigen-CAM: Class Activation Map using Principal Components," 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 2020, pp. 1-7 ("Muhammad"). Regarding claim 9 and analogous claim 19, Merrill and Nagarajan teach the method of claim 5 and analogous 15. Merrill and Nagarajan are combined in the same rationale as in claim 2 and analogous 11. Merrill and Cheng are combined in the same rationale as in claim 5 and analogous 15.

Merrill does not explicitly teach wherein: the first machine learning model is defined by a set of parameters comprising a matrix of weights for a convolutional neural network algorithm; and the attribution technique applied to the set of parameters defining the first machine learning model is a Gradient Class Activation Mapping method. However, Muhammad teaches wherein: the first machine learning model is defined by a set of parameters comprising a matrix of weights for a convolutional neural network algorithm; and the attribution technique applied to the set of parameters defining the first machine learning model is a Gradient Class Activation Mapping method (Muhammad page 1, I. Introduction, para 5, The proposed Eigen-CAM uses the principal components of the learned representations from the convolutional layers to create the visual explanations. The major contributions are: We present a simple, intuitive method to obtain CAM based on convolutional layer output, and the process is independent of class relevance score. We demonstrate that the proposed Eigen-CAM can robustly and reliably localize objects without the need to modify the CNN architecture or even to backpropagate any computations, and at the same time achieves higher performance compared to all previously reported methods such as Grad-CAM and CNN fixations [and the attribution technique applied to the set of parameters defining the first machine learning model is a Gradient Class Activation Mapping method]. Page 3, III. Proposed Approach, Observation 2: CAM uses the last weight matrix between GAP and SoftMax to weight different feature maps. Similarly, Grad-CAM and Grad-CAM++ derive the weights of the linear combination of different feature maps based on the backpropagated class relevance score. Gradient values determine the weight of each feature map to produce the class activation map [the first machine learning model is defined by a set of parameters comprising a matrix of weights for a convolutional neural network algorithm]).

Merrill and Muhammad are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Merrill to incorporate the teachings of Muhammad to use a Gradient Class Activation Mapping method on a CNN. Doing so would provide a visual explanation for a CNN (Muhammad page 1, I. Introduction, para 5, The proposed Eigen-CAM uses the principal components of the learned representations from the convolutional layers to create the visual explanations. The major contributions are: We present a simple, intuitive method to obtain CAM based on convolutional layer output, and the process is independent of class relevance score. We demonstrate that the proposed Eigen-CAM can robustly and reliably localize objects without the need to modify the CNN architecture or even to backpropagate any computations, and at the same time achieves higher performance compared to all previously reported methods such as Grad-CAM and CNN fixations).
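The Grad-CAM weighting that Muhammad contrasts with Eigen-CAM reduces to: average the backpropagated gradients over each feature map to get per-map weights, then take a ReLU-clamped weighted sum of the maps. A NumPy mock with random placeholder tensors (no real network) follows.

```python
# Grad-CAM weighting mock: activations and gradients are random stand-ins
# for a conv layer's outputs and the backpropagated class-score gradients.
import numpy as np

rng = np.random.default_rng(2)
activations = rng.random((8, 7, 7))      # 8 feature maps from a conv layer
gradients = rng.normal(size=(8, 7, 7))   # d(class score)/d(activations)

alphas = gradients.mean(axis=(1, 2))     # one importance weight per map
cam = np.maximum(np.tensordot(alphas, activations, axes=1), 0.0)  # ReLU
cam /= cam.max() or 1.0                  # normalize for visualization
print(cam.shape, float(cam.max()))       # (7, 7) heat map
```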
Claim(s) 10 and analogous 20 are rejected under 35 U.S.C. 103 as being unpatentable over Merrill in view of Nagarajan and Cheng, and further in view of Salazar, Sebastian, Samuel Denton, and Ansaf Salleb-Aouissi, "Counterfactual explanations for support vector machine models," arXiv preprint arXiv:2212.07432 (2022) ("Salazar"). Regarding claim 10 and analogous claim 20, Merrill and Nagarajan teach the method of claim 5 and analogous 15. Merrill and Nagarajan are combined in the same rationale as in claim 2 and analogous 11. Merrill and Cheng are combined in the same rationale as in claim 5 and analogous 15.

Merrill does not explicitly teach wherein: the first machine learning model is defined by a set of parameters comprising a hyperplane matrix for a support vector machine algorithm; and the attribution technique applied to the set of parameters defining the first machine learning model is a counterfactual explanation method. However, Salazar teaches wherein: the first machine learning model is defined by a set of parameters comprising a hyperplane matrix for a support vector machine algorithm; and the attribution technique applied to the set of parameters defining the first machine learning model is a counterfactual explanation method (Salazar page 3, 3 Actionability with SVMs; page 6, 5.1 Toy example, To demonstrate our approach we simulate a separable 2-D dataset consisting of a mixture of two Gaussians. We run three simulations to qualitatively assess the effects of enforcing the plausibility and correlation constraints discussed in Sections 3 and 4. The results of these experiments are shown in Figures 1 and ??. Figure 1 [and the attribution technique applied to the set of parameters defining the first machine learning model is a counterfactual explanation method]).

Merrill and Salazar are considered to be analogous to the claimed invention because they are in the same field of machine learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Merrill to incorporate the teachings of Salazar to use a counterfactual method with an SVM. Doing so would provide explanations for an SVM and give feedback on which features are most relevant (Salazar page 1, 1 Introduction, While there is extensive theoretical research on Support Vector Machines [4, 30], there is little to no information on how to use the wide-margin property to change the labels of instances with undesirable predictions and how this can be used to enhance interpretability. As an example, consider an instance with a predicted undesirable outcome (e.g., mortgage application rejected); how do we minimally change the features to flip the prediction of the original data? Explanations of this form give the decision-maker feedback on what features are most relevant to the model in the decision-making process and are known as counterfactual explanations. Providing explanations of this form has become increasingly important, especially in cases where automated decision-making has the potential to drastically impact human lives [32]. Furthermore, legal regulations, like the European Union's General Data Protection Regulation, are demanding responsible deployment of machine learning models. As such, it is important to ensure that these models are used responsibly and ethically in practice.).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALFREDO CAMPOS, whose telephone number is (571) 272-4504. The examiner can normally be reached 7:00 am - 4:00 pm, M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michael J. Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ALFREDO CAMPOS/
Examiner, Art Unit 2129

/MICHAEL J HUNTLEY/
Supervisory Patent Examiner, Art Unit 2129

Prosecution Timeline

May 10, 2023
Application Filed
Mar 02, 2026
Non-Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561407
ONE-PASS APPROACH TO AUTOMATED TIMESERIES FORECASTING
2y 5m to grant · Granted Feb 24, 2026
Patent 12561559
Neural Network Training Method and Apparatus, Electronic Device, Medium and Program Product
2y 5m to grant · Granted Feb 24, 2026
Patent 12554973
HIERARCHICAL DATA LABELING FOR MACHINE LEARNING USING SEMI-SUPERVISED MULTI-LEVEL LABELING FRAMEWORK
2y 5m to grant · Granted Feb 17, 2026
Patent 12536260
SYSTEM, APPARATUS, AND METHOD FOR AUTOMATICALLY GENERATING NEGATIVE KEYSTROKE EXAMPLES AND TRAINING USER IDENTIFICATION MODELS BASED ON KEYSTROKE DYNAMICS
2y 5m to grant · Granted Jan 27, 2026
Study what changed to get past this examiner, based on the 4 most recent grants.

Prosecution Projections

1-2
Expected OA Rounds
83%
Grant Probability
99%
With Interview (+33.3%)
3y 9m
Median Time to Grant
Low
PTA Risk
Based on 6 resolved cases by this examiner. Grant probability derived from career allow rate.
