DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment/Arguments
1. The amendment to the specification overcomes the objection to the drawings.
2. Applicant’s arguments regarding the rejection under 35 U.S.C. 101, filed on December 16, 2025, have been fully considered but are not persuasive. Applicant asserts on pages 13-15 of the Remarks that the amended claims are statutory under Step 2A, Prong Two because they integrate the alleged judicial exception into a practical application. In particular, Applicant argues that the claims are directed to a specific use of transfer learning across tenant entity data sets in a particular context involving low-frequency anomalous data (e.g., fraud detection), and that such limitations provide technological improvements including improved feature extraction, improved model accuracy, and reduced false positives.
However, Applicant’s arguments are not commensurate with the scope of the claims. While Applicant characterizes the claims as providing improvements to machine learning technology, the claimed limitations, under the broadest reasonable interpretation, recite generic data analysis operations including scoring features, generating weighted scores, comparing representations, determining similarity, ranking, and selecting features using machine learning models and explainer algorithms. These operations correspond to analyzing data and producing improved analytical results, which fall within the abstract idea itself.
Applicant’s argument that the claims are limited to a specific use case (e.g., transfer learning across tenant datasets and low-frequency fraud scenarios) is also not persuasive. Limiting an abstract idea to a particular data environment or field of use does not integrate the judicial exception into a practical application. See MPEP 2106.05(h). The claimed use of particular types of data merely describes the context in which the abstract idea is applied and does not impose a meaningful limit on the claim.
Further, Applicant’s alleged improvements – such as improved feature selection, improved model training, and reduced false positives – are improvements to the results of the data analysis rather than to the functioning of a computer or another technology or technical field. See MPEP 2106.04(d)(1) and 2106.05(a). An alleged improvement to the selection of features or the accuracy of a machine learning model is an improvement to the abstract idea itself, not to computer technology. As explained in case law, gathering and analyzing information using conventional techniques and displaying a result does not constitute a technological improvement. See TLI Communications, 823 F.3d at 612-13. Here, the claims merely use generic machine learning models and explainer algorithms to analyze data and produce improved analytical results, which does not improve computer functionality or any other technology.
Additionally, although Applicant refers to problems described in the specification (e.g., difficulties in training models with low frequency anomaly data), the claims do not recite a specific technical solution that improves computer technology. Rather, the claims broadly recite using known machine learning techniques (e.g., transfer learning, feature importance scoring, and similarity comparison) to address those issues. These claims therefore describe the use of conventional techniques to improve analytical outcomes, which does not constitute a technological improvement. See MPEP 2106.05(a).
Moreover, the claims do not reflect a particular implementation that improves technology, but instead recite results-oriented functional language without specifying how these results are achieved in a technologically meaningful way. Thus, the claims amount to instructions to apply the abstract idea using generic computing components and known machine learning techniques. See MPEP 2106.05(f).
Accordingly, the claims do not integrate the judicial exception into a practical application under Step 2A, Prong Two.
For similar reasons, the additional elements do not amount to significantly more than the judicial exception under Step 2B. The recited machine learning models, explainer processes, and data processing steps are generic and perform well-understood, routine, and conventional functions of analyzing and processing data. There is no indication in the claims that the elements are implemented in an unconventional manner or that they provide an inventive concept beyond the abstract idea itself.
Therefore, the rejection of claims 1-20 under 35 U.S.C. 101 is maintained.
3. Applicant’s arguments filed on December 16, 2025 regarding the rejections under 35 U.S.C. 102 and 35 U.S.C. 103 have been fully considered but are not persuasive. Applicant’s arguments regarding the rejection of claims 1-3, 7, 9-11, 15, and 17-19 under 35 U.S.C. 102(a)(1) as being anticipated by Shachar have been fully considered but are not persuasive.
Applicant argues that the rejection under 35 U.S.C. 102 is overcome. In particular, Applicant asserts that Shachar does not disclose the newly added limitations of amended claim 1, including limitations directed to computing feature scores using an ML model explainer process, comparing explanations across tenant data systems, converting weighted scores into vectors, computing similarity in an n-dimensional space, training a second ML model based on selected features, and iteratively updating feature selection.
However, these arguments are moot. The rejection under 35 U.S.C. 102 has been withdrawn and replaced with a rejection under 35 U.S.C. 103 to account for the newly added limitations in the independent claims. As acknowledged by Applicant, the amended claims include additional features that were not previously addressed under the anticipation rejection, and the current rejection addresses these limitations through a combination of references.
To the extent that Shachar does not alone explicitly disclose certain of the argued limitations, the present rejection does not rely on Shachar in isolation. Rather, Shachar is relied upon for teaching aspects such as feature importance determination, model explanation (e.g., SHAP), feature selection, and model training and retraining (see, e.g., Shachar, paragraphs [0043]-[0048]), while Balakrishnan is relied upon for teaching converting data representations into vectors, computing similarity scores in an n-dimensional space, and determining similarity using such vector representations.
Thus, the limitations identified by Applicant as allegedly missing from Shachar are addressed by the combined teachings of Shachar and Balakrishnan under the broadest reasonable interpretation of the claims. The rejection is based on the combination of references, not on Shachar alone.
Accordingly, Applicant’s arguments directed to whether Shachar alone discloses the newly added limitations of amended claims 1, 9, and 17 do not address the current rejection under 35 U.S.C. 103, which relies on the combined teachings of Shachar and Balakrishnan.
Applicant’s arguments regarding the rejection of claims 4, 12, and 20 over Shachar in view of Archambeau, claims 5, 6, 13, and 14 over Shachar in view of Poole, and claims 8 and 16 over Shachar in view of Balakrishnan have been fully considered but are not persuasive.
Applicant asserts that Shachar fails to teach or suggest the limitations of independent claims 1, 9, and 17, and that the secondary references do not remedy these alleged deficiencies. However, these arguments are not persuasive because they are premised on alleged deficiencies of Shachar in isolation and do not address the rejection as presented, which relies on the combined teachings of the references.
In particular, Applicant does not present arguments directed to the specific teachings relied upon in the secondary references, does not address the Examiner’s articulated reasoning for combining the references, and does not explain why the combined teachings would fail to render the claimed subject matter obvious under the broadest reasonable interpretation of the claims. Instead, Applicant’s arguments merely restate the position that Shachar alone does not disclose the claimed limitations.
As set forth in the rejection, Shachar is relied upon in combination with Archambeau, Poole, and Balakrishnan to teach the full scope of the claimed limitations, including the newly added limitations of independent claims 1, 9, and 17. The rejection therefore does not depend on Shachar alone to disclose all the limitations.
Accordingly, because Applicant’s arguments do not address the combined teachings of the cited references or the rationale for their combination, they do not identify reversible error in the rejection. Therefore, the rejection of claims 4-6, 8, 12-14, 16, and 20 under 35 U.S.C. 103 is maintained.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
101 Subject Matter Eligibility Analysis
Step 1: Claims 1-20 are within the four statutory categories (a process, machine, manufacture or composition of matter).
Claims 1-8 and 17-20 are directed to processors and storage mediums, which are machines. Claims 9-16 are directed to a method consisting of a series of steps, meaning that the claims are directed to the statutory category of process.
Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis:
Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101.
None of the claims represent an improvement to technology.
Regarding claim 1, the following claim elements are abstract ideas:
determining whether the first data set meets or exceeds a low fraud tenant threshold; (This is an abstract idea of a “mental process.” The step of “determining whether the first data set meets or exceeds a low fraud tenant threshold” can, under the broadest reasonable interpretation, be practically performed in the human mind using observation and judgement. For example, a person can review the data mentally or with pen and paper and decide whether it surpasses a certain threshold. As such, this limitation qualifies as a mental process. See MPEP 2106.04(a)(2)(III).)
responsive to the determining, segmenting the first tenant data system into a tenant segment group based on the first data set; (This is an abstract idea of a “mental process.” The step of “segmenting the first tenant data system into a tenant segment group based on the first data set,” responsive to the determining step, can under the broadest reasonable interpretation, be practically performed in the human mind using observation and judgement. For example, a person can mentally or with pen and paper sort or group records based on the data to create segments. As such, the limitation qualifies as a mental process. See MPEP 2106.04(a)(2)(III).)
determining first features of a first ML model based on at least a portion of the first data set and an ML model algorithm for the first ML model; (This is an abstract idea of a “mental process.” The step of “determining first features of a first ML model based on at least a portion of the first data set and an ML model algorithm for the first ML model” can, under the broadest reasonable interpretation, be practically performed in the human mind using observation and judgement. For example, a person could review the data and, by following an algorithmic set of rules, identify relevant features for building a model, either mentally or with pen and paper. See MPEP 2106.04(a)(2)(III). Additionally, an algorithm, as recited here, is considered a mathematical concept because it represents a mathematical relationship, calculation, or set of rules for processing data, as explained in MPEP 2106.04(a)(2)(I).)
determining a first explanation of first feature importance of each of the first features of the first ML model, wherein the first explanation comprises first weighted scores corresponding to the score for each of the first features (This is an abstract idea of a “mental process.” The limitation recites evaluating feature importance and assigning corresponding scores. A person could review factors, assess the relative importance of each feature to an outcome, and assign corresponding values based on observation and judgement. Such evaluation and scoring can be performed in the human mind or with the aid of pen and paper.)
comparing the first tenant data system to a second tenant data system based on at least the first explanation and a second explanation of a second feature importance associated with the first features and a second data set for the second tenant data system; (This is an abstract idea of a “mental process.” The limitation recites comparing data based on feature importance information. A person could review two sets of data, consider feature importance, and determine similarities or differences using observation and judgement. Such comparison can be performed in the human mind or with the aid of pen and paper, and therefore falls within the mental process grouping of abstract ideas.)
ranking at least the first features based on the similarity scores; (This is an abstract idea of a “mental process.” The limitation recites ordering features based on similarity scores. A person could review similarity values and rank features accordingly using observation and judgement. Such ordering and evaluation can be performed in the human mind or with the aid of pen and paper.)
selecting a set of the first features for training a second ML model using a feature selection for the second ML model based on the ranking. (This is an abstract idea of a “mental process.” The limitation recites selecting features based on ranking. A person could review a ranked list of features and choose a subset based on observation and judgement. Such selection can be performed in the human mind or with the aid of pen and paper, and therefore falls within the mental process grouping of abstract ideas.)
computing a score for each of the first features…to determine an amount that each of the first features contributes to outputs by the first ML model and compute the score based on the amount (This is an abstract idea of a mental process and mathematical concept. The limitation recites determining numerical contribution amounts for features and computing scores based on those amounts, which constitutes mathematical calculations. Additionally, it involves evaluating how much each feature contributes to an outcome and assigning a corresponding score based on the evaluation. A person could assess contributing factors, estimate their relative impact, and assign values accordingly using observation and judgement.);
converting the first and second weighted scores to vectors in an n-dimensional space (This is an abstract idea of a mathematical concept. The limitation recites representing numerical values as vectors in an n-dimensional space, which constitutes a mathematical representation and mathematical calculations involving transforming data into a vector space.),
computing similarity scores in the n-dimensional space using the vectors, wherein the similarity scores indicate how similar the first weighted scores are to the second weighted scores (This is an abstract idea of a mathematical concept. The limitation recites computing similarity scores based on vectors in an n-dimensional space, which constitutes mathematical calculations involving comparing numerical values and determining a similarity measure.);
determining that a threshold similarity exists between the first tenant data system and the second tenant data system (This is an abstract idea of a mental process. The limitation recites evaluating whether a similarity meets a threshold. A person could compare a similarity value to a predetermined threshold and determine whether the threshold is satisfied using observation and judgement. Such evaluation can be performed in the human mind or with the aid of pen and paper or a calculator, and therefore falls within the mental process grouping of abstract ideas.)
determining whether to iterate over the feature selection for inclusion of at least one additional feature in the set of the first features for retraining the second ML model (This is an abstract idea of a mental process. The limitation recites evaluating whether to repeat a selection process and include additional features. A person could review selected features, decide whether additional features should be included, and determine whether to repeat the selection using observation and judgement.)
The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
a processor (This is a high-level recitation of generic computer components for performing the abstract idea. See MPEP 2106.05(f).)
computer readable medium (The claimed elements represent generic computer components and computer instructions to apply the abstract idea. Therefore, these elements are not sufficient to integrate the judicial exception into a practical application. See MPEP 2106.05(f).) comprising a plurality of instructions stored in association therewith that are accessible to, and executable by, the processor, to perform ML modeling operations.
receiving a first data set for a first tenant data system usable for training a first machine learning model to detect fraud in the tenant data systems (This limitation amounts to adding insignificant extra-solution activity to a judicial exception, as discussed in MPEP 2106.05(g). Receiving a first data set (i.e., mere data gathering in conjunction with the abstract idea) is directed to a well-understood, routine, and conventional activity (data transmission). See MPEP 2106.05(d)(II)(i).);
using an ML model explainer process, wherein the ML model explainer process is configured (This limitation constitutes mere instructions to apply the abstract idea and insignificant extra-solution activity. See MPEP 2106.05(f) and 2106.05(g).)
wherein the second explanation comprises second weighted scores (This amounts to insignificant extra-solution activity. The limitation merely expresses data as weighted scores, i.e., presenting the results of the mathematical calculations in a particular format.),
using at least the ML model explainer process (This limitation constitutes mere instructions to apply the abstract idea and insignificant extra-solution activity. See MPEP 2106.05(f) and 2106.05(g).)
training the second ML model based on the set of the first features (This limitation constitutes mere instructions to apply the abstract idea and insignificant extra-solution activity. See MPEP 2106.05(f) and 2106.05(g).);
Regarding claim 2, the rejection of claim 1 is incorporated herein. Further, claim 2 recites the following abstract idea:
wherein the first weighted scores and the second weighted scores comprise SHapley Additive exPlanations (SHAP) values (Shapley values) generated from a SHAP algorithm. (This is an abstract idea of a “mathematical concept.” SHAP values are mathematical results produced by a mathematical algorithm, involving statistical calculations. This limitation is directed to a mathematical concept. See MPEP 2106.04(a)(2)(I).)
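For illustration only, and not as part of the claim mapping: the mathematical character of Shapley values can be seen in a minimal sketch that computes them exactly by enumerating feature coalitions. The two-feature additive "model" and its contribution values below are hypothetical and chosen solely to show that the computation reduces to weighted sums of marginal contributions.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values: the weighted average marginal contribution of
    each feature over all coalitions -- a purely mathematical summation."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            for coalition in combinations(others, r):
                s = len(coalition)
                # Standard Shapley weight: |S|! * (n - |S| - 1)! / n!
                weight = factorial(s) * factorial(n - s - 1) / factorial(n)
                total += weight * (value_fn(set(coalition) | {f}) - value_fn(set(coalition)))
        phi[f] = total
    return phi

# Hypothetical additive model output over two features, x and y.
contrib = {"x": 3.0, "y": 1.0}
v = lambda coalition: sum(contrib[f] for f in coalition)
print(shapley_values(["x", "y"], v))  # each feature recovers its additive contribution
```

For an additive model the Shapley value of each feature equals its standalone contribution (x: 3.0, y: 1.0), confirming that the values are the output of a fixed mathematical formula applied to numerical data.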
Regarding claim 3, the rejection of claim 2 is incorporated herein. Further, claim 3 recites the following abstract idea:
determining, using the score for each of the first features from an overall set of features associated with the first ML model the Shapley values of the first features to the first ML model based on (This is an abstract idea of a “mental process.” The limitation recites evaluating feature scores and determining contribution values for features relative to an overall set. A person could review feature scores, assess the contribution of each feature to an outcome, and assign corresponding values based on observation and judgement. Such evaluation and determination can be performed in the human mind or with the aid of pen and paper or a calculator, and therefore falls within the mental process grouping of abstract ideas.)
Regarding claim 4, the rejection of claim 1 is incorporated herein. Further, claim 4 recites the following abstract idea:
converting the first explanation and the second explanation to a global explanation standard utilized with a plurality of ML models including the first ML model and the second ML model. (The step of converting the first explanation and the second explanation to a global explanation standard utilized with a plurality of models, including the first and second ML models, can be performed in the human mind or with pen and paper by applying observation, judgement, and basic standardization or translation methods. Accordingly, this step amounts to a mental process under the abstract idea exception, see MPEP 2106.04(a)(2)(III).)
Regarding claim 5, the rejection of claim 1 is incorporated herein. Further, claim 5 recites the following abstract idea:
determining a fraud count of transactional frauds in the first data set; (This is an abstract idea of a “mental process.” The step of determining a fraud count of transactional frauds in the first data set involves reviewing transaction records, identifying which transactions are fraudulent, and tallying the total – steps that can be performed in the human mind or with pen and paper using observation, judgement, and basic counting. Accordingly, this constitutes a mental process under the abstract idea exception, see MPEP 2106.04(a)(2)(III).)
determining that the fraud count of the transactional frauds meets or exceeds the low fraud tenant threshold; (This is an abstract idea of a “mental process.” The step of determining that the fraud count of the transactional frauds meets or exceeds the low fraud tenant threshold involves comparing the identified fraud count to a predetermined threshold – an activity that can be performed in the human mind or with pen and paper using observation and basic comparison. Accordingly, this constitutes a mental process under the abstract idea exception, see MPEP 2106.04(a)(2)(III).)
determining that the first tenant data system is not associated with a low fraudulent financial institution based on the determining that the fraud count of the transactional frauds meets or exceeds the low fraud tenant threshold. (This is an abstract idea of a “mental process.” The step of determining that the first tenant data system is not associated with a low fraudulent financial institution, based on determining the fraud count meets or exceeds the low fraud tenant threshold, involves drawing a conclusion from the comparison between the fraud count and the threshold. This reasoning and decision-making can be performed in the human mind or with pen and paper using observation and logical judgement. Accordingly, this constitutes a mental process under the abstract idea exception, see MPEP 2106.04(a)(2)(III).)
Regarding claim 6, the rejection of claim 5 is incorporated herein. Further, claim 6 recites the following abstract idea:
performing at least one fraud enrichment operation on the first data set, wherein the at least one fraud enrichment operation causes one or more transactions in the first data set to convert from a non-fraudulent transaction to a fraudulent transaction. (This is an abstract idea of a “mental process.” The step of performing at least one fraud enrichment operation on the first data set, where one or more transactions are converted from non-fraudulent to fraudulent, involves reviewing transaction data, applying criteria, and manually changing the classification of certain transactions. These actions – reviewing, evaluating, and reclassifying records – can be performed in the human mind or with pen and paper, simply by marking and updating data entries. Accordingly, this constitutes a mental process under the abstract idea exception, see MPEP 2106.04(a)(2)(III).)
Regarding claim 7, the rejection of claim 1 is incorporated herein. Further, claim 7 recites the following abstract idea:
determining a training data set and a testing data set for the first ML model based on the first data set, wherein the first data set is separate from a second data set (This is an abstract idea of a “mental process.” The step of determining a training data set and a testing data set for the first model based on the first data set, where the first data set is separate from a second data set, involves reviewing, selecting, and dividing data into subsets for different purposes. These actions – analyzing, sorting, and assigning records – are performed using mental reasoning and judgement, whether entirely in the human mind or by using pen and paper to manually list and categorize data entries. Accordingly, this constitutes a mental process under the abstract idea exception, see MPEP 2106.04(a)(2)(III).);
The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
training the first ML model using the training data set; (The claimed elements represent generic computer components and computer instructions to apply the abstract idea. Therefore, these elements are not sufficient to integrate the judicial exception into a practical application. See MPEP 2106.05(f).)
testing the first ML model using the testing data set. (The claimed elements represent generic computer components and computer instructions to apply the abstract idea. Therefore, these elements are not sufficient to integrate the judicial exception into a practical application. See MPEP 2106.05(f).)
Regarding claim 8, the rejection of claim 1 is incorporated herein. Further, claim 8 recites the following abstract idea:
wherein computing the similarity score utilizes a cosine similarity between the first and second vectors generated (This is an abstract idea of a “mathematical concept.” The limitation recites computing a cosine similarity between vectors, which constitutes a mathematical calculation involving vector operations and similarity measures. See MPEP 2106.04(a)(2)(I).).
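As a purely illustrative aside, and not as part of the claim mapping: the cosine similarity recited in claim 8 is the standard formula dot(u, v) / (|u| · |v|), which may be sketched as follows. The weighted-score vectors below are hypothetical values chosen only to show that the computation is arithmetic on numerical data.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors in n-dimensional space:
    dot(u, v) / (|u| * |v|) -- a pure mathematical calculation."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical weighted-score vectors for two tenant data systems.
first_scores = [0.9, 0.4, 0.1]
second_scores = [0.8, 0.5, 0.2]
print(round(cosine_similarity(first_scores, second_scores), 4))  # prints 0.9846
```

A result near 1.0 indicates that the two sets of weighted scores point in nearly the same direction in the n-dimensional space, i.e., the tenants weight their features similarly.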
Regarding claim 9, the following claim elements are abstract ideas:
determining whether the first data set meets or exceeds a low fraud tenant threshold; (This is an abstract idea of a “mental process.” The step of “determining whether the first data set meets or exceeds a low fraud tenant threshold” can, under the broadest reasonable interpretation, be practically performed in the human mind using observation and judgement. For example, a person can review the data mentally or with pen and paper and decide whether it surpasses a certain threshold. As such, this limitation qualifies as a mental process. See MPEP 2106.04(a)(2)(III).)
responsive to the determining, segmenting the first tenant data system into a tenant segment group based on the first data set; (This is an abstract idea of a “mental process.” The step of “segmenting the first tenant data system into a tenant segment group based on the first data set,” responsive to the determining step, can under the broadest reasonable interpretation, be practically performed in the human mind using observation and judgement. For example, a person can mentally or with pen and paper sort or group records based on the data to create segments. As such, the limitation qualifies as a mental process. See MPEP 2106.04(a)(2)(III).)
determining first features of a first ML model based on at least a portion of the first data set and an ML model algorithm for the first ML model; (This is an abstract idea of a “mental process.” The step of “determining first features of a first ML model based on at least a portion of the first data set and an ML model algorithm for the first ML model” can, under the broadest reasonable interpretation, be practically performed in the human mind using observation and judgement. For example, a person could review the data and, by following an algorithmic set of rules, identify relevant features for building a model, either mentally or with pen and paper. See MPEP 2106.04(a)(2)(III). Additionally, an algorithm, as recited here, is considered a mathematical concept because it represents a mathematical relationship, calculation, or set of rules for processing data, as explained in MPEP 2106.04(a)(2)(I).)
determining a first explanation of the first feature importance of each of the first features of the first ML model wherein the first explanation comprises first weighted scores corresponding to the score for each of the first features (This is an abstract idea of a “mental process.” The limitation recites evaluating feature importance and assigning corresponding scores. A person could review factors, assess the relative importance of each feature to an outcome, and assign corresponding values based on observation and judgement. Such evaluation and scoring can be performed in the human mind or with the aid of pen and paper.)
comparing the first tenant data system to a second tenant data system based on at least the first explanation and a second explanation of a second feature importance associated with the first features and a second data set of each of second features for the second tenant data system (This is an abstract idea of a “mental process.” The step of comparing the first tenant data system to a second tenant data system based on the first and second explanations of feature importance can, under the broadest reasonable interpretation, be practically performed in the human mind using observation and judgement. For example, a person can review the feature importance explanations for both systems – where a feature importance explanation simply describes which factors are most influential in making decisions – analyze similarities or differences, and make a comparison, either mentally or with pen and paper. As such, the limitation qualifies as a mental process. See MPEP 2106.04(a)(2)(III).)
ranking at least the first features based on the similarity scores (This is an abstract idea of a “mental process.” The limitation recites ordering features based on similarity scores. A person could review similarity values and rank features accordingly using observation and judgement. Such ordering and evaluation can be performed in the human mind or with the aid of pen and paper.)
selecting a set of the first features for training a second ML model using a feature selection for the second ML model based on the ranking. (This is an abstract idea of a “mental process.” The step of selecting a set of the first features using a feature selection based on the ranking can, under the broadest reasonable interpretation, be practically performed in the human mind using observation and judgement. For example, after ranking the features, a person can review the list and select certain features to use in a model, either mentally or with pen and paper. As such, this limitation qualifies as a mental process. See MPEP 2106.04(a)(2)(III).)
computing a score for each of the first features…to determine an amount that each of the first features contributes to outputs by the first ML model and compute the score based on the amount (This is an abstract idea of a mental process and a mathematical concept. The limitation recites determining numerical contribution amounts for features and computing scores based on those amounts, which constitutes mathematical calculations. Additionally, it involves evaluating how much each feature contributes to an outcome and assigning a corresponding score based on that evaluation. A person could assess contributing factors, estimate their relative impact, and assign values accordingly using observation and judgment.);
converting the first and second weighted scores to vectors in an n-dimensional space (This is an abstract idea of a mathematical concept. The limitation recites representing numerical values as vectors in an n-dimensional space, which constitutes a mathematical representation and mathematical calculations involving transforming data into a vector space.);
computing similarity scores in the n-dimensional space using the vectors, wherein the similarity scores indicate how similar the first weighted scores are to the second weighted scores (This is an abstract idea of a mathematical concept. The limitation recites computing similarity scores based on vectors in an n-dimensional space, which constitutes mathematical calculations involving comparing numerical values and determining a similarity measure.);
determining that a threshold similarity exists between the first tenant data system and the second tenant data system based on the comparing (This is an abstract idea of a mental process. The limitation recites evaluating whether a similarity meets a threshold. A person could compare a similarity value to a predetermined threshold and determine whether the threshold is satisfied using observation and judgment. Such evaluation can be performed in the human mind or with the aid of pen and paper or a calculator, and therefore falls within the mental process grouping of abstract ideas.);
determining whether to iterate over the feature selection for inclusion of at least one additional feature in the set of the first features for retraining the second ML model (This is an abstract idea of a mental process. The limitation recites evaluating whether to repeat a selection process and include additional features. A person could review the selected features, decide whether additional features should be included, and determine whether to repeat the selection using observation and judgment.).
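For illustration only, and not as a characterization of the claimed invention or of any cited reference, the mathematical-concept limitations addressed above (converting weighted scores to vectors in an n-dimensional space, computing similarity scores, and comparing against a threshold) can be sketched as follows. All values, the helper function, and the 0.9 threshold are hypothetical assumptions chosen for illustration:

```python
import math

def cosine_similarity(a, b):
    # Treat each list of weighted scores as a vector in an
    # n-dimensional space and compute the cosine of the angle
    # between the two vectors as a similarity measure.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical first and second weighted feature-importance scores.
first_weighted_scores = [0.45, 0.30, 0.15, 0.10]
second_weighted_scores = [0.40, 0.35, 0.10, 0.15]

similarity = cosine_similarity(first_weighted_scores, second_weighted_scores)

# Hypothetical threshold: the tenant systems are deemed similar when
# the cosine similarity meets or exceeds 0.9.
meets_threshold = similarity >= 0.9
```

Arithmetic of this scale can also be carried out with pen and paper or a calculator, which is consistent with the mental-process and mathematical-concept characterization set forth above.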
The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
receiving a first data set for a first tenant data system usable for training a first machine learning model to detect fraud in the tenant data systems (This limitation amounts to adding insignificant extra-solution activity to a judicial exception, as discussed in MPEP 2106.05(g). Receiving a first data set (i.e., mere data gathering in conjunction with the abstract idea) is a well-understood, routine, and conventional activity of receiving or transmitting data over a network. See MPEP 2106.05(d)(II)(i).);
wherein the second explanation comprises second weighted scores (This amounts to insignificant extra-solution activity.),
training the second ML model based on the set of the first features (This limitation constitutes mere instructions to apply the abstract idea and insignificant extra-solution activity. See MPEP 2106.05(f) and 2106.05(g).);
using at least the ML model explainer process (This limitation constitutes mere instructions to apply the abstract idea and insignificant extra-solution activity. See MPEP 2106.05(f) and 2106.05(g).).
Regarding claim 10, the rejection of claim 9 is incorporated herein. The claim recites limitations similar to those of claim 2; accordingly, the same subject matter analysis set forth above for claim 2 is equally applicable to claim 10.
Therefore, claim 10 is ineligible.
Regarding claim 11, the rejection of claim 10 is incorporated herein. The claim recites limitations similar to those of claim 3; accordingly, the same subject matter analysis set forth above for claim 3 is equally applicable to claim 11.
Therefore, claim 11 is ineligible.
Regarding claim 12, the rejection of claim 9 is incorporated herein. The claim recites limitations similar to those of claim 4; accordingly, the same subject matter analysis set forth above for claim 4 is equally applicable to claim 12.
Therefore, claim 12 is ineligible.
Regarding claim 13, the rejection of claim 9 is incorporated herein. The claim recites limitations similar to those of claim 5; accordingly, the same subject matter analysis set forth above for claim 5 is equally applicable to claim 13.
Therefore, claim 13 is ineligible.
Regarding claim 14, the rejection of claim 13 is incorporated herein. The claim recites limitations similar to those of claim 6; accordingly, the same subject matter analysis set forth above for claim 6 is equally applicable to claim 14.
Therefore, claim 14 is ineligible.
Regarding claim 15, the rejection of claim 9 is incorporated herein. The claim recites limitations similar to those of claim 7; accordingly, the same subject matter analysis set forth above for claim 7 is equally applicable to claim 15.
Therefore, claim 15 is ineligible.
Regarding claim 16, the rejection of claim 9 is incorporated herein. The claim recites limitations similar to those of claim 8; accordingly, the same subject matter analysis set forth above for claim 8 is equally applicable to claim 16.
Therefore, claim 16 is ineligible.
Regarding claim 17, the following claim elements are abstract ideas:
determining whether the first data set meets or exceeds a low fraud tenant threshold (This is an abstract idea of a “mental process.” The step of “determining whether the first data set meets or exceeds a low fraud tenant threshold” can, under the broadest reasonable interpretation, be practically performed in the human mind using observation and judgment. For example, a person can review the data mentally or with pen and paper and decide whether it surpasses a certain threshold. As such, this limitation qualifies as a mental process. See MPEP 2106.04(a)(2)(III).);
responsive to the determining, segmenting the first tenant data system into a tenant segment group based on the first data set (This is an abstract idea of a “mental process.” The step of “segmenting the first tenant data system into a tenant segment group based on the first data set,” responsive to the determining step, can, under the broadest reasonable interpretation, be practically performed in the human mind using observation and judgment. For example, a person can mentally or with pen and paper sort or group records based on the data to create segments. As such, the limitation qualifies as a mental process. See MPEP 2106.04(a)(2)(III).);
determining first features of a first ML model based on at least a portion of the first data set and an ML model algorithm for the first ML model (This is an abstract idea of a “mental process.” The step of “determining first features of a first ML model based on at least a portion of the first data set and an ML model algorithm for the first ML model” can, under the broadest reasonable interpretation, be practically performed in the human mind using observation and judgment. For example, a person could review the data and, by following an algorithmic set of rules, identify relevant features for building a model, either mentally or with pen and paper. See MPEP 2106.04(a)(2)(III). Additionally, an algorithm, as recited here, is considered a mathematical concept because it represents a mathematical relationship, calculation, or set of rules for processing data, as explained in MPEP 2106.04(a)(2)(I).);
determining a first explanation of a first feature importance of each of the first features of the first ML model, wherein the first explanation comprises first weighted scores corresponding to the score for each of the first features (This limitation is similar to a limitation in claim 1; therefore, the same subject matter analysis set forth above for claim 1 applies.),
comparing the first tenant data system to a second tenant data system based on at least the first explanation and a second explanation of a second feature importance associated with the first features and a second data set for the second tenant data system (This is an abstract idea of a “mental process.” The step of “comparing the first tenant data system to a second tenant data system based on at least the first explanation and a second explanation of a second feature importance of each of the second features for the second tenant data system” can, under the broadest reasonable interpretation, practically be performed in the human mind using observation and judgment. For example, a person can review the feature importance explanations for both systems – where a feature importance explanation simply describes which factors are most influential in making decisions – analyze similarities or differences, and make a comparison, either mentally or with pen and paper. As such, the limitation qualifies as a mental process. See MPEP 2106.04(a)(2)(III).);
ranking at least the first features based on the similarity scores (This limitation is similar to a limitation in claim 1; therefore, the same subject matter analysis set forth above for claim 1 applies.)
selecting a set of the first features for training a second ML model using feature selection for at least the second ML model based on the ranking (This limitation is similar to a limitation in claim 1; therefore, the same subject matter analysis set forth above for claim 1 applies.);
computing a score for each of the first features… to determine an amount that each of the first features contributes to outputs by the first ML model and compute the score based on the amount (This limitation is similar to a limitation in claim 1; therefore, the same subject matter analysis set forth above for claim 1 applies.);
converting the first and second weighted scores to vectors in an n-dimensional space (This limitation is similar to a limitation in claim 1; therefore, the same subject matter analysis set forth above for claim 1 applies.), and
computing similarity scores in the n-dimensional space using the vectors, wherein the similarity scores indicate how similar the first weighted scores are to the second weighted scores (This limitation is similar to a limitation in claim 1; therefore, the same subject matter analysis set forth above for claim 1 applies.),
determining that a threshold similarity exists between the first tenant data system and the second tenant data system based on the comparing (This limitation is similar to a limitation in claim 1; therefore, the same subject matter analysis set forth above for claim 1 applies.);
determining whether to iterate over the feature selection for inclusion of at least one additional feature in the set of the first features for retraining the second ML model using at least the ML model explainer process (This limitation is similar to a limitation in claim 1; therefore, the same subject matter analysis set forth above for claim 1 applies.).
The following claim elements are additional elements which, taken alone or in combination with the other elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
non-transitory computer-readable medium having stored thereon computer-readable instructions executable to detect fraud in tenant data systems using a machine learning (ML) system, the computer-readable instructions executable to perform ML modeling operations (The claimed elements represent generic computer components and mere computer instructions to apply the abstract idea. Therefore, these elements are not sufficient to integrate the judicial exception into a practical application. See MPEP 2106.05(f).);
receiving a first data set for a first tenant data system usable for training a first machine learning model to detect fraud in the tenant data systems (This limitation amounts to adding insignificant extra-solution activity to a judicial exception, as discussed in MPEP 2106.05(g). Receiving a first data set (i.e., mere data gathering in conjunction with the abstract idea) is a well-understood, routine, and conventional activity of receiving or transmitting data over a network. See MPEP 2106.05(d)(II)(i).);
wherein the second explanation comprises second weighted scores (This limitation is similar to a limitation in claim 1; therefore, the same subject matter analysis set forth above for claim 1 applies.),
training the second ML model based on the set of the first features (This limitation constitutes mere instructions to apply the abstract idea and insignificant extra-solution activity. See MPEP 2106.05(f) and 2106.05(g).); and
using an ML model explainer process, wherein the ML model explainer process is configured to (This limitation constitutes mere instructions to apply the abstract idea and insignificant extra-solution activity. See MPEP 2106.05(f) and 2106.05(g).)
Regarding claim 18, the rejection of claim 17 is incorporated herein. The claim recites limitations similar to those of claim 2; accordingly, the same subject matter analysis set forth above for claim 2 is equally applicable to claim 18.
Therefore, claim 18 is ineligible.
Regarding claim 19, the rejection of claim 18 is incorporated herein. The claim recites limitations similar to those of claim 3; accordingly, the same subject matter analysis set forth above for claim 3 is equally applicable to claim 19.
Therefore, claim 19 is ineligible.
Regarding claim 20, the rejection of claim 17 is incorporated herein. The claim recites limitations similar to those of claim 4; accordingly, the same subject matter analysis set forth above for claim 4 is equally applicable to claim 20.
Therefore, claim 20 is ineligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 7-11, and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Shachar et al. (Pub. No.: US 20210342847, Filed: 2020) in view of Balakrishnan et al. (Pat. No.: US 11443273 B2, Filed: 2020).
Regarding claim 1, Shachar discloses:
A machine learning (ML) system configured to detect fraud in tenant data systems, the ML system comprising: a processor and a computer readable medium operably coupled thereto, the computer readable medium comprising a plurality of instructions stored in association therewith that are accessible to, and executable by, the processor, to perform ML modeling operations (abstract mentions “ a processor and a computer readable medium operably coupled thereto, the computer readable medium comprising a plurality of instructions stored in association therewith that are accessible to, and executable by, the processor, to perform modeling operations”) which comprise:
receiving a first data set for a first tenant data system usable for training a first machine learning model to detect fraud in the tenant data systems (paragraph [0015] mentions “In order to provide for anomaly detection in data sets for an entity, such as money laundering, fraud, or noncompliant transactions in transaction data sets for a financial entity, an artificial intelligence (AI) system may first require micromodels trained on other data sets and/or using different supervised machine learning (ML) algorithms and techniques… The AI system may begin by accessing or receiving one or more first data sets that correspond to transactions for a first entity having labeled transactions. The first entity may be the same as, or different than, the entity having the second transaction data set used for modeling the ML model for anomaly detection.”) ;
determining whether the first data set meets or exceeds a low fraud tenant threshold (paragraph [0050] mentions “Using the first transaction data set, pre-processing may be applied to reduce the dimensionality of the first transaction data set and allow proper processing by an ML algorithm to generate a model…In this regard, the first transaction data set is sampled during pre-processing so that a sufficient number (e.g., all or a significant portion) of anomalous transactions are selected with a small portion of the non-anomalous transactions.” [0051] further mentions “ However, in order to provide better training of the ML model using the selected algorithm, risk scores for the second transaction data set are determined, at step 508, for example, using the micromodel previously trained from the first transaction data set.”);
responsive to the determining, segmenting the first tenant data system into a tenant segment group based on the first data set (paragraph [0036] mentions “In order to begin modeling operations on both data sets A and B, data pre-processing steps occur at blocks 3 and 4. For example, at block 3, steps of data cleaning, sampling, normalizing, determining intersecting columns between data sets A and B, and feature engineering may occur…Segment-specific feature and row selection may be performed, for example, based on small to mid-sized enterprise (SME) knowledge.”);
determining first features of a first ML model based on at least a portion of the first data set (paragraph [0036] mentions “In order to begin modeling operations on both data sets A and B, data pre-processing steps occur at blocks 3 and 4. For example, at block 3, steps of data cleaning, sampling, normalizing, determining intersecting columns between data sets A and B, and feature engineering may occur.”) and an ML model algorithm for the first ML model (paragraph [0041] mentions “When performing micromodel creation block 5, the models may be created using the pre-processed data set A….These micromodels may be trained using a supervised ML algorithm and technique, including gradient boosting techniques such as XGBoost.”);
determining a first explanation of the first feature importance of each of the first features of the first ML model, wherein the first explanation comprises first weighted scores corresponding to the score for each of the first features (paragraph [0017] mentions “In order to determine the effectiveness and significance of data features for performing analysis and predictions by the two different ML models, a model explanation and comparison may be performed on the two models using a machine learning explainer. SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models and may output the significance level with the ML model” – under the broadest reasonable interpretation, a “first explanation” comprising “first weighted scores” encompasses representing feature importance values as numerical scores corresponding to each feature. Shachar teaches determining a first explanation of feature importance using a machine learning explainer (e.g., SHAP), where a measure of importance is obtained for each feature. Shachar further teaches obtaining a total significance level of each feature, which corresponds to scores for each feature, and such numerical importance values correspond to weighted scores associated with each feature.);
comparing the first tenant data system to a second tenant data system based on at least the first explanation and a second explanation of a second feature importance associated with the
first features and a second data set for the second tenant data system wherein the second explanation comprises second weighted scores (paragraph [0053] mentions “Thus, at step 516, the first and second ML models are compared. An ML model explainer may be used to determine an added value of each feature to the ML models' classifications, such as a measure of importance of each feature in the classification tasks. This may be done using SHAP, LIME, or a lift ratio per each feature separately.” [0017] “a model explanation and comparison may be performed on the two models using a machine learning explainer. SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models” – under BRI, a “tenant data system” encompasses a dataset associated with a respective entity, and an “explanation” comprising weighted scores encompasses feature importance values associated with a dataset. Accordingly, comparing the first tenant data system to a second tenant data system based on at least a first explanation and a second explanation encompasses comparing respective datasets (or models trained thereon) based on feature importance values determined for each dataset. Shachar teaches comparing first and second ML models, where a machine learning explainer (e.g., SHAP) is used to determine a measure of importance of each feature for each model. Shachar further teaches obtaining a total significance level of each feature for the two ML models, which corresponds to weighted scores for each feature and thus corresponds to the first explanation and the second explanation.
Accordingly, Shachar teaches comparing a first tenant data system to a second tenant data system based on respective explanations comprising feature importance values (i.e., weighted scores) associated with each dataset.),
computing a score for each of the first features using an ML model explainer process, wherein the ML model explainer process is configured to determine an amount that each of the first features contributes to outputs by the first ML model and compute the score based on the amount (Shachar, paragraph [0017] “a model explanation and comparison may be performed on the two models using a machine learning explainer. SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models and may output the significance level with the ML models.” – under the broadest reasonable interpretation, an “ML model explainer process” encompasses any process that determines a measure of importance of each feature, corresponding to determining an amount that each feature contributes to outputs of the ML model. Shachar further teaches obtaining a total significance level of each feature, which corresponds to computing a score for each feature based on the amount of contribution.);
selecting a set of the first features for training a second ML model using a feature selection for the second ML model based on the ranking (Shachar, claim 4 mentions “an importance ranking of each feature in each classification task of the first machine learning model and the second machine learning model; and averaging the importance ranking of each feature to each classification task to obtain the comparison.” [0043] “This may include utilizing SHAP or LIME to obtain a measure of importance of each feature in each classification task. Thereafter, an average of those contributions is determined to obtain a total significance level of each feature.” [0044] “This allows for the aggregated SHAP scores or other information from block 9 to be visualized for the most important micromodels. This may be necessary as micromodel importance between different data sets may vary and thus, the most important micromodel(s) may vary and should be selected for a particular data set.” – under BRI, “selecting a set of the first features…based on ranking” encompasses selecting features based on their importance ranking for use in a machine learning model. Shachar teaches determining feature importance using SHAP or LIME and generating a total significance level (i.e., ranking importance) for each feature. Shachar further teaches selecting the most important elements for a particular dataset based on such importance determinations. Accordingly, selecting the most important features based on their importance (i.e., ranking) corresponds to selecting a set of features for use in training a machine learning model as recited in the claim.);
training the second ML model based on the set of the first features (Shachar, paragraph [0046] “a model 300 in FIG. 3 shows a micromodel trained based on feature data input to provide a risk score output, while a model 400 in FIG. 4 shows an ML model trained for anomaly detection using an enriched data set.” – under BRI, a “set of first features” encompasses feature data used as input to a machine learning model. Shachar teaches training machine learning models using feature data as input, including training a model based on feature data input and training a second model using an enriched dataset derived from such features. Accordingly, training a machine learning model using feature data corresponds to training a second ML model based on the set of the first features as recited in the claim.); and
determining whether to iterate over the feature selection for inclusion of at least one additional feature in the set of the first features for retraining the second ML model using at least the ML model explainer process (Shachar, paragraph [0043] “include utilizing SHAP or LIME to obtain a measure of importance of each feature in each classification task. Thereafter, an average of those contributions is determined to obtain a total significance level of each feature. Where neither SHAP nor LIME are available for an unsupervised ML algorithm, a lift table based on a model forecast on the test data may be used to provide feature importance.” [0044] “This allows for the aggregated SHAP scores or other information from block 9 to be visualized for the most important micromodels. This may be necessary as micromodel importance between different data sets may vary and thus, the most important micromodel(s) may vary and should be selected for a particular data set.” [0048] “By continuously providing different sets of training data and penalizing models 300 and 400 when the output is incorrect, models 300 and 400 (and specifically, the representations of the nodes in the hidden layer) may be trained (adjusted) to improve performance of models 300, 400 in data classification.” – under BRI, “determining whether to iterate over the feature selection” encompasses determining, based on explainer-derived feature importance and model performance, whether to update a selected feature set and retrain a model. Shachar teaches evaluating feature importance using a machine learning explainer, selecting features or micromodels based on such importance, and continuously adjusting and retraining models using different training data to improve performance. Accordingly, Shachar’s workflow of evaluating feature importance, selecting features, and retraining models corresponds to determining whether to iterate feature selection for retraining as recited in the claim.).
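As a non-limiting sketch of the explainer-driven workflow mapped above (determining per-feature contribution scores, ranking the features, and selecting a feature set for retraining), the following uses hypothetical feature names and contribution values such as might be produced as mean absolute SHAP values; none of these names or numbers come from the claims or the cited references:

```python
# Hypothetical per-feature contribution magnitudes, e.g. mean
# absolute SHAP values produced by an ML model explainer process.
feature_contributions = {
    "txn_amount": 0.42,
    "account_age": 0.27,
    "ip_mismatch": 0.19,
    "weekday": 0.04,
}

# Rank features by the amount each contributes to the model's outputs.
ranked_features = sorted(
    feature_contributions, key=feature_contributions.get, reverse=True
)

# Select the top-k features for retraining; k = 3 is illustrative.
selected_features = ranked_features[:3]
```

On a subsequent iteration, the contribution scores would be recomputed for the retrained model and the selection repeated if the feature set changed.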
Shachar does not expressly teach the following limitations; however, Shachar in view of Balakrishnan teaches:
converting the first and second weighted scores to vectors in an n-dimensional space (Shachar, paragraph [0017] “SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models and may output the significance level with the ML models.” Balakrishnan [col. 5, lines 61-66] “Another approach to calculate the similarity score between two products is to use a machine learning model(s) to convert each product to a vectorized representation of weights. By representing each product as a vector, the dot product between the products can be used as a similarity score between them—also referred to as the cosine similarity score” – under BRI, “weighted scores” encompasses numerical feature importance values, such as the significance levels of features taught by Shachar. Further, under the broadest reasonable interpretation, a “vectorized representation of weights” corresponds to a vector comprising multiple numerical values, each of which corresponds to a respective feature or attribute. A vector of multiple values inherently defines a multi-dimensional (n-dimensional) space, with each value representing a separate dimension. Accordingly, Balakrishnan’s vectorized representation corresponds to representing data in an n-dimensional space. Therefore, it would have been obvious to one of ordinary skill in the art to convert the feature importance values (i.e., weighted scores) generated by Shachar into vector representations in an n-dimensional space as taught by Balakrishnan in order to enable similarity-based comparison of such values.);
computing similarity scores in the n-dimensional space using the vectors, wherein the similarity scores indicate how similar the first weighted scores are to the second weighted scores (Balakrishnan, [col. 5, lines 44-67] “Similarity scores are calculated for the historical product data records that satisfy the threshold and the historical product data records are ranked accordingly… Another approach to calculate the similarity score between two products is to use a machine learning model(s) to convert each product to a vectorized representation of weights. By representing each product as a vector, the dot product between the products can be used as a similarity score between them—also referred to as the cosine similarity score between two products.” – under BRI, computing similarity scores using vectors encompasses determining a similarity measure between sets of numerical values. Balakrishnan explicitly teaches representing data as vectors of weights and computing similarity scores using operations such as a dot product or cosine similarity. Further, as discussed above, a vector comprising multiple values corresponds to an n-dimensional space, where each value represents a dimension. Operations such as dot product and cosine similarity are performed over these vectors and therefore inherently occur in that n-dimensional space. Additionally, the computed similarity score reflects how similar the respective vectors are, which corresponds to how similar the underlying weighted values are. Accordingly, Balakrishnan teaches computing similarity scores in n-dimensional space using vectors, where the similarity scores indicate how similar one set of weighted values is to another.);
determining that a threshold similarity exists between the first tenant data system and the second tenant data system based on the comparing (Balakrishnan, [col. 5, lines 32-41] “The Predictor ranks historical product data records in historical shipment information that satisfy a similarity threshold with the product data record (Act 216-1)… The Predictor selects historical product data records that meet a threshold for an amount of product data that matches the augmented product data record…” – under BRI, “determining that a threshold similarity exists” encompasses determining that a similarity score between two sets of data meets or exceeds a predefined threshold. Further, under the broadest reasonable interpretation, “tenant data systems” encompasses respective data sets or data representations associated with different entities. Balakrishnan teaches computing similarity scores between data records and selecting those that satisfy a similarity threshold, which corresponds to determining that the similarity between two data representations meets a threshold. Accordingly, Balakrishnan teaches determining a threshold similarity exists between a first and second data system based on the similarity comparison.);
ranking at least the first features based on the similarity scores (Balakrishnan, [col. 5, lines 32-47] “The Predictor ranks historical product data records in historical shipment information that satisfy a similarity threshold with the product data record (Act 216-1)…Similarity scores are calculated for the historical product data records that satisfy the threshold and the historical product data records are ranked accordingly.” – under the broadest reasonable interpretation, “features” encompasses attributes or characteristics of data, and “ranking…based on similarity scores” encompasses ordering feature-based representations according to computed similarity values. Balakrishnan teaches calculating similarity scores for data records based on their attributes and ranking those records according to the similarity scores. Accordingly, ranking data records based on similarity scores corresponds to ranking feature-based representations based on similarity scores as recited in the claim.);
Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having Shachar and Balakrishnan before them, to incorporate the similarity-based vector comparison techniques of Balakrishnan into the feature importance and model explanation framework of Shachar. One would have been motivated to make such a combination in order to enable comparison of feature importance values across different data sets or systems using quantitative similarity measures, such as vector-based similarity scores. This would allow more effective evaluation and transfer of feature relevance by identifying similarities between feature importance representations, thereby improving feature selection and model training across different data environments.
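For illustration only, the vector conversion, cosine-similarity computation, threshold determination, and ranking addressed above can be sketched as follows; the feature importance values, the 0.95 threshold, and the per-feature ranking criterion are hypothetical assumptions and are not taken from Shachar or Balakrishnan:

```python
import numpy as np

# Hypothetical per-feature importance scores (e.g., SHAP significance
# levels) for two tenant data systems; names and values are illustrative.
first_weighted_scores = np.array([0.42, 0.31, 0.15, 0.12])   # tenant A
second_weighted_scores = np.array([0.40, 0.28, 0.20, 0.12])  # tenant B

def cosine_similarity(u, v):
    """Dot product of the vectors normalized by their magnitudes."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Each 4-element vector lives in a 4-dimensional (n-dimensional) space.
similarity = cosine_similarity(first_weighted_scores, second_weighted_scores)

# Threshold similarity determination between the two tenant data systems.
SIMILARITY_THRESHOLD = 0.95  # hypothetical threshold
threshold_met = similarity >= SIMILARITY_THRESHOLD

# One possible per-feature ranking: order features by how closely their
# weighted scores agree across the two systems (smallest difference first).
ranking = np.argsort(np.abs(first_weighted_scores - second_weighted_scores))
```

With these illustrative values the cosine similarity is approximately 0.994, which satisfies the assumed 0.95 threshold.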
Regarding claim 2, Shachar in view of Balakrishnan teaches all the elements of claim 1, and claim 2 is therefore rejected for the same reasons as those presented for claim 1. Shachar in view of Balakrishnan further teaches the following limitation:
wherein the first weighted scores and the second weighted scores comprise SHapley Additive exPlanations (SHAP) values (Shapley values) generated from a SHAP algorithm (paragraph [0017] mentions “In order to determine the effectiveness and significance of data features for performing analysis and predictions by the two different ML models, a model explanation and comparison may be performed on the two models using a machine learning explainer. SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models and may output the significance level with the ML models.” – under BRI, “weighted scores” encompass numerical feature importance values associated with respective features. Shachar explicitly teaches using SHAP, which generates Shapley values representing the contribution of each feature to a model output. Such Shapley values are numerical values corresponding to feature importance and thus correspond to weighted scores. Accordingly, Shachar teaches that the weighted scores comprise SHAP values generated from a SHAP algorithm as recited in the claim.).
Regarding claim 3, Shachar in view of Balakrishnan teaches all the elements of claim 2, and claim 3 is therefore rejected for the same reasons as those presented for claim 2. Shachar in view of Balakrishnan further teaches the following limitation:
wherein the determining the first explanation of the first feature importance of each of the first features comprises: determining, using the score for each of the first features, the Shapley values of the first features to the first ML model based on an overall set of features associated with the first ML model (paragraph [0043] mentions “This may include utilizing SHAP or LIME to obtain a measure of importance of each feature in each classification task. Thereafter, an average of those contributions is determined to obtain a total significance level of each feature.” – under BRI, a “score for each of the first features” encompasses a numerical measure of feature importance, such as a contribution (i.e., Shapley value) for each feature. Shachar teaches obtaining such a measure of importance for each feature and further teaches determining an average of those contributions across an overall set of features to obtain a total significance level. Accordingly, determining Shapley values of features based on their contribution across an overall set of features corresponds to determining, using the score for each feature, the Shapley values of the features to the machine learning model as recited in the claim.)
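Purely for illustration, the averaging of per-feature contributions into a “total significance level” discussed above can be sketched as follows; the contribution matrix and the mean-of-absolute-values convention are hypothetical assumptions, not values from the record:

```python
import numpy as np

# Hypothetical per-decision Shapley contributions for three features
# across five classification decisions; all values are illustrative.
shap_contributions = np.array([
    [0.30, -0.10, 0.05],
    [0.25,  0.20, 0.00],
    [0.35, -0.05, 0.10],
    [0.20,  0.15, 0.05],
    [0.40, -0.20, 0.00],
])

# "An average of those contributions is determined to obtain a total
# significance level of each feature" -- one common convention is the
# mean of the absolute Shapley values per feature (column-wise).
total_significance = np.abs(shap_contributions).mean(axis=0)
```

With these illustrative numbers the total significance levels are 0.30, 0.14, and 0.04, making the first feature the most significant.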
Regarding claim 7, Shachar in view of Balakrishnan teaches all the elements of claim 1, and claim 7 is therefore rejected for the same reasons as those presented for claim 1. Shachar in view of Balakrishnan further teaches:
wherein prior to determining the first features of the first ML model, the ML modeling operations further comprise: determining a training data set and a testing data set for the first ML model based on the first data set, wherein the first data set is separate from the second data set (Shachar, paragraph [0037] mentions “sampling of the training and validation sets (e.g., not the test set) is conducted…” paragraph [0051] mentions “In order to perform federated transfer learning, this second transaction data set may be separated and segregated from the first transaction data set…” – under BRI, “determining a training data set and a testing data set…based on the first data set” encompasses partitioning a dataset into subsets used for training and testing a machine learning model. Shachar teaches sampling training and validation data sets from data, which corresponds to determining training and testing datasets based on an initial dataset. Shachar further teaches that a second data set is separated and segregated from a first data set, which corresponds to the first data set being separate from the second data set as recited in the claim);
training the first ML model using the training data set (paragraph [0037] mentions “sampling of the training and validation sets (e.g., not the test set) is conducted…”); and testing the first ML model using the testing data set (paragraph [0043] mentions “The models may be trained using various types of ML algorithms, including unsupervised ML algorithms, for example, extended isolation forest, variational auto encoder, and/or one-class SVM. When training, validating, and testing the ML model, as well as optimizing hyper-parameters, the ML model is built and trained using a validation data set and the ML model's hyper-parameters are optimized during that model building. After the ML model is trained and the hyper-parameters are optimized, the ML model may then be tested... Where neither SHAP nor LIME are available for an unsupervised ML algorithm, a lift table based on a model forecast on the test data may be used to provide feature importance.”)
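As an illustrative sketch only, the partitioning of a first data set into training and testing subsets discussed above may be expressed as follows; the data shape, the 80/20 split proportion, and the random seed are hypothetical assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical first data set: 100 transaction records with 4 features.
first_data_set = rng.normal(size=(100, 4))

# Partition into training and testing subsets by shuffling row indices;
# an 80/20 split is one common convention, used here for illustration.
indices = rng.permutation(len(first_data_set))
split = int(0.8 * len(first_data_set))
training_data = first_data_set[indices[:split]]
testing_data = first_data_set[indices[split:]]
```

Because the index sets are disjoint, no record appears in both subsets, which keeps the test set held out from training.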
Regarding claim 8, Shachar in view of Balakrishnan teaches all the elements of claim 1, and claim 8 is therefore rejected for the same reasons as those presented for claim 1. Shachar in view of Balakrishnan further teaches:
wherein computing the similarity score utilizes a cosine similarity between the first and second vectors generated (Balakrishnan, [col. 5, lines 61-67] “Another approach to calculate the similarity score between two products is to use a machine learning model(s) to convert each product to a vectorized representation of weights. By representing each product as a vector, the dot product between the products can be used as a similarity score between them—also referred to as the cosine similarity score between two products.” – under BRI, utilizing cosine similarity encompasses determining a similarity measure between two vector representations using a cosine-based metric. Balakrishnan teaches computing a similarity score between the vectors using a dot product, which is referred to as the cosine similarity score. Accordingly, Balakrishnan teaches computing a similarity score using cosine similarity between vector representations.).
Regarding claim 9, Shachar discloses:
A method to detect fraud in tenant data systems by a machine learning (ML) system, the method comprising: receiving a first data set for a first tenant data system usable for training a first machine learning model to detect fraud in the tenant data systems (paragraph [0015] mentions “In order to provide for anomaly detection in data sets for an entity, such as money laundering, fraud, or noncompliant transactions in transaction data sets for a financial entity, an artificial intelligence (AI) system may first require micromodels trained on other data sets and/or using different supervised machine learning (ML) algorithms and techniques… The AI system may begin by accessing or receiving one or more first data sets that correspond to transactions for a first entity having labeled transactions. The first entity may be the same as, or different than, the entity having the second transaction data set used for modeling the ML model for anomaly detection.”) ;
determining whether the first data set meets or exceeds a low fraud tenant threshold (paragraph [0050] mentions “Using the first transaction data set, pre-processing may be applied to reduce the dimensionality of the first transaction data set and allow proper processing by an ML algorithm to generate a model…In this regard, the first transaction data set is sampled during pre-processing so that a sufficient number (e.g., all or a significant portion) of anomalous transactions are selected with a small portion of the non-anomalous transactions.” [0051] further mentions “ However, in order to provide better training of the ML model using the selected algorithm, risk scores for the second transaction data set are determined, at step 508, for example, using the micromodel previously trained from the first transaction data set.”);
responsive to the determining, segmenting the first tenant data system into a tenant segment group based on the first data set (paragraph [0036] mentions “In order to begin modeling operations on both data sets A and B, data pre-processing steps occur at blocks 3 and 4. For example, at block 3, steps of data cleaning, sampling, normalizing, determining intersecting columns between data sets A and B, and feature engineering may occur…Segment-specific feature and row selection may be performed, for example, based on small to mid-sized enterprise (SME) knowledge.”);
determining first features of a first ML model based on at least a portion of the first data set (paragraph [0036] mentions “In order to begin modeling operations on both data sets A and B, data pre-processing steps occur at blocks 3 and 4. For example, at block 3, steps of data cleaning, sampling, normalizing, determining intersecting columns between data sets A and B, and feature engineering may occur.”) and an ML model algorithm for the first ML model (paragraph [0041] mentions “When performing micromodel creation block 5, the models may be created using the pre-processed data set A….These micromodels may be trained using a supervised ML algorithm and technique, including gradient boosting techniques such as XGBoost.”);
determining a first explanation of the first feature importance of each of the first features of the first ML model, wherein the first explanation comprises first weighted scores corresponding to the score for each of the first features (paragraph [0017] mentions “In order to determine the effectiveness and significance of data features for performing analysis and predictions by the two different ML models, a model explanation and comparison may be performed on the two models using a machine learning explainer. SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models and may output the significance level with the ML model” – under the broadest reasonable interpretation, a “first explanation” comprising “first weighted scores” encompasses representing feature importance values as numerical scores corresponding to each feature. Shachar teaches determining a first explanation of feature importance using a machine learning explainer (e.g., SHAP), where a measure of importance is obtained for each feature. Shachar further teaches obtaining a total significance level of each feature, which corresponds to scores for each feature, and such numerical importance values correspond to weighted scores associated with each feature.);
comparing the first tenant data system to a second tenant data system based on at least the first explanation and a second explanation of a second feature importance associated with the
first features and a second data set for the second tenant data system wherein the second explanation comprises second weighted scores: (paragraph [0053] mentions “Thus, at step 516, the first and second ML models are compared. An ML model explainer may be used to determine an added value of each feature to the ML models' classifications, such as a measure of importance of each feature in the classification tasks. This may be done using SHAP, LIME, or a lift ratio per each feature separately.” [0017] “a model explanation and comparison may be performed on the two models using a machine learning explainer. SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models” – under BRI, a “tenant data system” encompasses a dataset associated with a respective entity, and an “explanation” comprising weighted scores encompasses feature importance values associated with a dataset. Accordingly, comparing the first tenant data system to a second tenant data system based on at least a first explanation and a second explanation encompasses comparing respective datasets (or models trained thereon) based on feature importance values determined for each dataset. Shachar teaches comparing first and second ML models, where a machine learning explainer (e.g., SHAP) is used to determine a measure of importance of each feature for each model. Shachar further teaches obtaining a total significance level of each feature for the two ML models, which corresponds to weighted scores for each feature and thus corresponds to the first explanation and the second explanation. 
Accordingly, Shachar teaches comparing a first tenant data system to a second tenant data system based on respective explanations comprising feature importance values (i.e., weighted scores) associated with each dataset.),
computing a score for each of the first features using an ML model explainer process, wherein the ML model explainer process is configured to determine an amount that each of the first features contributes to outputs by the first ML model and compute the score based on the amount (Shachar, paragraph [0017] “a model explanation and comparison may be performed on the two models using a machine learning explainer. SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models and may output the significance level with the ML models.” – under the broadest reasonable interpretation, a “ML model explainer process” encompasses any process that determines a measure of importance of each feature, corresponding to determining an amount that each feature contributes to outputs of the ML model. Shachar further teaches obtaining a total significance level of each feature, which corresponds to computing a score for each feature based on the amount of contribution.);
selecting a set of the first features for training a second ML model using a feature selection for the second ML model based on the ranking (Shachar, claim 4 mentions “an importance ranking of each feature in each classification task of the first machine learning model and the second machine learning model; and averaging the importance ranking of each feature to each classification task to obtain the comparison.” [0043] “This may include utilizing SHAP or LIME to obtain a measure of importance of each feature in each classification task. Thereafter, an average of those contributions is determined to obtain a total significance level of each feature.” [0044] “This allows for the aggregated SHAP scores or other information from block 9 to be visualized for the most important micromodels. This may be necessary as micromodel importance between different data sets may vary and thus, the most important micromodel(s) may vary and should be selected for a particular data set.” – under BRI, “selecting a set of the first features…based on ranking” encompasses selecting features based on their importance ranking for use in a machine learning model. Shachar teaches determining feature importance using SHAP or LIME and generating a total significance level (i.e., ranking importance) for each feature. Shachar further teaches selecting the most important elements for a particular dataset based on such importance determinations. Accordingly, selecting the most important features based on their importance (i.e., ranking) corresponds to selecting a set of features for use in training a machine learning model as recited in the claim.);
training the second ML model based on the set of the first features (Shachar, paragraph [0046] “a model 300 in FIG. 3 shows a micromodel trained based on feature data input to provide a risk score output, while a model 400 in FIG. 4 shows an ML model trained for anomaly detection using an enriched data set.” – under BRI, a “set of first features” encompasses feature data used as input to a machine learning model. Shachar teaches training machine learning models using feature data as input, including training a model based on feature data input and training a second model using an enriched dataset derived from such features. Accordingly, training a machine learning model using feature data corresponds to training a second ML model based on the set of the first features as recited in the claim.); and
determining whether to iterate over the feature selection for inclusion of at least one additional feature in the set of the first features for retraining the second ML model using at least the ML model explainer process (Shachar, paragraph [0043] “include utilizing SHAP or LIME to obtain a measure of importance of each feature in each classification task. Thereafter, an average of those contributions is determined to obtain a total significance level of each feature. Where neither SHAP nor LIME are available for an unsupervised ML algorithm, a lift table based on a model forecast on the test data may be used to provide feature importance.” [0044] “This allows for the aggregated SHAP scores or other information from block 9 to be visualized for the most important micromodels. This may be necessary as micromodel importance between different data sets may vary and thus, the most important micromodel(s) may vary and should be selected for a particular data set.” [0048] “By continuously providing different sets of training data and penalizing models 300 and 400 when the output is incorrect, models 300 and 400 (and specifically, the representations of the nodes in the hidden layer) may be trained (adjusted) to improve performance of models 300, 400 in data classification.” – under BRI, “determining whether to iterate over the feature selection” encompasses determining, based on explainer-derived feature importance and model performance, whether to update a selected feature set and retrain a model. Shachar teaches evaluating feature importance using a machine learning explainer, selecting features or micromodels based on such importance, and continuously adjusting and retraining models using different training data to improve performance. Accordingly, Shachar’s workflow of evaluating feature importance, selecting features, and retraining models corresponds to determining whether to iterate feature selection for retraining as recited in the claim.).
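The iterate-over-feature-selection determination mapped above can be sketched, purely for illustration, as a greedy selection loop that adds the next most significant feature while a validation score keeps improving; the feature names, the scoring stub, and the stopping tolerance are hypothetical stand-ins for an explainer-driven evaluation, not the applicant's or the references' method:

```python
def validation_score(selected):
    # Stub standing in for retraining and evaluating the second ML model:
    # returns a score with diminishing returns as features are added.
    return 1.0 - 0.5 ** len(selected)

# Hypothetical features, assumed pre-ranked by explainer significance.
ranked_features = ["txn_amount", "txn_velocity", "account_age", "geo_mismatch"]

selected = ranked_features[:1]
best = validation_score(selected)
for candidate in ranked_features[1:]:
    trial = selected + [candidate]
    score = validation_score(trial)
    if score - best < 0.1:  # improvement below tolerance: stop iterating
        break
    selected, best = trial, score  # keep the feature and retrain
```

Under these assumptions the loop stops once adding a feature no longer improves the stubbed score by the tolerance, retaining the first three features.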
Shachar does not explicitly teach the following limitations; however, Shachar in view of Balakrishnan teaches:
converting the first and second weighted scores to vectors in an n-dimensional space (Shachar, paragraph [0017] “SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models and may output the significance level with the ML models.” Balakrishnan [col. 5, lines 61-66] “Another approach to calculate the similarity score between two products is to use a machine learning model(s) to convert each product to a vectorized representation of weights. By representing each product as a vector, the dot product between the products can be used as a similarity score between them—also referred to as the cosine similarity score” – under BRI, “weighted scores” encompasses numerical feature importance values, such as the significance levels of features taught by Shachar. Further, under the broadest reasonable interpretation, a “vectorized representation of weights” corresponds to a vector comprising multiple numerical values, each of which corresponds to a respective feature or attribute. A vector of multiple values inherently defines a multi-dimensional (n-dimensional) space, with each value representing a separate dimension. Accordingly, Balakrishnan’s vectorized representation corresponds to representing data in an n-dimensional space. Therefore, it would have been obvious to one of ordinary skill in the art to convert the feature importance values (i.e., weighted scores) generated by Shachar into vector representations in an n-dimensional space as taught by Balakrishnan in order to enable similarity-based comparison of such values.)
computing similarity scores in the n-dimensional space using the vectors, wherein the similarity scores indicate how similar the first weighted scores are to the second weighted scores (Balakrishnan, [col. 5, lines 44-67] “Similarity scores are calculated for the historical product data records that satisfy the threshold and the historical product data records are ranked accordingly… Another approach to calculate the similarity score between two products is to use a machine learning model(s) to convert each product to a vectorized representation of weights. By representing each product as a vector, the dot product between the products can be used as a similarity score between them—also referred to as the cosine similarity score between two products.” – under BRI, computing similarity scores using vectors encompasses determining a similarity measure between sets of numerical values. Balakrishnan explicitly teaches representing data as vectors of weights and computing similarity scores using operations such as a dot product or cosine similarity. Further, as discussed above, a vector comprising multiple values corresponds to an n-dimensional space, where each value represents a dimension. Operations such as dot product and cosine similarity are performed over these vectors and therefore inherently occur in that n-dimensional space. Additionally, the computed similarity score reflects how similar the respective vectors are, which corresponds to how similar the underlying weighted values are. Accordingly, Balakrishnan teaches computing similarity scores in n-dimensional space using vectors, where the similarity scores indicate how similar one set of weighted values is to another.);
determining that a threshold similarity exists between the first tenant data system and the second tenant data system based on the comparing (Balakrishnan, [col. 5, lines 32-41] “The Predictor ranks historical product data records in historical shipment information that satisfy a similarity threshold with the product data record (Act 216-1)… The Predictor selects historical product data records that meet a threshold for an amount of product data that matches the augmented product data record…” – under BRI, “determining that a threshold similarity exists” encompasses determining that a similarity score between two sets of data meets or exceeds a predefined threshold. Further, under the broadest reasonable interpretation, “tenant data systems” encompasses respective data sets or data representations associated with different entities. Balakrishnan teaches computing similarity scores between data records and selecting those that satisfy a similarity threshold, which corresponds to determining that the similarity between two data representations meets a threshold. Accordingly, Balakrishnan teaches determining a threshold similarity exists between a first and second data system based on the similarity comparison.);
ranking at least the first features based on the similarity scores (Balakrishnan, [col. 5, lines 32-47] “The Predictor ranks historical product data records in historical shipment information that satisfy a similarity threshold with the product data record (Act 216-1)…Similarity scores are calculated for the historical product data records that satisfy the threshold and the historical product data records are ranked accordingly.” – under the broadest reasonable interpretation, “features” encompasses attributes or characteristics of data, and “ranking…based on similarity scores” encompasses ordering feature-based representations according to computed similarity values. Balakrishnan teaches calculating similarity scores for data records based on their attributes and ranking those records according to the similarity scores. Accordingly, ranking data records based on similarity scores corresponds to ranking feature-based representations based on similarity scores as recited in the claim.);
Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having Shachar and Balakrishnan before them, to incorporate the similarity-based vector comparison techniques of Balakrishnan into the feature importance and model explanation framework of Shachar. One would have been motivated to make such a combination in order to enable comparison of feature importance values across different data sets or systems using quantitative similarity measures, such as vector-based similarity scores. This would allow more effective evaluation and transfer of feature relevance by identifying similarities between feature importance representations, thereby improving feature selection and model training across different data environments.
Regarding claim 10, Shachar in view of Balakrishnan teaches all the elements of claim 9, and claim 10 is therefore rejected for the same reasons as those presented for claim 9. The claim recites limitations similar to those of claim 2 and is rejected for similar reasons using similar teachings and rationale.
Regarding claim 11, Shachar in view of Balakrishnan teaches all the elements of claim 10, and claim 11 is therefore rejected for the same reasons as those presented for claim 10. The claim recites limitations similar to those of claim 3 and is rejected for similar reasons using similar teachings and rationale.
Regarding claim 15, Shachar in view of Balakrishnan teaches all the elements of claim 9, and claim 15 is therefore rejected for the same reasons as those presented for claim 9. The claim recites limitations similar to those of claim 7 and is rejected for similar reasons using similar teachings and rationale.
Regarding claim 16, Shachar in view of Balakrishnan teaches all the elements of claim 9, and claim 16 is therefore rejected for the same reasons as those presented for claim 9. The claim recites limitations similar to those of claim 8 and is rejected for similar reasons using similar teachings and rationale.
Regarding claim 17, Shachar discloses:
A non-transitory computer-readable medium having stored thereon computer-readable instructions executable to detect fraud in tenant data systems using a machine learning (ML) system (paragraph [0049] mentions “One or more of the processes 502-516 of method 500 may be implemented, at least in part, in the form of executable code stored on non-transitory, tangible, machine-readable media that when run by one or more processors may cause the one or more processors to perform one or more of the processes 502-514.”), the computer-readable instructions executable to perform ML modeling operations which comprise:
receiving a first data set for a first tenant data system usable for training a first machine learning model to detect fraud in the tenant data systems (paragraph [0015] mentions “In order to provide for anomaly detection in data sets for an entity, such as money laundering, fraud, or noncompliant transactions in transaction data sets for a financial entity, an artificial intelligence (AI) system may first require micromodels trained on other data sets and/or using different supervised machine learning (ML) algorithms and techniques… The AI system may begin by accessing or receiving one or more first data sets that correspond to transactions for a first entity having labeled transactions. The first entity may be the same as, or different than, the entity having the second transaction data set used for modeling the ML model for anomaly detection.”) ;
determining whether the first data set meets or exceeds a low fraud tenant threshold (paragraph [0050] mentions “Using the first transaction data set, pre-processing may be applied to reduce the dimensionality of the first transaction data set and allow proper processing by an ML algorithm to generate a model…In this regard, the first transaction data set is sampled during pre-processing so that a sufficient number (e.g., all or a significant portion) of anomalous transactions are selected with a small portion of the non-anomalous transactions.” [0051] further mentions “ However, in order to provide better training of the ML model using the selected algorithm, risk scores for the second transaction data set are determined, at step 508, for example, using the micromodel previously trained from the first transaction data set.”);
responsive to the determining, segmenting the first tenant data system into a tenant segment group based on the first data set (paragraph [0036] mentions “In order to begin modeling operations on both data sets A and B, data pre-processing steps occur at blocks 3 and 4. For example, at block 3, steps of data cleaning, sampling, normalizing, determining intersecting columns between data sets A and B, and feature engineering may occur…Segment-specific feature and row selection may be performed, for example, based on small to mid-sized enterprise (SME) knowledge.”);
determining first features of a first ML model based on at least a portion of the first data set (paragraph [0036] mentions “In order to begin modeling operations on both data sets A and B, data pre-processing steps occur at blocks 3 and 4. For example, at block 3, steps of data cleaning, sampling, normalizing, determining intersecting columns between data sets A and B, and feature engineering may occur.”) and an ML model algorithm for the first ML model (paragraph [0041] mentions “When performing micromodel creation block 5, the models may be created using the pre-processed data set A….These micromodels may be trained using a supervised ML algorithm and technique, including gradient boosting techniques such as XGBoost.”);
determining a first explanation of the first feature importance of each of the first features of the first ML model, wherein the first explanation comprises first weighted scores corresponding to the score for each of the first features (paragraph [0017] mentions “In order to determine the effectiveness and significance of data features for performing analysis and predictions by the two different ML models, a model explanation and comparison may be performed on the two models using a machine learning explainer. SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models and may output the significance level with the ML model” – under the broadest reasonable interpretation, a “first explanation” comprising “first weighted scores” encompasses representing feature importance values as numerical scores corresponding to each feature. Shachar teaches determining a first explanation of feature importance using a machine learning explainer (e.g., SHAP), where a measure of importance is obtained for each feature. Shachar further teaches obtaining a total significance level of each feature, which corresponds to scores for each feature, and such numerical importance values correspond to weighted scores associated with each feature.);
comparing the first tenant data system to a second tenant data system based on at least the first explanation and a second explanation of a second feature importance associated with the
first features and a second data set for the second tenant data system wherein the second explanation comprises second weighted scores: (paragraph [0053] mentions “Thus, at step 516, the first and second ML models are compared. An ML model explainer may be used to determine an added value of each feature to the ML models' classifications, such as a measure of importance of each feature in the classification tasks. This may be done using SHAP, LIME, or a lift ratio per each feature separately.” [0017] “a model explanation and comparison may be performed on the two models using a machine learning explainer. SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models” – under BRI, a “tenant data system” encompasses a dataset associated with a respective entity, and an “explanation” comprising weighted scores encompasses feature importance values associated with a dataset. Accordingly, comparing the first tenant data system to a second tenant data system based on at least a first explanation and a second explanation encompasses comparing respective datasets (or models trained thereon) based on feature importance values determined for each dataset. Shachar teaches comparing first and second ML models, where a machine learning explainer (e.g., SHAP) is used to determine a measure of importance of each feature for each model. Shachar further teaches obtaining a total significance level of each feature for the two ML models, which corresponds to weighted scores for each feature and thus corresponds to the first explanation and the second explanation. 
Accordingly, Shachar teaches comparing a first tenant data system to a second tenant data system based on respective explanations comprising feature importance values (i.e., weighted scores) associated with each dataset.),
computing a score for each of the first features using an ML model explainer process, wherein the ML model explainer process is configured to determine an amount that each of the first features contributes to outputs by the first ML model and compute the score based on the amount (Shachar, paragraph [0017] “a model explanation and comparison may be performed on the two models using a machine learning explainer. SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models and may output the significance level with the ML models.” – under the broadest reasonable interpretation, an “ML model explainer process” encompasses any process that determines a measure of importance of each feature, corresponding to determining an amount that each feature contributes to outputs of the ML model. Shachar further teaches obtaining a total significance level of each feature, which corresponds to computing a score for each feature based on the amount of contribution.);
selecting a set of the first features for training a second ML model using a feature selection for the second ML model based on the ranking (Shachar, claim 4 mentions “an importance ranking of each feature in each classification task of the first machine learning model and the second machine learning model; and averaging the importance ranking of each feature to each classification task to obtain the comparison.” [0043] “This may include utilizing SHAP or LIME to obtain a measure of importance of each feature in each classification task. Thereafter, an average of those contributions is determined to obtain a total significance level of each feature.” [0044] “This allows for the aggregated SHAP scores or other information from block 9 to be visualized for the most important micromodels. This may be necessary as micromodel importance between different data sets may vary and thus, the most important micromodel(s) may vary and should be selected for a particular data set.” – under BRI, “selecting a set of the first features…based on ranking” encompasses selecting features based on their importance ranking for use in a machine learning model. Shachar teaches determining feature importance using SHAP or LIME and generating a total significance level (i.e., ranking importance) for each feature. Shachar further teaches selecting the most important elements for a particular dataset based on such importance determinations. Accordingly, selecting the most important features based on their importance (i.e., ranking) corresponds to selecting a set of features for use in training a machine learning model as recited in the claim.);
training the second ML model based on the set of the first features (Shachar, paragraph [0046] “a model 300 in FIG. 3 shows a micromodel trained based on feature data input to provide a risk score output, while a model 400 in FIG. 4 shows an ML model for trained for anomaly detection using an enriched data set.” – under BRI, a “set of first features” encompasses feature data used as input to a machine learning model. Shachar teaches training machine learning models using feature data as input, including training a model based on feature data input and training a second model using an enriched dataset derived from such features. Accordingly, training a machine learning model using feature data corresponds to training a second ML model based on the set of the first features as recited in the claim.); and
determining whether to iterate over the feature selection for inclusion of at least one additional feature in the set of the first features for retraining the second ML model using at least the ML model explainer process (Shachar, paragraph [0043] “include utilizing SHAP or LIME to obtain a measure of importance of each feature in each classification task. Thereafter, an average of those contributions is determined to obtain a total significance level of each feature. Where neither SHAP nor LIME are available for an unsupervised ML algorithm, a lift table based on a model forecast on the test data may be used to provide feature importance.” [0044] “This allows for the aggregated SHAP scores or other information from block 9 to be visualized for the most important micromodels. This may be necessary as micromodel importance between different data sets may vary and thus, the most important micromodel(s) may vary and should be selected for a particular data set.” [0048] “By continuously providing different sets of training data and penalizing models 300 and 400 when the output is incorrect, models 300 and 400 (and specifically, the representations of the nodes in the hidden layer) may be trained (adjusted) to improve performance of models 300, 400 in data classification.” – under BRI, “determining whether to iterate over the feature selection” encompasses determining, based on explainer-derived feature importance and model performance, whether to update a selected feature set and retrain a model. Shachar teaches evaluating feature importance using a machine learning explainer, selecting features or micromodels based on such importance, and continuously adjusting and retraining models using different training data to improve performance. Accordingly, Shachar’s workflow of evaluating feature importance, selecting features, and retraining models corresponds to determining whether to iterate feature selection for retraining as recited in the claim.).
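For illustration only, and not as part of the record, the explainer-based feature scoring discussed above (e.g., Shachar’s use of SHAP to obtain a measure of importance of each feature and a total significance level) may be sketched as follows. The attribution values and feature names below are hypothetical and serve only to show how per-instance contribution amounts reduce to a per-feature score:

```python
# Minimal sketch of explainer-based feature scoring (hypothetical data).
# Each row holds per-feature attribution values (e.g., SHAP-style values)
# for one model output; a feature's score is its mean absolute contribution.

def feature_scores(attributions, feature_names):
    """Compute one score per feature from per-instance attribution values."""
    n = len(attributions)
    scores = {}
    for j, name in enumerate(feature_names):
        scores[name] = sum(abs(row[j]) for row in attributions) / n
    return scores

# Hypothetical features and per-instance attributions for three outputs.
features = ["amount", "velocity", "geo_mismatch"]
shap_like = [
    [0.40, -0.10, 0.05],
    [-0.30, 0.20, 0.10],
    [0.50, -0.15, 0.00],
]
print(feature_scores(shap_like, features))
```

On this hypothetical data, the "amount" feature contributes the most on average and therefore receives the highest score, which is the sense in which a score is "based on the amount" each feature contributes.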
However, Shachar does not teach the following limitations, which Shachar in view of Balakrishnan teaches:
converting the first and second weighted scores to vectors in an n-dimensional space (Shachar, paragraph [0017] “SHapley Additive exPlanations (SHAP) and/or Local Interpretable Model-agnostic Explanations (LIME) may be used to obtain a measure of importance of each feature in the classification task of determining whether a transaction is anomalous or not (e.g., flagged or classified as money laundering, fraud, or noncompliant, or otherwise valid). The AI system then obtains a total significance level of each feature for the two ML models and may output the significance level with the ML models.” Balakrishnan [col. 5, lines 61-66] “Another approach to calculate the similarity score between two products is to use a machine learning model(s) to convert each product to a vectorized representation of weights. By representing each product as a vector, the dot product between the products can be used as a similarity score between them—also referred to as the cosine similarity score” – under BRI, “weighted scores” encompasses numerical feature importance values, such as the significance levels of features taught by Shachar. Further, under the broadest reasonable interpretation, a “vectorized representation of weights” corresponds to a vector comprising multiple numerical values, each of which corresponds to a respective feature or attribute. A vector of multiple values inherently defines a multi-dimensional (n-dimensional) space, with each value representing a separate dimension. Accordingly, Balakrishnan’s vectorized representation corresponds to representing data in an n-dimensional space. Therefore, it would have been obvious to one of ordinary skill in the art to convert the feature importance values (i.e., weighted scores) generated by Shachar into vector representations in an n-dimensional space as taught by Balakrishnan in order to enable similarity-based comparison of such values.);
computing similarity scores in the n-dimensional space using the vectors, wherein the similarity scores indicate how similar the first weighted scores are to the second weighted scores (Balakrishnan, [col. 5, lines 44-67] “Similarity scores are calculated for the historical product data records that satisfy the threshold and the historical product data records are ranked accordingly… Another approach to calculate the similarity score between two products is to use a machine learning model(s) to convert each product to a vectorized representation of weights. By representing each product as a vector, the dot product between the products can be used as a similarity score between them—also referred to as the cosine similarity score between two products.” – under BRI, computing similarity scores using vectors encompasses determining a similarity measure between sets of numerical values. Balakrishnan explicitly teaches representing data as vectors of weights and computing similarity scores using operations such as a dot product or cosine similarity. Further, as discussed above, a vector comprising multiple values corresponds to an n-dimensional space, where each value represents a dimension. Operations such as dot product and cosine similarity are performed over these vectors and therefore inherently occur in that n-dimensional space. Additionally, the computed similarity score reflects how similar the respective vectors are, which corresponds to how similar the underlying weighted values are. Accordingly, Balakrishnan teaches computing similarity scores in n-dimensional space using vectors, where the similarity scores indicate how similar one set of weighted values is to another.);
determining that a threshold similarity exists between the first tenant data system and the second tenant data system based on the comparing (Balakrishnan, [col. 5, lines 32-41] “The Predictor ranks historical product data records in historical shipment information that satisfy a similarity threshold with the product data record (Act 216-1)… The Predictor selects historical product data records that meet a threshold for an amount of product data that matches the augmented product data record…” – under BRI, “determining that a threshold similarity exists” encompasses determining that a similarity score between two sets of data meets or exceeds a predefined threshold. Further, under the broadest reasonable interpretation, “tenant data systems” encompasses respective data sets or data representations associated with different entities. Balakrishnan teaches computing similarity scores between data records and selecting those that satisfy a similarity threshold, which corresponds to determining that the similarity between two data representations meets a threshold. Accordingly, Balakrishnan teaches determining a threshold similarity exists between a first and second data system based on the similarity comparison.);
ranking at least the first features based on the similarity scores (Balakrishnan, [col. 5, lines 32-47] “The Predictor ranks historical product data records in historical shipment information that satisfy a similarity threshold with the product data record (Act 216-1)…Similarity scores are calculated for the historical product data records that satisfy the threshold and the historical product data records are ranked accordingly.” – under the broadest reasonable interpretation, “features” encompasses attributes or characteristics of data, and “ranking…based on similarity scores” encompasses ordering feature-based representations according to computed similarity values. Balakrishnan teaches calculating similarity scores for data records based on their attributes and ranking those records according to the similarity scores. Accordingly, ranking data records based on similarity scores corresponds to ranking feature-based representations based on similarity scores as recited in the claim.);
Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having Shachar and Balakrishnan before them, to incorporate the similarity-based vector comparison techniques of Balakrishnan into the feature importance and model explanation framework of Shachar. One would have been motivated to make such a combination in order to enable comparison of feature importance values across different data sets or systems using quantitative similarity measures, such as vector-based similarity scores. This would allow more effective evaluation and transfer of feature relevance by identifying similarities between feature importance representations, thereby improving feature selection and model training across different data environments.
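For illustration only, the vector conversion and similarity comparison attributed to Balakrishnan above (the dot product between vectors of weights used as a cosine similarity score) may be sketched as follows. The weighted scores and the threshold value are hypothetical:

```python
# Minimal sketch of cosine similarity between two vectors of weighted
# feature-importance scores in an n-dimensional space (hypothetical data).
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical weighted scores for the same features under two tenant models.
first_weights = [0.40, 0.15, 0.05]   # first tenant data system
second_weights = [0.35, 0.20, 0.10]  # second tenant data system

score = cosine_similarity(first_weights, second_weights)
SIMILARITY_THRESHOLD = 0.9  # hypothetical threshold
print(round(score, 3), score >= SIMILARITY_THRESHOLD)
```

Because the vectors here each contain three values, the comparison occurs in a three-dimensional space; in general each feature's weighted score supplies one dimension, which is the sense in which the similarity is computed "in the n-dimensional space."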
Regarding claim 18, Shachar in view of Balakrishnan teaches all the elements of claim 17; claim 18 is therefore rejected for the same reasons as those presented for claim 17. The claim recites similar limitations corresponding to claim 2 and is rejected for similar reasons as claim 2 using similar teachings and rationale.
Regarding claim 19, Shachar in view of Balakrishnan teaches all the elements of claim 18; claim 19 is therefore rejected for the same reasons as those presented for claim 18. The claim recites similar limitations corresponding to claim 3 and is rejected for similar reasons as claim 3 using similar teachings and rationale.
Claims 4, 12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shachar et al. (Pub. No.: US 20210342847, Filed: 2020) in view of Balakrishnan et al. (Pat. No.: US 11443273 B2, Filed: 2020) further in view of Archambeau et al. (Pat. No.: 11977836, Filed: 2021).
Regarding claim 4, Shachar in view of Balakrishnan teaches, as outlined above, all the elements of claim 1; claim 4 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. However, Shachar in view of Balakrishnan does not teach the following limitation, which Shachar in view of Balakrishnan further in view of Archambeau teaches:
converting the first explanation and the second explanation to a global explanation standard utilized with a plurality of ML models including the first ML model and the second ML model (Archambeau, col. 23, lines 18-27 mentions “Global explanations of machine learning models may be provided, in some embodiments, according to feature attribution measurements 260. For example, global explanation of an ML model may be obtained by aggregating the Shapley values over multiple instances of data (e.g., multiple attributes, or multiple text token groups). Different ways of aggregation may be implemented, in various embodiments, such as the mean of absolute SHAP values for all instances, the median of SHAP values for all instances, and mean of squared SHAP values for all instances.”).
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Shachar, Balakrishnan and Archambeau before them, to obtain a global explanation for machine learning model predictions by aggregating individual (record-level) prediction influence scores. One would have been motivated to generate global explanations in order to provide insights into patterns or characteristics of the model’s behavior across multiple data records (col. 10, lines 50-58).
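For illustration only, the aggregation strategies Archambeau describes for global explanations (mean of absolute SHAP values, median of SHAP values, and mean of squared SHAP values across instances) may be sketched as follows, using hypothetical per-instance SHAP values for a single feature:

```python
# Minimal sketch of aggregating record-level SHAP values into a global
# explanation, per the three aggregation modes quoted from Archambeau.
# The per-instance values below are hypothetical.
from statistics import mean, median

shap_values = [0.40, -0.30, 0.50, -0.15]  # one feature, several instances

global_explanation = {
    "mean_abs": mean(abs(v) for v in shap_values),
    "median": median(shap_values),
    "mean_squared": mean(v * v for v in shap_values),
}
print(global_explanation)
```

Each aggregate collapses many record-level attribution values into one number per feature, which is how individual prediction influence scores yield a model-wide (global) explanation.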
Regarding claim 12, Shachar in view of Balakrishnan teaches, as outlined above, all the elements of claim 9; claim 12 is therefore rejected for the same reasons as those presented for claim 9. The claim recites similar limitations corresponding to claim 4 and is rejected for similar reasons as claim 4 using similar teachings and rationale.
Regarding claim 20, Shachar in view of Balakrishnan teaches, as outlined above, all the elements of claim 17; claim 20 is therefore rejected for the same reasons as those presented for claim 17. The claim recites similar limitations corresponding to claim 4 and is rejected for similar reasons as claim 4 using similar teachings and rationale.
Claims 5, 6, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Shachar et al. (Pub. No.: US 20210342847, Filed: 2020) in view of Balakrishnan et al. (Pat. No.: US 11443273 B2, Filed: 2020) further in view of Poole (Pat. No.: 11373298, Filed: 2019).
Regarding claim 5, Shachar in view of Balakrishnan teaches, as outlined above, all the elements of claim 1; claim 5 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. However, Shachar in view of Balakrishnan does not teach the following limitations, which Shachar in view of Balakrishnan further in view of Poole teaches:
determining a fraud count of transactional frauds (Poole, abstract mentions “receive from a user a selection of a first characteristic including positive and negative samples which are relevant variations significant to prediction of the at least one predicted output…perform positive supervision of the model using the first characteristic such that the training of the model is sensitive to the positive and negative samples of the first characteristic…”) in the first data set;
determining that the fraud count of the transactional frauds (Poole, abstract mentions “receive from a user a selection of a first characteristic including positive and negative samples which are relevant variations significant to prediction of the at least one predicted output…perform positive supervision of the model using the first characteristic such that the training of the model is sensitive to the positive and negative samples of the first characteristic…”) meets or exceeds the low fraud tenant threshold (Shachar, paragraph [0050] mentions “Using the first transaction data set, pre-processing may be applied to reduce the dimensionality of the first transaction data set and allow proper processing by an ML algorithm to generate a model…In this regard, the first transaction data set is sampled during pre-processing so that a sufficient number (e.g., all or a significant portion) of anomalous transactions are selected with a small portion of the non-anomalous transactions.” [0051] further mentions “ However, in order to provide better training of the ML model using the selected algorithm, risk scores for the second transaction data set are determined, at step 508, for example, using the micromodel previously trained from the first transaction data set.”);
determining that the first tenant data system is not associated with a low fraudulent financial institution (Shachar, paragraph [0036] mentions “When performing segment-specific feature and row selection, commercial features may be removed when focusing on retail transactions, non-monetary transactions may be removed, transaction performed via channels that are not relevant to the specific segment may be removed, other pre-processing may be performed on the data set based on the selected business segment, or any combination thereof.” [0037] further mentions “A sampling step may be performed to make sure, with low occurrence of fraud, money laundering, noncompliance, etc. in transaction data sets, that sufficient anomalous transactions are selected…Thus, to reduce imbalance, sampling of the training and validation sets (e.g., not the test set) is conducted where all or a significant portion of the fraudulent, money laundering, non-compliant, or otherwise anomalous transactions are selected with a small amount (e.g., a predefined threshold) of the valid or non-anomalous transactions.”) based on the determining that the fraud count of the transactional frauds (Poole, abstract mentions “receive from a user a selection of a first characteristic including positive and negative samples which are relevant variations significant to prediction of the at least one predicted output…perform positive supervision of the model using the first characteristic such that the training of the model is sensitive to the positive and negative samples of the first characteristic…”) meets or exceeds the low fraud tenant threshold (Shachar, paragraph [0050] mentions “Using the first transaction data set, pre-processing may be applied to reduce the dimensionality of the first transaction data set and allow proper processing by an ML algorithm to generate a model…In this regard, the first transaction data set is sampled during pre-processing so that a sufficient number (e.g., all or a 
significant portion) of anomalous transactions are selected with a small portion of the non-anomalous transactions.” [0051] further mentions “ However, in order to provide better training of the ML model using the selected algorithm, risk scores for the second transaction data set are determined, at step 508, for example, using the micromodel previously trained from the first transaction data set.”).
Accordingly, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, having the combination of Shachar, Balakrishnan, and Poole before them, to represent the fraud count of transactional frauds in the first data set using the known technique of counting positive (fraudulent) and negative (non-fraudulent) samples within the data, as Poole teaches using positive and negative samples relevant to prediction for model training. One would have been motivated to make such a combination in order to accurately distinguish institutions with high or low fraud rates, as the balance of positive and negative samples is a conventional and effective method for identifying and assessing the prevalence of a target activity in prediction tasks (Poole, abstract).
Regarding claim 6, Shachar in view of Balakrishnan further in view of Poole teaches, as outlined above, all the elements of claim 5, therefore it is rejected for the same reasons as those presented for claim 5. Shachar further teaches the limitation:
performing at least one fraud enrichment operation on the first data set, wherein the at least one fraud enrichment operation causes one or more transactions in the first data set to convert from a non-fraudulent transaction to a fraudulent transaction (paragraph [0023] mentions “Additionally, multiple different types of ML algorithms may be used to generate different micromodels. Micromodels 121 are trained to have multiple hyper-parameter settings where instead of optimizing certain hyper-parameters that would be tailored to a data set (e.g., second transaction data sets 131), multiple micromodels are instead trained and selected based on the data set and scenario. These models are generated instead to provide risk scores on the data set at stake (e.g., second transaction data sets 131) for ML modeling for anomaly detection.” [0024] further mentions “ After generating of risk scores 113, these risk scores are used to create an enriched data set for second transaction data sets 131 used for ML modeling of an anomaly detection model 115. This provides federated transfer learning by training models with different federated data sets that are transferred to a “data set at stake” for ML modeling. However, prior to modeling, dimensionality reduction is required for the data sets selected from second transaction data sets 131 for modeling.”).
Regarding claim 13, Shachar in view of Balakrishnan teaches, as outlined above, all the elements of claim 9, therefore it is rejected for the same reasons as those presented for claim 9. The claim recites similar limitations corresponding to claim 5 and is rejected for similar reasons as claim 5 using similar teachings and rationale.
Regarding claim 14, Shachar in view of Balakrishnan further in view of Poole teaches, as outlined above, all the elements of claim 13, therefore, it is rejected for the same reasons as those presented for claim 13. The claim recites similar limitations corresponding to claim 6 and is rejected for similar reasons as claim 6 using similar teachings and rationale.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Daravanh Phakousonh whose telephone number is (571) 272-6324. The examiner can normally be reached Mon - Thurs 7 AM - 5 PM, every other Friday 7 AM - 4 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached at 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Daravanh Phakousonh/Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121