Detailed Action
This Office action is responsive to the application filed 03/22/2023, in which:
Claims 1, 6, and 11 are the independent claims.
Claims 1-11 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. IN202221016247, filed on 03/23/2022.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/22/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claim 4 is objected to because of the following informalities: the claim appears to contain a typographical error, reciting “… based on based on …”, i.e., a repetition of the phrase “based on”. Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 5-7, and 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Robnik-Šikonja et al., “Explaining Classifications for Individual Instances”, in view of Breiman et al., “Random Forests”.
Regarding Claim 1:
Robnik-Šikonja teaches:
A processor-implemented method for interpreting machine learning model’s prediction comprising:
(Robnik-Šikonja, Page 9, Column 1, Paragraph 3, “… sources code … as well as the visualization module can be obtained from the authors”; Column 2, Paragraph 8, “We present an approach to explanation of predictions, which generates explanations of predictions for individual instances”; Abstract, “… Our method works for so called black box models such as support vector machines, neural networks, and … ensemble methods …”. The method taught by Robnik-Šikonja is evaluated using software packages, which implies a processor, memory, and CRM, as these are inherent within a device/apparatus that uses the software library for machine learning. explainVis is a visualization module for the machine learning method that explains/demonstrates the learning task for explaining/interpreting predictions of the model; thus, it is interpreted by the examiner as a processor-implemented method for interpreting a machine learning model’s prediction).
receiving a plurality of input data parameters associated with a Machine Learning (ML) model, via one or more hardware processors, wherein the plurality of input data parameters includes a pre-defined threshold of the ML model, an input feature vector comprising a plurality of predictors, (Robnik-Šikonja, Page 1, Column 2, Paragraph 2, “… n learning instances is represented by an ordered pair (x, y); each vector of attribute values x consists of individual values of attributes Ai, i = 1, ..., a (a is the number of attributes) …”; Page 4, Column 2, Paragraph 5, “… we introduce a threshold parameter and only attributes with sufficient impact are displayed”; Page 2, Column 2, Footnote 1, “In the R system, which we used as our testing environment, the default behavior of many learning models is to fail when predicting an input with NA values …”. The R system within this methodology receives input vector x for prediction, where the x input data vector contains A (attributes). A is interpreted by the examiner as input data parameters with a pre-defined threshold parameter of the ML model; thus, the examiner interprets the receiving of the input data vector to correspond to a plurality of input data parameters associated with the ML model where the input feature vector comprises a plurality of predictors) an original prediction (N) for each predictor from the plurality of predictors, a pre-determined duplication factor, and a plurality of pre-trained data statistics; (Robnik-Šikonja, Page 1, Column 1, Equation (1), “
[media_image1.png: Equation (1), predDiffi(x) = f(x) − f(x\Ai)]
”; Page 2, Column 2, Footnote 2, “… this method … assumes that we have access to the prior probabilities of the values …”; Page 3, Column 1, Paragraph 4, “… for a single instance … we need O(a) model evaluations (at least one prediction for each attribute) …”. Equation (1) shows the basic form of the explanation equations: the difference between the original prediction (f(x)) and another prediction (f(x\Ai)). Each model evaluation defines an original prediction for each attribute; thus, interpreted by the examiner as an original prediction (N) for each predictor from the plurality of predictors. This method has access to the prior probabilities of the values; thus, interpreted by the examiner as a plurality of pre-trained data statistics. As each attribute has at least one prediction, the pre-determined duplication factor corresponds to the number of instances duplicated to evaluate each attribute).
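The prediction-difference reading above can be sketched briefly in Python; this is an illustrative sketch only, assuming a discrete attribute and a toy model, with all names hypothetical rather than drawn from the reference or the claims:

```python
# Illustrative sketch of Equation (1): predDiff_i(x) = f(x) - f(x \ A_i),
# where f(x \ A_i) marginalizes attribute A_i over its prior probabilities
# (the "pre-trained data statistics"). All names and the toy model are
# hypothetical, not taken from the reference or the claims.

def pred_diff(f, x, i, prior):
    """Prediction difference for attribute i of instance x.

    f     : model mapping an instance (list of values) to a probability
    i     : index of the attribute to marginalize out
    prior : prior probability of each possible value of attribute i
    """
    f_x = f(x)
    # f(x \ A_i): prior-weighted average prediction over values of A_i
    f_without_i = sum(p * f(x[:i] + [v] + x[i + 1:]) for v, p in prior.items())
    return f_x - f_without_i

# Toy model: predicts 1.0 when attribute 0 equals "a", else 0.0.
toy_model = lambda x: 1.0 if x[0] == "a" else 0.0
diff = pred_diff(toy_model, ["a", "b"], 0, {"a": 0.5, "c": 0.5})
```

With this toy model, marginalizing attribute 0 over its prior gives f(x\A0) = 0.5, so the prediction difference for attribute 0 is 0.5.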
creating a plurality of duplicate data set of the input feature vector, via the one or more hardware processors, wherein the plurality of duplicate data set is created using the plurality of predictors based on the pre-determined duplication factor;
(Robnik-Šikonja, Page 3, Column 1, Paragraph 4, “… for a single instance … we need O(a) model evaluations (at least one prediction for each attribute) …”; Page 1, Column 1, Paragraph 2, “… the n learning instances is represented by an ordered pair (x, y); each vector of attribute values x consists of individual values of attributes Ai, i = 1, ..., a (a is the number of attributes), and is labeled with y”. As each attribute has at least one prediction, the n learning instances are created (via duplication) based on the number of attributes a (which corresponds to the pre-determined duplication factor); each instance is duplicated to single out one attribute at a time).
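The duplication reading above (one duplicate of the input feature vector per attribute) can be sketched as follows; the helper name and data are hypothetical, not drawn from the record:

```python
# Illustrative sketch of the duplication step as read above: one duplicate
# of the input feature vector per predictor (duplication factor = number of
# attributes a). The helper name and data are hypothetical.

def make_duplicates(x):
    """Return one copy of instance x per attribute, paired with the index
    of the attribute that the copy will single out."""
    return [(i, list(x)) for i in range(len(x))]

dups = make_duplicates([3.1, "red", 7])  # three attributes -> three duplicates
```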
computing a contribution factor for each predictor in the duplicate data set, via the one or more hardware processors, wherein the process of computing the contribution factor for each predictor in the duplicate data set comprises:
(Robnik-Šikonja, Page 1, Column 1, Equation (1), “
[media_image1.png: Equation (1), predDiffi(x) = f(x) − f(x\Ai)]
”; Page 3, Column 1, Paragraph 4, “… for a single instance … we need O(a) model evaluations (at least one prediction for each attribute) …”. Equation (1) is the basis of how the explanations are taught within the method (Page 1: Equations (2)-(6)), where the prediction difference captures how a change in an attribute affects the predicted value. Thus, the evaluations of prediction differences are interpreted by the examiner as a contribution factor, where the contribution factor is computed for each predictor in the duplicate data set),
replacing the predictor in each duplicate data set in the plurality of duplicate data set with a set of … values to obtain the plurality of estimator data set using the plurality of pre-trained data statistics;
(Robnik-Šikonja, Page 1, Column 1, Equations (5) & (6),
[media_image2.png: Equations (5) & (6)]
, Page 10, Equations (10) & (11),
[media_image3.png: Equations (10) & (11)]
. Equations (5) and (6) can be better explained, for the replacing-predictor technique, via Equations (10) and (11). This method replaces the attribute with a known predefined value (which is a constant value and not random)).
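The constant-value replacement, as interpreted above, can be sketched as follows; `replacement_stats` (e.g., a per-attribute training mean or mode) and the helper name are illustrative assumptions, not elements of the record:

```python
# Illustrative sketch of the replacement step as read above: the singled-out
# attribute in each duplicate is overwritten with a known predefined value
# (a constant drawn from pre-trained data statistics, e.g. a training-set
# mean or mode), not a random draw. All names are hypothetical.

def replace_predictor(duplicates, replacement_stats):
    """Overwrite attribute i in the duplicate that singles out attribute i."""
    estimator_sets = []
    for i, dup in duplicates:
        est = list(dup)
        est[i] = replacement_stats[i]  # constant per-attribute value
        estimator_sets.append((i, est))
    return estimator_sets

ests = replace_predictor([(0, [5, 9]), (1, [5, 9])], replacement_stats=[0, -1])
```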
obtaining a prediction probability for each of the duplicate data sets using the ML model;
(Robnik-Šikonja, Page 3, Column 1, Paragraph 4, “… for a single instance … we need O(a) model evaluations (at least one prediction for each attribute) …”; Page 10, Equation (11),
[media_image3.png: Equation (11)]
. The prediction probabilities are obtained via model evaluations so that the difference between the prediction probabilities can be computed; p(y|x\Ai) is the prediction probability for each duplicate set (1, 2, 3, … i)).
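The per-duplicate model evaluation can be sketched as follows; the toy model standing in for p(y|x\Ai) is an illustrative assumption, not the reference's learner:

```python
# Illustrative sketch: one model evaluation per duplicate/estimator data set
# yields its prediction probability, standing in for p(y|x\Ai). The toy
# model and names are hypothetical.

def prediction_probabilities(model, estimator_sets):
    """Evaluate the model once per duplicate data set."""
    return [model(est) for _i, est in estimator_sets]

toy_model = lambda x: min(1.0, max(0.0, 0.1 * sum(x)))
probs = prediction_probabilities(toy_model, [(0, [2, 3]), (1, [2, 0])])
```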
predicting a final prediction (FP) for each of the duplicate data sets based on the prediction probability and the pre-defined threshold of the ML model; and
(Robnik-Šikonja, Page 9, Column 1, Paragraph 6, “A weight can be interpreted as the proportion of the information contributed by the corresponding attribute value to the final prediction”; Page 4, Column 2, Paragraph 5, “… we introduce a threshold parameter and only attributes with sufficient impact are displayed”; Page 5-6, Fig. 2-4; Page 6, Equation (7). Equation (7) shows the comparison of trueExpli(x) (FP based) and predDiffi(x) (original prediction based) to compare the final prediction over all attributes. Figs. 2, 3, and 4 show all attributes; however, the method can use a pre-defined threshold for importance. All attributes are shown in these simple examples for ease of visualizing how importance is accumulated for all attributes; thus, the process is interpreted by the examiner as capable of predicting a final prediction for each of the duplicate data sets based on the prediction probability and the pre-defined threshold).
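The claimed thresholding step, as read against the reference, can be sketched as follows; the binary 1/0 encoding of the two outcomes is an illustrative assumption:

```python
# Illustrative sketch of the claimed thresholding: each duplicate's
# prediction probability is compared to the pre-defined threshold to give a
# binary final prediction (FP). The 1/0 encoding is a hypothetical choice.

def final_predictions(probs, threshold):
    """FP per duplicate data set: 1 if probability >= threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probs]

fps = final_predictions([0.9, 0.4, 0.5], threshold=0.5)
```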
computing the contribution factor (CF) for the predictor using the final predictions of the duplicate data set and the original prediction of the predictor; and
(Robnik-Šikonja, Page 1, Column 1, Equation (1), “
[media_image1.png: Equation (1), predDiffi(x) = f(x) − f(x\Ai)]
”. Equation (1) shows the basic form of the explanation equations: the difference between the original prediction (f(x)) and another prediction (f(x\Ai)), where predDiffi(x) is interpreted as the contribution factor of predictor Ai on instance x).
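The contribution-factor computation can be sketched in the spirit of Equation (1) as follows; the averaging over final predictions is a hypothetical illustration, not an equation from either reference or the claims:

```python
# Illustrative sketch of a contribution-factor computation in the spirit of
# Equation (1): the predictor's original prediction (N) minus an aggregate
# of the final predictions over its duplicate data sets. The averaging form
# and names are hypothetical illustrations.

def contribution_factor(original_prediction, final_preds):
    """CF = original prediction minus the mean final prediction."""
    return original_prediction - sum(final_preds) / len(final_preds)

cf = contribution_factor(1.0, [1, 0, 0, 1])
```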
interpreting the ML model by computing a percentage contribution of each of the predictors, via the one or more hardware processors, using the contribution factor of the plurality of predictors, wherein the percentage contribution indicates an importance of each of the predictors during prediction by the ML model.
(Robnik-Šikonja, Page 5, Figs 2 & 3. Figs 2 & 3 show the interpretations of the ML model in terms of each predictor based on the percentage contribution (information difference), which uses the contribution factor to indicate the importance of each predictor (e.g., sex plays the most important role in both figures)).
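Normalizing contribution factors into percentage contributions can be sketched as follows; the normalization by total absolute contribution is an illustrative assumption, not a formula of record:

```python
# Illustrative sketch: contribution factors normalized into percentage
# contributions (each predictor's share of the total absolute contribution).
# The normalization and names are hypothetical illustrations.

def percentage_contributions(cfs):
    """Percentage of total absolute contribution for each predictor."""
    total = sum(abs(c) for c in cfs)
    if total == 0:
        return [0.0 for _ in cfs]
    return [100.0 * abs(c) / total for c in cfs]

pcts = percentage_contributions([0.5, -0.25, 0.25])
```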
Robnik-Šikonja teaches a method for explaining predictions for individual instances by replacing attributes one at a time. Within the proposed method of Robnik-Šikonja, the replacing is taught using a constant value. Thus, Robnik-Šikonja does not explicitly disclose:
… random values…
However, the Robnik-Šikonja reference cites Breiman, which teaches the random replacement methodology. Breiman explicitly discloses:
… random values…
(Breiman, Page 23, Paragraph 8, “Suppose there are M input variables. After each tree is constructed, the values of the mth variable in the out-of-bag examples are randomly permuted and the out-of-bag data is run down the corresponding tree. The classification given for each xn that is out of bag is saved. This is repeated for m = 1, 2, . . . , M …”. To measure variable importance, Breiman teaches replacing the attribute values with random values so that the interaction of a variable when replaced can be critically understood).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize Robnik-Šikonja’s method for explaining predictions, substituting Breiman’s random-replacement methodology for the constant-value replacement, to visualize importance as noted by Robnik-Šikonja (Page 9, Column 2, Paragraph 5, “In the tools accompanying his Random Forests algorithm [28], Breiman has used bootstrap sampling and random permutation of values to visualize the … the importance of features for prediction of individual instances”). One having ordinary skill in the art would have been motivated to implement this change before the effective filing date of the claimed invention, as it enables measurement of variable importance, interaction analysis, review of misclassification rates, correlation analysis, and accuracy (Breiman, Page 24, Figure 4; Page 23, Paragraph 1, “A forest of trees is impenetrable as far as simple interpretations of its mechanism go. In some applications, analysis of medical experiments for example, it is critical to understand the interaction of variables that is providing the predictive accuracy”; Page 29, Paragraph 3, “Random forests are an effective tool in prediction. Because of the Law of Large Numbers they do not overfit. Injecting the right kind of randomness makes them accurate classifiers and regressors. Furthermore, the framework in terms of strength of the individual predictors and their correlations gives insight into the ability of the random forest to predict. Using out-of-bag estimation makes concrete the otherwise theoretical values of strength and correlation. … Forests give results competitive with boosting and adaptive bagging, yet do not progressively change the training set. Their accuracy indicates that they act to reduce bias. … Random inputs and random features produce good results in classification—less so in regression. 
The only types of randomness used in this study is bagging and random features … A recent paper (Breiman, 2000) shows that in distribution space for two class problems, random forests are equivalent to a kernel acting on the true margin. Arguments are given that randomness (low correlation) enforces the symmetry of the kernel while strength enhances a desirable skewness at abrupt curved boundaries …” ).
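Breiman's random-permutation measure of variable importance, quoted above, can be sketched as follows; the toy model and data are illustrative assumptions, not from either reference:

```python
# Illustrative sketch of random-permutation importance as quoted above:
# permute one variable's column, re-score, and measure the accuracy drop.
# The toy model and data below are hypothetical.
import random

def permutation_importance(model, X, y, m, seed=0):
    """Accuracy drop on (X, y) when column m of X is randomly permuted."""
    base = sum(model(row) == label for row, label in zip(X, y)) / len(X)
    rng = random.Random(seed)
    col = [row[m] for row in X]
    rng.shuffle(col)
    Xp = [row[:m] + [v] + row[m + 1:] for row, v in zip(X, col)]
    perm = sum(model(row) == label for row, label in zip(Xp, y)) / len(X)
    return base - perm

# Toy model that only looks at column 0: permuting column 1 costs nothing.
toy = lambda row: row[0]
X = [[0, 1], [1, 0], [0, 0], [1, 1]]
y = [0, 1, 0, 1]
unused = permutation_importance(toy, X, y, m=1)
```

A variable the model ignores shows zero importance under this measure, while an informative variable shows an accuracy drop when permuted.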
Regarding Claim 2:
Robnik-Šikonja and Breiman teach the method of Claim 1 and further teach:
wherein the Machine Learning (ML) model is interpreted for one of prediction and inference explanation, where the ML model includes a Decision Tree … a Random Forest … a K nearest neighbors Classifier … a Multilayer Perceptron Classifier and a Support Vector Machines.
(Robnik-Šikonja, Page 5, Column 2, Paragraph 3, “Our evaluation scenario includes five different learning algorithms: NB, decision trees (DT), nearest neighbor (kNN), and ANN”; Abstract, “… Our method works for so called black box models such as support vector machines, neural networks, and nearest neighbor algorithms as well as for ensemble methods, such as boosting and random forests …”).
Regarding Claim 5:
Robnik-Šikonja and Breiman teach the method of Claim 1 and further teach:
wherein the percentage contribution is expressed using the equation:
[media_image4.png: claimed percentage-contribution equation]
(Robnik-Šikonja, Page 6, Column 1, Equation (7),
[media_image5.png: Equation (7)]
. Equation 7 shows the calculation for explanations (interpreted by the examiner as “why the model predicted the specific output”) over all attributes. The second half of the equation (
[media_image6.png: second half of Equation (7)]
) is interpreted as the percentage contribution, as it is the percentage that the specific attribute Ai contributed to the prediction of the instance x).
Regarding Claims 6-7 and 10:
Claims 6-7 and 10 incorporate substantively all the limitations of Claims 1-2 and 5 in a system and further recite the new additional elements of a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to (Robnik-Šikonja, Page 9, Column 1, Paragraph 3, “… sources code … as well as the visualization module can be obtained from the authors”. The method taught by Robnik-Šikonja is evaluated using software packages, which implies a processor, memory, and a non-transitory machine-readable information storage medium, as these are inherent within a system that uses the software library for machine learning); thus, Claims 6-7 and 10 are rejected for the reasons set forth in the rejections of Claims 1-2 and 5, respectively.
Regarding Claim 11:
Claim 11 incorporates substantively all the limitations of Claim 1 in a non-transitory machine-readable information storage medium and further recites a new additional element comprising one or more instructions which, when executed by one or more hardware processors, cause: (Robnik-Šikonja, Page 9, Column 1, Paragraph 3, “… sources code … as well as the visualization module can be obtained from the authors”. The method taught by Robnik-Šikonja is evaluated using software packages, which implies a processor, memory, and a non-transitory machine-readable information storage medium, as these are inherent within a device/apparatus that uses the software library for machine learning); thus, Claim 11 is rejected for the reasons set forth in the rejection of Claim 1.
Allowable Subject Matter
Claims 3-4 & 8-9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Please note that Claim 4 is also objected to due to informalities, as noted above in the Claim Objections section.
The following is an examiner’s statement of reasons for allowance:
A complete and thorough search was performed for these claims; however, no prior art was uncovered that teaches or fairly suggests the features of the recited claims. Specifically, none of the prior art of record, either alone or in combination, fairly discloses the expressions explicitly defined in:
Claims 3 & 8:
wherein the final prediction (FP) comprises one of a first pre-defined value and a second pre-defined value, wherein the final prediction is predicted as the first pre-defined value for a prediction probability equal to or higher than the pre-defined threshold and the final prediction is predicted as the second pre-defined value for a prediction probability lower than the pre-defined threshold.
Claims 4 & 9:
wherein the contribution factor is expressed as:
[media_image7.png: claimed contribution-factor expression]
The closest prior art of record is Robnik-Šikonja et al., “Explaining Classifications for Individual Instances”, which discloses a method for explaining predictions for individual instances by replacing attributes one at a time, obtaining prediction probabilities, computing contribution factors, and interpreting the contribution of each predictor. However, Robnik-Šikonja does not disclose a replacing methodology that uses a random value. Breiman et al., “Random Forests”, discloses replacement with a random value. Nevertheless, the combination of these two prior art references does not disclose the expressions defined in Claims 4 & 9 (where the duplication factor is multiplied by the original prediction and the denominator of the full expression is the duplication factor) and Claims 3 & 8 (where the final prediction is predicted as the first pre-defined value for a prediction probability equal to or higher than the pre-defined threshold and as the second pre-defined value for a prediction probability lower than the pre-defined threshold).
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IBRAHIM RAHMAN whose telephone number is (703)756-1646. The examiner can normally be reached M-F 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/I.R./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122