Prosecution Insights
Last updated: April 19, 2026
Application No. 17/382,310

POST-HOC LOCAL EXPLANATIONS OF BLACK BOX SIMILARITY MODELS

Final Rejection: §101, §103
Filed
Jul 21, 2021
Examiner
NAULT, VICTOR ADELARD
Art Unit
2124
Tech Center
2100 — Computer Architecture & Software
Assignee
Rensselaer Polytechnic Institute
OA Round
4 (Final)
Grant Probability: 62% (Moderate)
Expected OA Rounds: 5-6
Time to Grant: 3y 11m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 62% (grants 62% of resolved cases; 8 granted / 13 resolved; +6.5% vs TC avg)
Interview Lift: +83.3% across resolved cases with interview
Avg Prosecution: 3y 11m typical timeline; 30 currently pending
Total Applications: 43 across all art units

Statute-Specific Performance

§101: 29.1% (-10.9% vs TC avg)
§103: 40.4% (+0.4% vs TC avg)
§102: 7.5% (-32.5% vs TC avg)
§112: 21.4% (-18.6% vs TC avg)
Tech Center averages are estimates; based on career data from 13 resolved cases

Office Action

Rejections under §101 and §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks

This Office Action is responsive to Applicant's Amendment filed on December 04, 2025, in which claims 1, 9, 10, and 16 have been amended. No additional claims have been cancelled or added. Claims 1-20 are currently pending.

Response to Arguments

With regards to the rejections of claims 9-20 under 35 U.S.C. 101, Applicant’s arguments have been considered but are not found persuasive. Applicant argues with respect to claim 9; no arguments particularly address claims 10-20. Applicant argues that claim 9 is 101-eligible at least at Prong 2A for integrating any recited abstract ideas into a practical application. Applicant states that claim 9 integrates any recited abstract ideas into a practical application, cites paragraphs [0008]-[0016] to detail improvements offered by the invention, and emphasizes the claim limitations that provide the improvements. Applicant states that the PTAB decision in Ex parte Desjardins recognizes similar improvements as those presented in the instant application to integrate any recited abstract ideas into a practical application. However, Examiner emphasizes that additional limitations beyond those considered to be abstract ideas are necessary to integrate any recited abstract ideas into a practical application. Notwithstanding the recent decision in Ex parte Desjardins, Examiner considers appropriate guidance to be found in MPEP 2106.04(d).III: “The Prong Two analysis considers the claim as a whole. That is, the limitations containing the judicial exception as well as the additional elements in the claim besides the judicial exception need to be evaluated together to determine whether the claim integrates the judicial exception into a practical application.
Because a judicial exception alone is not eligible subject matter, if there are no additional claim elements besides the judicial exception, or if the additional claim elements merely recite another judicial exception, that is insufficient to integrate the judicial exception into a practical application. However, the way in which the additional elements use or interact with the exception may integrate it into a practical application”. The additional limitation found in claim 9 that is not categorized as an abstract idea is and in response to the assessment of the explanation of the value of the similarity measure, modifying the machine learning model, which recites mere instructions to modify a machine learning model, and does not integrate any recited judicial exceptions into a practical application, MPEP 2106.05(f).

With regards to the rejections of claims 1-4, 10-13, and 15-19 under 35 U.S.C. 103 as being unpatentable over Ribeiro et al. “’Why Should I Trust You?’ Explaining the Predictions of Any Classifier”, in view of Amirian and Schwenker “Radial Basis Function Networks for Convolutional Neural Networks to Learn Similarity Distance Metric and Improve Interpretability”, Applicant’s arguments that the claims as amended overcome the rejections are partially persuasive. Applicant argues on page 13 of the Remarks that: “Applicant finds no disclosure of removing one or more features identified, based on user feedback, as being non-contributors to the similarity measure or of removing co-occuring features that lack a causal reason for co-occuring and are identified as significant to a corresponding local explanantion in the cited art”.
Examiner acknowledges that none of the previously cited prior art teaches “removing co-occuring features that lack a causal reason for co-occuring and are identified as significant”; however, Examiner considers Ribeiro to adequately teach “removing one or more features identified, based on user feedback, as being non-contributors to the similarity measure”, as reflected in the updated claim mappings below. Nevertheless, said arguments are moot in light of a new rejection under 35 U.S.C. 103, necessitated by the amendments to the claim, as detailed below.

Additionally, Applicant argues that one of ordinary skill in the art would not find it obvious to combine the teachings of Ribeiro and Amirian, and even if they did, it would not be obvious how to practically implement such a combination. Particularly, Applicant states on page 13 of the Remarks: “Applicant maintains that a mere teaching of a similarity measure is insufficient to suggest modifying Ribeiro to support generating an interpretable local description of a similarity measure and, even if sufficient to suggest such a modification, does not disclose or suggest how to implement such a modification, that is, how to modify Ribeiro to support generating an interpretable local description of a similarity measure” and “Applicant notes that a distance between an ‘embedding of an image and the cluster center of all the embeddings’ does not anticipate or render obvious a similarity measure obtained by approximating the similarity measure as a distance between interpretable representations of first and second points where the interpretable representations are generated by applying a function to a corresponding data point as there is no suggestion or motivation to consider a different distance parameter, that is, a distance between interpretable representations of first and second points generated by applying a function to a corresponding data point.” (emphasis Applicant’s).
Examiner respectfully disagrees that it would not be obvious to combine Ribeiro and Amirian to teach the claimed similarity measure between interpretable representations. Ribeiro teaches a method of interpreting machine learning models that is meant to be as broadly applicable as possible, as Ribeiro states: (Ribeiro Abstract) “In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction” (emphasis original but bolded by Examiner). It is Examiner’s understanding that Ribeiro’s method can be applied to any machine learning model, including those that produce similarity measures, such as the model taught by Amirian, to create interpretable representations of those models with minimal experimentation, as intended by the authors of Ribeiro. Examiner also notes claim 1 (or any other claim) does not recite “where the interpretable representations are generated by applying a function to a corresponding data point” (recited by Applicant on page 13 of the Remarks). Claim 1 does recite approximating the similarity measure as a distance between interpretable representations of first and second points, which Amirian teaches: (Amirian Pg. 9) “The embeddings of CNN are evaluated by their learned distance metric from cluster centers in RBFs. The same distance can be used to measure the distance between a test image and similar images from training data”. Examiner considers images to fall under the broadest reasonable interpretation of “interpretable representation”, but even under a stricter interpretation of the term, Ribeiro’s interpretable representations, which could be applied to images, could use the same distance metric as Amirian’s images. Amirian also offers the motivation that (Amirian Pg. 
2) “The potential contributions of our proposed similarity distance metric on the field of computer vision to enhance the transparency of the decision making process”, that is, that providing a distance between interpretable representations allows a human observer to understand the relative similarities between different points. Applicant additionally states on page 13 of the Remarks that “Applicant notes that the distance measure between the images alleged by the Examiner does not incorporate a matrix”. Examiner respectfully disagrees; Amirian recites: (Amirian Pg. 4) “the RBF computes a distance between embeddings of deep CNNs…This evaluation process formally defined as: [Amirian Equations 1 and 2] where r represents the distance, Rj is the positive definite covariance matrix (trainable distance)”. It can be seen from Amirian’s equation 1 that the distance is computed using the positive definite covariance matrix.

With regards to the rejection of claim 9 under 35 U.S.C. 103 as being unpatentable over Cai et al. “Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making”, in view of Hullermeier “Towards Analogy-Based Explanations in Machine Learning”, further in view of Ribeiro et al. “’Why Should I Trust You?’ Explaining the Predictions of Any Classifier”, Applicant’s arguments that the claims as amended overcome the rejections are partially persuasive. Applicant states on page 18 of the Remarks that: “Applicant finds no disclosure or suggestion in the cited art of an optimization formula where a first term in the optimization formula drives the matching analogous pair to have a similar distance between its members as the first pair of points and a second term drives diversity in the matching analogous pairs of points”.
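As context for the Equation 1 discussion above: Amirian's trainable distance, as the Examiner characterizes it, runs a difference vector through a positive definite covariance matrix, i.e. a Mahalanobis-style metric that reduces to Euclidean distance when the matrix is the identity. A minimal sketch of that computation, with names (`x`, `c`, `R`) that are illustrative rather than Amirian's notation:

```python
import numpy as np

def mahalanobis_distance(x, c, R):
    """Distance r between point x and cluster center c through a positive
    definite matrix R, in the style of Amirian's Equation 1 as quoted above.
    With R equal to the identity this reduces to Euclidean distance."""
    d = x - c
    return float(np.sqrt(d @ R @ d))

# With the identity matrix, this is the ordinary Euclidean distance.
x = np.array([1.0, 2.0])
c = np.array([0.0, 0.0])
print(mahalanobis_distance(x, c, np.eye(2)))  # sqrt(5)
```

Training "the entire covariance matrix", in Amirian's phrasing, corresponds to learning the entries of `R` while keeping it positive definite.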
Examiner acknowledges that none of the previously cited prior art teaches “a second term drives diversity”; however, Examiner considers Hullermeier to adequately teach “an optimization formula where a first term in the optimization formula drives the matching analogous pair to have a similar distance between its members as the first pair of points”, as reflected in the updated claim mappings below. Nevertheless, said arguments are moot in light of a new rejection under 35 U.S.C. 103, necessitated by the amendments to the claim, as detailed below.

Applicant further argues with respect to independent claim 9 that a person of ordinary skill in the art would not find it obvious to combine Cai and Hullermeier to teach the claimed search for and explanation of matching pairs of data points through analogy. Applicant states on pages 16 and 17 of the Remarks: “Applicant maintains that a mere teaching of a similarity measure is insufficient to suggest modifying Hullermeier to support explanation of the value of a similarity measure and, even if sufficient to suggest such a modification, does not disclose or suggest how to implement such a modification, that is, how to modify Hullermeier to support an explanation of the value of a similarity measure”, “Hullermeier discusses a deviation that serves as an explanation of the preference.
There is no disclosure or suggestion in either Cai or Hullermeier to substitute a similarity measure for a deviation as an explanation of a preference as a similarity is the antithesis of a deviation”, “Applicant notes that the preference learning discussion cited by the Examiner is distinguished from Hullermeier's discussion of the explanation of a preference, where Hullermeier teaches using a deviation metric”, and “Applicant maintains, however, that a mere teaching of a similarity measure and the cited knowledge of similarities and deviations is insufficient to suggest modifying Hullermeier to support explanation of the value of a similarity measure and, even if sufficient to suggest such a modification, does not disclose or suggest how to implement such a modification, that is, how to modify Hullermeier to support an explanation of the value of a similarity measure”.

Examiner respectfully disagrees. Examiner reiterates that it would be obvious to combine Cai and Hullermeier to teach analogy between matching pairs of points with similarity measures for exactly the reason that a similarity is the opposite of a deviation. Cai is referred to as it more explicitly teaches a similarity measure; however, Examiner emphasizes that a measure of similarity and a measure of deviation are merely two perspectives on the same measure. To illustrate, two data points that have the smallest deviation (i.e. difference) are the most similar, and likewise two data points that have the largest deviation are the least similar. Inversion of a deviation easily and obviously provides a similarity. One of ordinary skill in the art would be able to recognize this and invert the deviation taught by Hullermeier into a similarity, as suggested by Cai, and with minimal experimentation.
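The Examiner's point that a deviation and a similarity are two views of the same measure can be made concrete: any monotonically decreasing transform of a non-negative deviation yields a similarity. A sketch under that assumption (the exponential transform is illustrative; neither Cai nor Hullermeier prescribes a particular inversion):

```python
import math

def similarity_from_deviation(deviation):
    """Invert a non-negative deviation into a similarity in (0, 1]:
    zero deviation maps to similarity 1, and larger deviations map to
    smaller similarities, preserving the ordering of pairs."""
    return math.exp(-deviation)

# Points with the smallest deviation are the most similar.
print(similarity_from_deviation(0.0))  # 1.0
assert similarity_from_deviation(0.5) > similarity_from_deviation(2.0)
```

Any strictly decreasing map (e.g. `1 / (1 + d)`) would serve equally well; the ordering of pairs by similarity is what the inversion preserves.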
Claim Objections

Claims 1, 10, and 16 are objected to because of the following informalities: removing co-occuring features that lack a causal reason for co-occuring and are identified as significant to a corresponding local explanantion; should read “removing co-occurring features that lack a causal reason for co-occurring and are identified as significant to a corresponding local explanation;”. Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 9-20 are rejected under 35 U.S.C. 101.

Regarding claim 9, Step 1 - “Is the claim to a process, machine, manufacture or composition of matter?” Yes, the claim is directed to a process. Step 2A, Prong 1 - “Is the claim directed to a law of nature, a natural phenomenon (product of nature) or an abstract idea?”: The limitation of defining a similarity measure between a first pair of points in a data space by operation of a machine learning model recites a judgement of quantifying similarity, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a computer, including a machine learning model recited at a high level of generality. The limitation of estimating a value of the similarity measure between the first pair of points recites an evaluation of the similarity between the points, which is a mental process, which is an abstract idea. The limitation of finding matching analogous pairs of points in the data space … recites an observation of the data space and the points within, which is a mental process, which is an abstract idea.
The limitation of … based on an optimization formula where a first term in the optimization formula drives the matching analogous pair to have a similar distance between its members as the first pair of points and a second term drives diversity in the matching analogous pairs of points recites a mathematical formula, which is a mathematical concept, which is an abstract idea. The limitation of and wherein each matching analogous pair of points has a similar value for the similarity measure as does the first pair of points; recites an evaluation of the potentially matching points, which is a mental process, which is an abstract idea. The limitation of explaining the value of the similarity measure between the first pair of points using analogy to the matching analogous pairs of points recites an evaluation of the quantified similarity between the points, which is a mental process, which is an abstract idea. The limitation of assessing the explanation of the value of the similarity measure using a rubric recites an evaluation of the explanation, which is a mental process, which is an abstract idea.

Step 2A, Prong 2 - “Does the claim recite additional elements that integrate the judicial exception into a practical application?”: The limitation of and in response to the assessment of the explanation of the value of the similarity measure, modifying the machine learning model recites mere instructions to apply the exception to modify a machine learning model, reciting only the idea of modifying the model based on received feedback, and at a high level of generality, which does not integrate the exception into a practical application, MPEP 2106.05(d) and 2106.05(f).
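The two-term optimization formula characterized above is recited only functionally (a first term driving distance matching, a second term driving diversity), so any concrete form is hypothetical. Purely for illustration, one common way to write such an objective: a penalty on the mismatch between each candidate pair's member-to-member distance and the first pair's distance, minus a weighted reward for spread among the candidate pairs.

```python
import numpy as np

def objective(candidate_pairs, target_distance, lam=0.1):
    """Hypothetical two-term objective (lower is better).
    First term: penalize candidate pairs whose member-to-member distance
    differs from target_distance (the first pair's distance).
    Second term: reward variance among the pairs' midpoints, driving
    diversity in the selected matching analogous pairs."""
    dists = np.array([np.linalg.norm(a - b) for a, b in candidate_pairs])
    match_term = np.sum((dists - target_distance) ** 2)
    mids = np.array([(a + b) / 2 for a, b in candidate_pairs])
    diversity_term = np.sum(np.var(mids, axis=0))
    return float(match_term - lam * diversity_term)

# Pairs whose internal distances match the target score better than
# equally diverse pairs whose internal distances do not.
good = [(np.array([0.0, 0.0]), np.array([1.0, 0.0])),
        (np.array([5.0, 5.0]), np.array([6.0, 5.0]))]
bad = [(np.array([0.0, 0.0]), np.array([3.0, 0.0])),
       (np.array([5.0, 5.0]), np.array([8.0, 5.0]))]
assert objective(good, 1.0) < objective(bad, 1.0)
```

This sketch only fixes ideas for the functional language; the claim does not specify the distance, the diversity measure, or the trade-off weight.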
Step 2B - “Does the claim recite additional elements that amount to significantly more than the judicial exception?”: The limitation of and in response to the assessment of the explanation of the value of the similarity measure, modifying the machine learning model recites mere instructions to apply the exception to modify a machine learning model, reciting only the idea of modifying the model based on received feedback, and at a high level of generality, which does not amount to significantly more than the judicial exception, MPEP 2106.05(f). Therefore, claim 9 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 10, Step 1 - “Is the claim to a process, machine, manufacture or composition of matter?” Yes, the claim is directed to a manufacture. Step 2A, Prong 1 - “Is the claim directed to a law of nature, a natural phenomenon (product of nature) or an abstract idea?”: The limitation of defining a similarity measure between first and second points in a data space by operation of a machine learning model recites a judgement of quantifying similarity, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a computer, including a machine learning model recited at a high level of generality. The limitation of generating interpretable representations of the first and second points recites an evaluation of the points, which is a mental process, which is an abstract idea. The limitation of generating an interpretable local description of the similarity measure recites an evaluation of the similarity measure, which is a mental process, which is an abstract idea. The limitation of by approximating the similarity measure as a distance between the interpretable representations of the first and second points, wherein the distance between the interpretable representations incorporates a matrix recites a mathematical formula, which is a mathematical concept, which is an abstract idea.
The limitation of learning values for the matrix through optimizing a loss function evaluated on perturbations of the first and second points recites a mathematical calculation, which is a mathematical concept, which is an abstract idea. The limitation of explaining a value of the similarity measure between the first and second points using elements of the matrix that provide insight into features responsible for the similarity recites an evaluation of the similarity between the two points, which is a mental process, which is an abstract idea. The limitation of assessing the explanation of the value of the similarity measure using a rubric recites an evaluation of the explanation, which is a mental process, which is an abstract idea. The limitation of … features identified, based on user feedback, as being non-contributors to the similarity measure recites an evaluation of the features, which is a mental process, which is an abstract idea. The limitation of … co-occuring features that lack a causal reason for co-occuring and are identified as significant to a corresponding local explanantion recites an evaluation of the features, which is a mental process, which is an abstract idea. Step 2A, Prong 2 - “Does the claim recite additional elements that integrate the judicial exception into a practical application?”: The limitation of in response to the assessment of the explanation of the value of the similarity measure, modifying the machine learning model by removing one or more features…and removing co-occuring features … recites mere instructions to apply the exception to modify a machine learning model, reciting only the idea of modifying the model and removing features based on received feedback, and at a high level of generality, which does not integrate the exception into a practical application, MPEP 2106.05(d) and 2106.05(f). 
The limitation of and deploying the modified machine learning model recites mere instructions to apply the earlier modified machine learning model, which does not integrate the exception into a practical application, MPEP 2106.05(d) and 2106.05(f).

Step 2B - “Does the claim recite additional elements that amount to significantly more than the judicial exception?”: The limitation of in response to the assessment of the explanation of the value of the similarity measure, modifying the machine learning model by removing one or more features…and removing co-occuring features … recites mere instructions to apply the exception to modify a machine learning model, reciting only the idea of modifying the model and removing features based on received feedback, and at a high level of generality, which does not amount to significantly more than the judicial exception, MPEP 2106.05(f). The limitation of and deploying the modified machine learning model recites mere instructions to apply the earlier modified machine learning model, which does not amount to significantly more than the judicial exception, MPEP 2106.05(f). Therefore, claim 10 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 11, Claim 11 adds the additional limitation of wherein modifying the machine learning model comprises eliminating at least one feature from a vocabulary of the model to claim 10, which recites a judgement of what feature(s) to no longer consider, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a computer, including a machine learning model recited at a high level of generality. Therefore, claim 11 is found to be ineligible subject matter under 35 U.S.C. 101.
Regarding claim 12, Claim 12 adds the additional limitation of wherein the interpretable representations of the first and second points comprise vectors of binary elements, each element representing presence or absence of a feature from a vocabulary of the first and second points to claim 10, which recites a mathematical formula of the interpretable representations, which is a mathematical concept, which is an abstract idea. Therefore, claim 12 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 13, Claim 13 adds the additional limitation of wherein the vocabulary of the first and second points comprises a plurality of words to claim 12, which recites a judgement on what the vocabulary comprises, which is a mental process, which is an abstract idea. Therefore, claim 13 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 14, Claim 14 adds the additional limitation of wherein the vocabulary of the first and second points comprises a plurality of numeric value buckets to claim 12, which recites a judgement on what the vocabulary comprises, which is a mental process, which is an abstract idea. Therefore, claim 14 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 15, Claim 15 adds the additional limitation of wherein perturbing the first and second points comprises at least one of setting binary elements to zero to represent removal of features from the vocabulary, and addition of a small random value to a numeric value to claim 10, which recites mathematical formulas of the perturbations, which are mathematical concepts, which are abstract ideas. Therefore, claim 15 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 16, Claim 16 recites a system implementing the method of claim 10 with substantially the same limitation, therefore the same analysis and rejection applies. Therefore, claim 16 is found to be ineligible subject matter under 35 U.S.C. 101.
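The limitations of claims 12 and 15, as characterized above, concern binary presence/absence vectors over a vocabulary and perturbation by zeroing elements. A minimal sketch of both operations (the vocabulary contents and helper names are illustrative, not drawn from the claims):

```python
import random

def interpretable_representation(point_features, vocabulary):
    """Binary vector over a vocabulary: element i is 1 if vocabulary[i]
    is present among the point's features, else 0 (cf. claim 12)."""
    return [1 if word in point_features else 0 for word in vocabulary]

def perturb(binary_vector, rng):
    """Perturb by setting a random subset of nonzero elements to zero,
    representing removal of features from the vocabulary (cf. claim 15)."""
    return [b if b and rng.random() < 0.5 else 0 for b in binary_vector]

vocab = ["similarity", "distance", "matrix", "model"]
rep = interpretable_representation({"distance", "model"}, vocab)
print(rep)  # [0, 1, 0, 1]
perturbed = perturb(rep, random.Random(0))
```

A perturbed vector can only lose features relative to the original, which is what makes it useful for probing which features a black-box model relies on locally.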
Regarding claim 17, Claim 17 recites a system implementing the method of claim 12 with substantially the same limitation, therefore the same analysis and rejection applies. Therefore, claim 17 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 18, Claim 18 recites a system implementing the method of claim 13 with substantially the same limitation, therefore the same analysis and rejection applies. Therefore, claim 18 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 19, Claim 19 adds the additional limitation of wherein modifying the machine learning model comprises eliminating at least one feature from a vocabulary of the model to claim 18, which recites a judgement of what feature(s) to no longer consider, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a computer, including a machine learning model recited at a high level of generality. Therefore, claim 19 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 20, Claim 20 recites a system implementing the method of claim 14 with substantially the same limitation, therefore the same analysis and rejection applies. Therefore, claim 20 is found to be ineligible subject matter under 35 U.S.C. 101.

Prior Art

The following references are used for prior art claim rejections:
Ribeiro et al. “’Why Should I Trust You?’ Explaining the Predictions of Any Classifier”
Amirian and Schwenker “Radial Basis Function Networks for Convolutional Neural Networks to Learn Similarity Distance Metric and Improve Interpretability”
Bodria et al. “Benchmarking and Survey of Explanation Methods for Black Box Models”
Fong and Vedaldi “Explanations for Attributing Deep Neural Network Predictions”
Boardman et al. (U.S. Patent No. 9,281,689)
De Bruin et al. (U.S. Patent Publication No. 2014/0279746)
Cai et al.
“Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making”
Hullermeier “Towards Analogy-Based Explanations in Machine Learning”
Chen et al. (U.S. Patent Publication No. 2020/0388358)
Mothilal et al. “Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations”

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 10-13, 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro et al. “’Why Should I Trust You?’ Explaining the Predictions of Any Classifier”, hereinafter Ribeiro, in view of Amirian and Schwenker “Radial Basis Function Networks for Convolutional Neural Networks to Learn Similarity Distance Metric and Improve Interpretability”, hereinafter Amirian, further in view of Chen et al. (U.S. Patent Publication No. 2020/0388358), hereinafter Chen.
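Ribeiro's LIME procedure, quoted at length in the claim mappings below, samples perturbations of an instance's binary interpretable representation, queries the black-box model on them, and fits a locally weighted interpretable surrogate whose coefficients indicate per-feature importance. A compressed sketch of that loop; the simple exponential proximity kernel and unregularized least-squares surrogate are simplifying assumptions, not Ribeiro's exact choices:

```python
import numpy as np

def lime_style_weights(x_binary, f, n_samples=500, width=0.75, seed=0):
    """Fit a locally weighted linear surrogate to black-box f around the
    binary instance x_binary; return per-feature local importance weights."""
    rng = np.random.default_rng(seed)
    d = len(x_binary)
    # Draw perturbed samples by zeroing out random subsets of x's features.
    Z = rng.integers(0, 2, size=(n_samples, d)) * x_binary
    y = np.array([f(z) for z in Z])
    # Proximity kernel: samples closer to x get higher weight.
    dist = np.sum(np.abs(Z - x_binary), axis=1) / d
    w = np.exp(-(dist ** 2) / width ** 2)
    # Weighted least squares: minimize sum_i w_i * (y_i - Z_i . beta)^2,
    # a stand-in for Ribeiro's locality-aware loss L(f, g, pi_x).
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return beta

# Toy black box that depends only on feature 0: the surrogate should
# assign feature 0 the dominant local weight.
beta = lime_style_weights(np.array([1, 1, 1]), lambda z: 3.0 * z[0])
assert abs(beta[0]) > abs(beta[1]) and abs(beta[0]) > abs(beta[2])
```

Stacking such weight vectors over many instances gives the n × d' explanation matrix W that Ribeiro describes and the Examiner maps to the claimed matrix.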
Regarding claim 1, Ribeiro teaches A method comprising: generating interpretable representations of the first and second points ((Ribeiro Pg. 3) “As mentioned before, interpretable explanations need to use a representation that is understandable to humans, regardless of the actual features used by the model. For example, a possible interpretable representation for text classification is a binary vector indicating the presence or absence of a word, even though the classifier may use more complex (and incomprehensible) features such as word embeddings”) generating an interpretable local description [of the similarity measure] (Ribeiro Pg. 4, Fig. 3 shows an interpretable local description of a model's decision function, similarity measure is not explicitly taught by Ribeiro) learning values for the matrix through optimizing a loss function ((Ribeiro Pg. 3) “We want to minimize the locality-aware loss L(f, g, πx)...We sample instances around x' by drawing nonzero elements of x' uniformly at random (where the number of such draws is also uniformly sampled). Given a perturbed sample z' ∈ {0, 1}d' (which contains a fraction of the nonzero elements of x'), we recover the sample in the original representation z ∈ Rd and obtain f(z)”, (Ribeiro Pg. 5) “We propose to give a global understanding of the model by explaining a set of individual instances…Given the explanations for a set of instances X (|X| = n), we construct an n × d’ explanation matrix W”, sampling instances based on minimized loss which are used in a matrix corresponds to learning values for a matrix through optimizing a loss function) the learning values comprising individually perturbing the first and second points to generate perturbed first and second points ((Ribeiro Pg. 3) “We sample instances around x’ by drawing nonzero elements of x’ uniformly at random (where the number of such draws is also uniformly sampled).
Given a perturbed sample z’ ∈ {0, 1}d’ (which contains a fraction of the nonzero elements of x’), we recover the sample in the original representation”, random sampling of perturbed samples corresponds to individually perturbing first and second points as both samplings are done at random) wherein the loss function is evaluated on the perturbed first and second points ((Ribeiro Pg. 3) “We want to minimize the locality-aware loss L(f, g, πx)...we approximate L(f, g, πx) by drawing samples…We sample instances around x' by drawing nonzero elements of x' uniformly at random (where the number of such draws is also uniformly sampled). Given a perturbed sample z' ∈ {0, 1}d' (which contains a fraction of the nonzero elements of x'), we recover the sample in the original representation z ∈ Rd and obtain f(z)”, evaluating a loss function by drawing samples which are perturbed corresponds to evaluating a loss function on perturbed first and second points) explaining a value [of the similarity measure] between the first and second points using elements of the matrix that provide insights into features responsible for the similarity ((Ribeiro Pg. 5) “Given the explanations for a set of instances X (|X| = n), we construct an n × d’ explanation matrix W that represents the local importance of the interpretable components for each instance”, Ribeiro Pg. 5, Fig. 5 shows that the matrix explains the features most responsible for the similarities between instances (in this example documents), similarity measure is not explicitly taught by Ribeiro) assessing the explanation of the value [of the similarity measure] using a rubric ((Ribeiro Pg. 8-9) “We show the “Husky” mistake in Figure 11a. The other 8 examples are classified correctly.
We then ask the subject three questions: (1) Do they trust this algorithm to work well in the real world, (2) why, and (3) how do they think the algorithm is able to distinguish between these photos of wolves and huskies. After getting these responses, we show the same images with the associated explanations, such as in Figure 11b, and ask the same questions”, similarity measure is not explicitly taught by Ribeiro) in response to the assessment of the explanation of the value [of the similarity measure], modifying the machine learning model ((Ribeiro Pg. 8) “We start the experiment with 10 subjects. After they mark words for deletion, we train 10 different classifiers, one for each subject (with the corresponding words removed)”) by removing one or more features identified, based on user feedback, as being non-contributors [to the similarity measure] (Ribeiro Pg. 8) “Explanations can aid in this process by presenting the important features, particularly for removing features that the users feel do not generalize. We use the 20 newsgroups data here as well, and ask Amazon Mechanical Turk users to identify which words from the explanations should be removed from subsequent training”, similarity measure is not explicitly taught by Ribeiro) and deploying the modified machine learning model ((Ribeiro Pg. 7) “we want to evaluate whether explanations can help users decide which classifier generalizes better, i.e., which classifier would the user deploy ‘in the wild’”) Amirian teaches the following further limitations that Ribeiro does not explicitly teach: defining a similarity measure between first and second points in a data space by operation of a machine learning model ((Amirian Pg. 1) “The training process of RBF architectures employs a distance metric optimization that we propose to use as a similarity distance metric to find similar and dissimilar images”, (Amirian Pg. 
4) “The distance r computed in Equation 1 is not only a measure of the proximity of an image to a cluster center but can also be used to compare images and find similar and dissimilar images in the embeddings space”) by approximating the similarity measure as a distance between [the interpretable representations of] the first and second points ((Amirian Pg. 9) “The embeddings of CNN are evaluated by their learned distance metric from cluster centers in RBFs. The same distance can be used to measure the distance between a test image and similar images from training data”), wherein the distance [between the interpretable representations] incorporates a matrix ((Amirian Pg. 4) “Optimizing the RBF networks with an identity covariance matrix results in training in Euclidean space. It is possible to optimize a Mahalanobis distance [29] by training the main diagonal on the covariance matrix. Any arbitrary distance metric can be trained by optimizing the entire covariance matrix while projecting the matrix to the space of positive definite matrices. The distance r computed in Equation 1 is not only a measure of the proximity of an image to a cluster center but can also be used to compare images and find similar and dissimilar images in the embeddings space”, Ribeiro teaches interpretable representations, broadest reasonable interpretation of an interpretable representation includes images) explaining a value of the similarity measure between the first and second points using elements of the matrix [that provide insights into features responsible for the similarity] ((Amirian Pg. 4) “Rj is the positive definite covariance matrix (trainable distance)”, (Amirian Pgs. 7-9) “Using a different approximation strategy compared with fully connected layers provides CNN-RBFs with the chance to probe the decision-making process based on these visual clues: Similar images as measured by the similarity distance metric of RBFs trained on CNN embeddings”, Amirian Pg. 9, Fig.
7 shows various similarity measures between points in a data space (images within an embedding space) being explained with elements of the positive definite covariance matrix, where the elements are the trained distances, Ribeiro teaches elements of a matrix that provide insights into features responsible for the similarity) At the time of filing, one of ordinary skill in the art would have motivation to combine Ribeiro and Amirian by taking the method for learning interpretable local descriptions of classifiers, including learning with a loss function, assessing explanations with a rubric, and modifying a model in response to the assessment, taught by Ribeiro, and adding the use of a similarity measure and a matrix, taught by Amirian, as providing a similarity measure and a matrix imparts the predictable benefit of enhancing the interpretability of the machine learning model, increasing trust in the model and increasing the visibility of issues with the model to a human debugger. Such a combination would be obvious. Chen teaches the following further limitation that neither Ribeiro nor Amirian explicitly teaches: and removing co-occurring features that lack a causal reason for co-occurring and are identified as significant [to a corresponding local explanation]; ((Chen [0056]) “At step 208 we then select, or, equivalently, remove or deselect features in response to operator input, using a human-in-the-loop, such as operator 104 of FIG. 1. In particular, an expert such as a physician 104 operating a computer 106 views the randomly selected features with the highest information gain and then removes those that are deemed not trustworthy or causally unrelated to the prediction task of the model.
For example, if one of the features was "number_of_breakfasts" and the prediction task is inpatient mortality, the operator may choose to deselect that feature because it is not causally connected to whether the patient is at risk of inpatient mortality”, features with the highest information gain are identified as significant, local explanations taught by Ribeiro) At the time of filing, one of ordinary skill in the art would have motivation to combine Ribeiro, Amirian, and Chen by taking the method for learning interpretable local descriptions of classifiers, including using a similarity measure approximated as a distance incorporating a matrix, jointly taught by Ribeiro and Amirian, and adding the removal of co-occurring features that are identified as significant but lack a causal reason for co-occurring, taught by Chen, as features that lack causal reason for co-occurring but are significant to the model are likely to be useless or even harmful for model predictions once a model is deployed, as is well-known in the art, and so their removal increases the robustness of a machine learning model. Such a combination would be obvious. Regarding claim 2, Ribeiro, Amirian, and Chen jointly teach The method of claim 1, Ribeiro additionally teaches: wherein modifying the machine learning model comprises eliminating at least one feature from a vocabulary of the model ((Ribeiro Pg. 8) “We start the experiment with 10 subjects. After they mark words for deletion, we train 10 different classifiers, one for each subject (with the corresponding words removed)”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Ribeiro, Amirian, and Chen for the parent claim of claim 2, claim 1, for reasons mentioned previously. All additional limitations in claim 2 are present in Ribeiro, so no additional rationale for combination is required.
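The sampling-and-fitting procedure quoted from Ribeiro above (draw perturbed samples z′ around x′, recover z, evaluate the black box f(z), and minimize the locality-aware loss) can be sketched as follows. This is a minimal illustration, not Ribeiro's implementation: the black box `f`, the mapping `to_original`, and the kernel `pi_x` are placeholder callables supplied by the caller, and the surrogate is fit by simple weighted least squares.

```python
import numpy as np

def fit_local_surrogate(f, x_prime, to_original, pi_x, n_samples=500, seed=0):
    """LIME-style sketch: draw perturbed samples z' around x', evaluate the
    black box f on the recovered samples z, and fit a locality-weighted
    linear surrogate g by weighted least squares."""
    rng = np.random.default_rng(seed)
    nonzero = np.flatnonzero(x_prime)
    Z = np.zeros((n_samples, x_prime.size))
    for i in range(n_samples):
        k = rng.integers(1, nonzero.size + 1)             # number of draws, sampled uniformly
        keep = rng.choice(nonzero, size=k, replace=False)
        Z[i, keep] = 1.0                                  # z' keeps a fraction of x''s nonzeros
    y = np.array([f(to_original(z)) for z in Z])          # f(z) in the original representation
    w = np.sqrt(np.array([pi_x(z) for z in Z]))           # locality kernel pi_x(z)
    A = np.hstack([Z, np.ones((n_samples, 1))])           # linear surrogate with intercept
    coef, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
    return coef[:-1], coef[-1]                            # per-feature weights, intercept
```

With a toy linear black box, the recovered per-feature weights play the role of the "local importance" entries that Ribeiro's n × d′ explanation matrix W collects per instance.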
Regarding claim 3, Ribeiro, Amirian, and Chen jointly teach The method of claim 1, Ribeiro additionally teaches: wherein, in the step of generating interpretable representations, the interpretable representations of the first and second points comprise vectors of binary elements, each element representing presence or absence of a feature from a vocabulary of the first and second points ((Ribeiro Pg. 3) “As mentioned before, interpretable explanations need to use a representation that is understandable to humans, regardless of the actual features used by the model. For example, a possible interpretable representation for text classification is a binary vector indicating the presence or absence of a word, even though the classifier may use more complex (and incomprehensible) features such as word embeddings”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Ribeiro, Amirian, and Chen for the parent claim of claim 3, claim 1, for reasons mentioned previously. All additional limitations in claim 3 are present in Ribeiro, so no additional rationale for combination is required. Regarding claim 4, Ribeiro, Amirian, and Chen jointly teach The method of claim 3, Ribeiro additionally teaches: wherein the vocabulary of the first and second points comprises a plurality of words ((Ribeiro Pg. 8) “We use the 20 newsgroups data here as well, and ask Amazon Mechanical Turk users to identify which words from the explanations should be removed from subsequent training, for the worse classifier from the previous section”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Ribeiro, Amirian, and Chen for the parent claim of claim 4, claim 3, for reasons mentioned previously. All additional limitations in claim 4 are present in Ribeiro, so no additional rationale for combination is required.
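The binary interpretable representation Ribeiro describes for text, a vector marking the presence or absence of each vocabulary word, reduces to a few lines. The vocabulary and sentence below are invented purely for illustration:

```python
def interpretable_representation(text, vocabulary):
    """Element j is 1 iff vocabulary word j appears in the text
    (Ribeiro's presence/absence encoding for text classification)."""
    words = set(text.lower().split())
    return [1 if w in words else 0 for w in vocabulary]

vocab = ["model", "similarity", "wolf", "husky"]
x_prime = interpretable_representation("the model scores similarity", vocab)
# x_prime == [1, 1, 0, 0]: "model" and "similarity" present, the rest absent
```

Perturbing such a vector by zeroing elements, as in the claim 15 mapping below, then corresponds to deleting words from the instance.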
Regarding claim 10, Claim 10 recites a computer program product, within which are instructions for performing a method consisting of limitations also within claim 1. Specifically, claim 10 recites A computer program product comprising one or more computer readable storage media that embody computer executable instructions, which when executed by a computer cause the computer to perform a method comprising: [the method of claim 1]. Ribeiro teaches (Ribeiro Pg. 6) “Code and data for replicating our experiments are available at https://github.com/marcotcr/lime-experiments”. Additionally, within claim 10, the fourth limitation reads learning values for the matrix through optimizing a loss function evaluated on perturbations of the first and second points; rather than “learning values for the matrix through optimizing a loss function, the learning values comprising individually perturbing the first and second points to generate perturbed first and second points wherein the loss function is evaluated on the perturbed first and second points;”, which is the version of the limitation within claim 1; however, because claim 1’s narrower version of the limitation falls within the scope of claim 10’s broader version, they are mapped equivalently. All other limitations in claim 10 are substantially the same as those in claim 1, therefore the same rationale for rejection applies. All additional limitations in claim 10 are present in Ribeiro, so no additional rationale for combination is required. Regarding claim 11, Claim 11 recites a medium containing instructions for the method of claim 2 with substantially the same limitation, therefore the same analysis and rejection applies. Regarding claim 12, Claim 12 recites a medium containing instructions for the method of claim 3 with substantially the same limitation, therefore the same analysis and rejection applies.
Regarding claim 13, Claim 13 recites a medium containing instructions for the method of claim 4 with substantially the same limitation, therefore the same analysis and rejection applies. Regarding claim 15, Ribeiro, Amirian, and Chen jointly teach The computer readable medium of claim 10, Ribeiro additionally teaches: wherein perturbing the first and second points comprises at least one of setting binary elements to zero to represent removal of features from the vocabulary, and addition of a small random value to a numeric value ((Ribeiro Pg. 3) “We denote x ∈ Rd be the original representation of an instance being explained, and we use x' ∈ {0, 1}d' to denote a binary vector for its interpretable representation…We sample instances around x' by drawing nonzero elements of x' uniformly at random (where the number of such draws is also uniformly sampled). Given a perturbed sample z' ∈ {0, 1}d' (which contains a fraction of the nonzero elements of x')”, sampling by only copying some nonzero elements of a binary vector to create perturbed samples corresponds to perturbing by setting binary elements to zero) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Ribeiro, Amirian, and Chen for the parent claim of claim 15, claim 10, for reasons mentioned previously. All additional limitations in claim 15 are present in Ribeiro, so no additional rationale for combination is required. Regarding claim 16, Claim 16 recites an apparatus which performs the method of claim 10. Specifically, claim 16 recites An apparatus comprising: a memory embodying computer executable instructions; and at least one processor, coupled to the memory, and operative by the computer executable instructions to perform a method comprising: [the method of claim 10]. Amirian teaches (Amirian Pg.
7) “The hyperparameter searches in Figure 5 are conducted using the hyperband [45] algorithm with 4 agents running in parallel on two Quadro T2000 graphic processing units (GPUs) for approximately 10 days”. Any computing system, such as the one recited in Amirian, would inherently contain a processor and a memory. One of ordinary skill in the art would have motivation to combine Ribeiro, Amirian, and Chen to perform the method of claim 10 using the system described, as it would have been obvious to use a computer to modify and deploy a machine learning model. All other limitations in claim 16 are substantially the same as those in claim 10, therefore the same rationale for rejection applies. Regarding claim 17, Claim 17 recites a system implementing the method of claim 3 with substantially the same limitation, therefore the same analysis and rejection applies. Regarding claim 18, Claim 18 recites a system implementing the method of claim 4 with substantially the same limitation, therefore the same analysis and rejection applies. Regarding claim 19, Ribeiro, Amirian, and Chen jointly teach The apparatus of claim 18, Ribeiro additionally teaches: wherein modifying the machine learning model comprises eliminating at least one feature from a vocabulary of the model ((Ribeiro Pg. 8) “We start the experiment with 10 subjects. After they mark words for deletion, we train 10 different classifiers, one for each subject (with the corresponding words removed)”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine Ribeiro, Amirian, and Chen for the parent claim of claim 19, claim 18, for reasons mentioned previously. All additional limitations in claim 19 are present in Ribeiro, so no additional rationale for combination is required. Claims 5, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro, in view of Amirian, further in view of Chen, further in view of Bodria et al.
“Benchmarking and Survey of Explanation Methods for Black Box Models”, hereinafter Bodria. Regarding claim 5, Ribeiro, Amirian, and Chen jointly teach The method of claim 3, Bodria teaches the following further limitation that neither Ribeiro, nor Amirian, nor Chen teaches: wherein the vocabulary of the first and second points comprises a plurality of numeric value buckets (Bodria Pg. 7 Fig. 2a-d shows numeric value buckets being used as features) At the time of filing, one of ordinary skill in the art would have motivation to combine Ribeiro, Amirian, Chen, and Bodria by taking the method for learning interpretable local descriptions of classifiers, including the use of a similarity measure, a matrix, and using elements of the matrix to provide insights into features contributing to the similarity measure, taught jointly by Ribeiro, Amirian, and Chen, and adding the use of numeric value buckets as the features, taught by Bodria, as Bodria teaches: (Bodria Pg. 6) “The explainer assigns to each feature an importance value which represents how much that particular feature was important for the prediction under analysis. Formally, given a record x, an explainer f(·) models a feature importance explanation as a vector e = {e1, e2, ... , em}, in which the value ei ∈ e is the importance of the ith feature for the decision made by the black-box model b(x)”, i.e. that doing so provides the predictable benefit of enhancing the interpretability of the prediction by showing how important each feature was in the prediction. Such a combination would be obvious. Regarding claim 14, Claim 14 recites a medium containing instructions for the method of claim 5 with substantially the same limitation, therefore the same analysis and rejection applies. Regarding claim 20, Claim 20 recites a system implementing the method of claim 5 with substantially the same limitation, therefore the same analysis and rejection applies.
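The numeric value buckets Bodria depicts (Fig. 2a-d) amount to discretizing a continuous feature into interval indicators that an explainer can score like vocabulary words. A minimal sketch, with illustrative bin edges:

```python
def bucketize(value, edges):
    """One-hot encoding over the intervals (-inf, e0], (e0, e1], ..., (e_last, inf):
    turns one numeric feature into binary bucket features an explainer can weight."""
    index = sum(value > e for e in edges)   # which interval contains the value
    buckets = [0] * (len(edges) + 1)
    buckets[index] = 1
    return buckets

# an age of 37 with edges [25, 40, 60] falls in the (25, 40] bucket
print(bucketize(37, [25, 40, 60]))  # [0, 1, 0, 0]
```

The resulting binary elements slot directly into the presence/absence representation discussed for claims 3 and 4.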
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Amirian, further in view of Chen, further in view of Fong and Vedaldi “Explanations for Attributing Deep Neural Network Predictions”, hereinafter Fong. Regarding claim 6, Ribeiro, Amirian, and Chen jointly teach The method of claim 1, Fong teaches the following further limitation that neither Ribeiro, nor Amirian, nor Chen teaches: wherein perturbing the first and second points comprises addition of a small random value to a numeric value ((Fong Pgs. 6-7) “Since we do not have access to the image generation process, we consider three obvious proxies: replacing the region R with a constant value, injecting noise, and blurring the image (Fig. 8.2)…Formally, let m : Λ → [0, 1] be a mask, associating each pixel u ∈ Λ with a scalar value m(u). Then the perturbation operator is defined as [Φ(x0; m)](u) =…m(u)x0(u) + (1 − m(u))η(u), noise…where μ0 is an average color, η(u) are i.i.d. Gaussian noise samples for each pixel”, perturbing pixels of an image corresponds to perturbing at least first and second points of an image, the noise perturbation operator corresponds to addition of a small (scaled by (1 − m(u))) amount of Gaussian noise, and Gaussian noise is random in nature) At the time of filing, one of ordinary skill in the art would have motivation to combine Ribeiro, Amirian, Chen, and Fong by taking the method for learning interpretable local descriptions of classifiers, including the use of a similarity measure, a matrix, using elements of the matrix to provide insights into features contributing to the similarity measure, and perturbation to create the interpretable local descriptions, taught jointly by Ribeiro, Amirian, and Chen, and adding perturbation by adding small random values to original numeric values, taught by Fong, as Fong teaches: (Fong Pg.
6) “The aim of attribution is to identify which regions of an image x0 are used by the black box to produce the output value f(x0). We can do so by observing how the value of f(x) changes as x is obtained ‘deleting’ different regions R of x0… While conceptually simple, there are several problems with this idea. The first one is to specify what it means to ‘delete’ information”, and adding a small amount of random noise to the pixel values within a region is an effective way to delete information. Such a combination would be obvious. Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Amirian, further in view of Chen, further in view of Boardman et al. (U.S. Patent No. 9,281,689), hereinafter Boardman. Regarding claim 7, Ribeiro, Amirian, and Chen jointly teach The method of claim 1, Boardman teaches the following further limitation that neither Ribeiro, nor Amirian, nor Chen teaches: further comprising deploying the machine learning model to operate an electrical distribution network ((Boardman Col. 35, lines 51-58) “In accordance with another embodiment of the disclosed subject matter, one or more components (e.g., DNNC component, PSBC, ECM, etc.) in the communication network environment can utilize artificial intelligence (AI) techniques or methods to infer (e.g., reason and draw a conclusion based at least in part on a set of metrics, arguments, or known outcomes in controlled scenarios) an automated response to perform in response to an inference(s); a type of power system correction action(s) to be performed”, artificial intelligence techniques include machine learning models, inferring an automatic response of a power system correction action corresponds to operating an electrical distribution network) by controlling a switch network ((Boardman Col. 
7, lines 5-18) “The PSBC can analyze information relating to a detected power system imbalance and can identify (e.g., automatically) a power balance correction action that can rectify or compensate for the imbalance in that portion of the tier of the multitier hierarchical EDN, wherein a power balance correction action can comprise, for example, switching (e.g., automatically or dynamically) certain loads (e.g., consumer consumption nodes (CCNs) (e.g., homes or businesses consuming power) connected to that portion of the tier of the EDN from a transmission line of one phase to a transmission line of another phase of the multi-phase power transmission lines to restore balance between phases of the multi-phase power for that portion of the tier”, switching loads between transmission lines within an electrical network corresponds to controlling a switch network) At the time of filing, one of ordinary skill in the art would have motivation to combine Ribeiro, Amirian, Chen, and Boardman by taking the method for learning interpretable local descriptions of classifiers, including the use of a similarity measure, a matrix, and using elements of the matrix to provide insights into features contributing to the similarity measure, taught jointly by Ribeiro, Amirian, and Chen, and adding the use of the classifier machine learning model to operate an electrical distribution network via controlling a switch network, taught by Boardman, as Boardman teaches: (Boardman Col. 9, lines 39-42) “It is desirable to be able to manage power distribution, including any identified power imbalances, in a more localized way to facilitate more efficient power distribution in a power grid”. Such a combination would be obvious. Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Ribeiro in view of Amirian, further in view of De Bruin et al. (U.S. Patent Publication No. 2014/0279746), hereinafter De Bruin. 
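Fong's noise proxy for "deleting" a region, [Φ(x0; m)](u) = m(u)x0(u) + (1 − m(u))η(u), quoted in the claim 6 mapping above, can be sketched on a toy grayscale array. The noise scale `sigma` is an assumed parameter, not a value from Fong:

```python
import numpy as np

def perturb_with_noise(x0, mask, sigma=0.1, seed=None):
    """Fong-style perturbation: keep pixel u where mask(u) = 1, replace it with
    i.i.d. Gaussian noise eta(u) where mask(u) = 0, and blend in between."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sigma, size=x0.shape)   # i.i.d. noise per pixel
    return mask * x0 + (1.0 - mask) * eta
```

With mask ≡ 1 the image is unchanged; driving the mask toward 0 over a region injects the small random values that claim 6 recites.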
Regarding claim 8, Ribeiro, Amirian, and Chen jointly teach The method of claim 1, De Bruin teaches the following further limitations that neither Ribeiro, nor Amirian, nor Chen explicitly teaches: deploying the machine learning model to produce preliminary diagnoses of hospitalized patients ((De Bruin [0108]) “One of the computational data analysis and machine learning and inference methods and procedures used in the medical expert system is semi-supervised learning and inference methods…In the case of medical diagnosis, for example, the unlabeled data corresponds to clinical and laboratory data derived from patients, but for which the true diagnosis or reference diagnosis is not known for the medical expert system, or not clinically confirmed”, (De Bruin Claim 29) “A medical system comprising:…wherein the feature data scheme comprises: (a) a diagnostic model for determining a diagnosis of one of a plurality of known disorders indicated by an individual patient dataset; and (b) a plurality of diagnosis-specific treatment response models, each of the diagnosis-specific treatment response models corresponding to a specific diagnosis of one of the known disorders, the treatment response models configured to use feature data to predict treatment response”) and treating at least one of the hospitalized patients consistent with at least a corresponding one of the preliminary diagnoses ((De Bruin [0220]) “The expert system and prediction methodology disclosed herein significantly improves the likelihood of correct treatment determination in the first instance”) At the time of filing, one of ordinary skill in the art would have motivation to combine Ribeiro, Amirian, Chen, and De Bruin by taking the method for learning interpretable local descriptions of classifiers, including the use of a similarity measure, a matrix, and using elements of the matrix to provide insights into features contributing to the similarity measure, taught jointly by Ribeiro, Amirian, and Chen, 
and adding the use of the classifier machine learning model to produce diagnoses of hospitalized patients, and treating the hospitalized patients accordingly, taught by De Bruin, as De Bruin teaches: (De Bruin [0111]) “In this way information technology and machine learning and inference methodology can be combined to provide an intelligent medical data management system to aid the clinician or physician in treating diseases of various types”. Such a combination would be obvious. Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Cai et al. “Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making”, hereinafter Cai, in view of Hullermeier “Towards Analogy-Based Explanations in Machine Learning”, hereinafter Hullermeier, further in view of Ribeiro, further in view of Mothilal et al. “Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations”, hereinafter Mothilal. Regarding claim 9, Cai teaches: defining a similarity measure between a first pair of points in a data space by operation of a machine learning model ((Cai Pg. 4) “To find similar images, the query image is fed through a pre-trained deep neural network to retrieve its image embedding, a compressed representation of the image that corresponds to a point in high-dimensional coordinate space (as explained in Related Work). Visually similar images are located at points closer together in the embedding space”) estimating a value of the similarity measure between the first pair of points ((Cai Pg. 6) “Visualize refinement: A scatterplot overview showing each search results' embedding distances from the query image, color-coded by diagnosis”) Hullermeier teaches the following further limitations that Cai does not explicitly teach: finding matching analogous pairs of points in the data space based on an optimization formula ((Hullermeier Pg. 
4) “In the numerical case, assuming all attributes to be normalized to the unit interval [0, 1], the concept of analogical proportion can be extended on the basis of generalized logical operators [6,9]. In this case, the analogical proportion will become a matter of degree, i.e., a quadruple (a, b, c, d) can be in analogical proportion to some degree between 0 and 1. An example of such a proportion, with R being the arithmetic difference, i.e., R(a, b) = a - b, is the following: [Hullermeier Equation 2]”, (Hullermeier Pg. 9) “several analogies, perhaps sorted by their strength (degree of analogical proportion), could be extracted from the training data”) where a first term in the optimization formula drives the matching analogous pair to have a similar distance between its members as the first pair of points … (Hullermeier Equation 2, Pg. 4 shows a term |(a – b) – (c – d)|, where the larger the difference between the pair (a, b) and the pair (c, d), the lower the degree of analogous proportion, driving the pairs to have similar distances) and wherein each matching analogous pair of points has a similar value [for the similarity measure] as does the first pair of points; (Hullermeier Fig. 1, Pg. 5 shows matching pairs of points in a data space, with the original pair of points and the matching pair of points in analogical proportion to each other and with similar distances between each point in the pair, similarity measure taught by Cai) explaining the value [of the similarity measure] between the first pair of points using analogy to the matching analogous pairs of points; ((Hullermeier Pg. 10) “The explanation of preferences in terms of analogy appears to be quite natural. For the purpose of illustration, consider again our example: Why did the learner predict a preference c > d, i.e., that journal c is ranked higher (evaluated better than) journal d?
To give an explanation, one could find a preference a > b between journals in the training data, so that (a, b) is in analogical proportion to (c, d). In the case of arithmetic proportions, this means that the feature values (ratings of criteria) of a deviate from the feature values of d in much the same way as those of c deviate from those of d, and this deviation will then serve as an explanation of the preference”, similarity measure taught by Cai) At the time of filing, one of ordinary skill in the art would have motivation to combine Cai and Hullermeier by taking the method in which a similarity measure (distance) between points (images) in a data space (the embedding coordinate space) is generated by operation of a machine learning model (a deep neural network), and said similarity measure is estimated, taught by Cai, and adding finding two analogous pairs of points in a data space using an optimization formula, and explaining the value for one pair with analogy to the other pair, taught by Hullermeier, as Hullermeier teaches: (Hullermeier Pg. 2) “analogy-based explanations can complement similarity-based explanations in a meaningful way”. Such a combination would be obvious. Ribeiro teaches the following further limitations that neither Cai nor Hullermeier explicitly teach: assessing the explanation of the value [of the similarity measure] using a rubric ((Ribeiro Pg. 8-9) “We show the “Husky” mistake in Figure 11a. The other 8 examples are classified correctly. We then ask the subject three questions: (1) Do they trust this algorithm to work well in the real world, (2) why, and (3) how do they think the algorithm is able to distinguish between these photos of wolves and huskies.
After getting these responses, we show the same images with the associated explanations, such as in Figure 11b, and ask the same questions”, similarity measure taught by Cai) and in response to the assessment of the explanation of the value [of the similarity measure], modifying the machine learning model ((Ribeiro Pg. 8) “We start the experiment with 10 subjects. After they mark words for deletion, we train 10 different classifiers, one for each subject (with the corresponding words removed)”, similarity measure taught by Cai) At the time of filing, one of ordinary skill in the art would have motivation to combine Cai, Hullermeier, and Ribeiro by taking the method in which a similarity measure (distance) between points (images) in a data space (the embedding coordinate space) is generated by operation of a machine learning model (a deep neural network), estimating said similarity measure, finding two analogous pairs of points in a data space using an optimization function, and explaining the value for one pair with analogy to the other pair, taught jointly by Cai and Hullermeier, and adding evaluating an explanation of values relating to a machine learning model via a rubric, and in response to said evaluation, modifying the machine learning model, taught by Ribeiro, as Ribeiro teaches: (Ribeiro Pg. 8) “If one notes that a classifier is untrustworthy, a common task in machine learning is feature engineering, i.e. modifying the set of features and retraining in order to improve generalization. Explanations can aid in this process by presenting the important features, particularly for removing features that the users feel do not generalize”. Such a combination would be obvious. Mothilal teaches the following further limitation that neither Cai, nor Hullermeier, nor Ribeiro explicitly teaches: … and a second term drives diversity in the [matching analogous] pairs of points, ((Mothilal Pg. 
4) “We use the following metric based on the determinant of the kernel matrix given the counterfactuals: dpp_diversity = det(K)”, (Mothilal Pg. 4) “Based on the above definitions of diversity and proximity, we consider a combined loss function over all generated counterfactuals: [Mothilal Equation 4]”, dpp_diversity is a second term driving diversity within an optimization formula between points c1, …, ck, Hullermeier teaches analogous relationships between pairs of points) At the time of filing, one of ordinary skill in the art would have motivation to combine Cai, Hullermeier, Ribeiro, and Mothilal by taking the method in which a similarity measure (distance) between points (images) in a data space (the embedding coordinate space) is generated by operation of a machine learning model (a deep neural network), estimating said similarity measure, finding two analogous pairs of points in a data space using an optimization function, explaining the value for one pair with analogy to the other pair and evaluating the explanation with a rubric, and modifying the machine learning model in response, taught jointly by Cai, Hullermeier, and Ribeiro, and adding a term to the optimization function to encourage diversity in the results, taught by Mothilal, as Mothilal teaches: (Mothilal Pg. 4) “In other domains of information search such as search engines and recommendation systems, multiple studies [15, 22, 35, 45] show the benefits of presenting a diverse set of information items to a user”. Such a combination would be obvious. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zhang et al. “Contextual Local Explanation for Black Box Classifiers” discloses a method for creating interpretable local descriptions of a machine learning model, similarly to the method described in Ribeiro. Das et al.
“Opportunities and Challenges in Explainable Artificial Intelligence (XAI): A Survey” discloses a variety of techniques used in the field of Explainable AI, including local approaches, perturbation-based approaches, and post-hoc approaches. Bhoi et al. (Australian Patent Application No. AU 2020103212) discloses the use of a machine learning model to operate an electrical distribution network. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to VICTOR A NAULT whose telephone number is (703) 756-5745. The examiner can normally be reached M - F, 12 - 8. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at (571) 270-7092. 
The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /V.A.N./Examiner, Art Unit 2124 /Kevin W Figueroa/Primary Examiner, Art Unit 2124
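The dpp_diversity metric the Office Action quotes from Mothilal (dpp_diversity = det(K)) can be sketched in a few lines. Note the kernel form used here, K[i, j] = 1 / (1 + dist(c_i, c_j)), is an assumption for illustration: the action quotes only the determinant, not how K is built.

```python
import numpy as np


def dpp_diversity(counterfactuals):
    """Determinantal diversity score det(K) over candidate counterfactuals.

    Assumed kernel (not quoted in the Office Action):
    K[i, j] = 1 / (1 + dist(c_i, c_j)).
    """
    k = len(counterfactuals)
    K = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            d = np.linalg.norm(counterfactuals[i] - counterfactuals[j])
            K[i, j] = 1.0 / (1.0 + d)
    return float(np.linalg.det(K))


# Duplicate candidates make K singular, so the score collapses to 0;
# well-separated candidates push the score toward 1.
duplicates = [np.array([1.0, 2.0]), np.array([1.0, 2.0])]
separated = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
```

Maximizing det(K) as a "second term" in the combined loss therefore rewards sets of points that are mutually dissimilar, which is the diversity behavior the rejection attributes to Mothilal.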

Prosecution Timeline

Jul 21, 2021: Application Filed
Oct 30, 2024: Non-Final Rejection — §101, §103
Feb 05, 2025: Response Filed
Mar 04, 2025: Applicant Interview (Telephonic)
Mar 04, 2025: Examiner Interview Summary
Apr 21, 2025: Final Rejection — §101, §103
Jul 24, 2025: Request for Continued Examination
Jul 27, 2025: Response after Non-Final Action
Aug 04, 2025: Applicant Interview (Telephonic)
Aug 04, 2025: Examiner Interview Summary
Aug 28, 2025: Non-Final Rejection — §101, §103
Dec 04, 2025: Applicant Interview (Telephonic)
Dec 04, 2025: Examiner Interview Summary
Dec 04, 2025: Response Filed
Feb 18, 2026: Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579429
DEEP LEARNING BASED EMAIL CLASSIFICATION
Granted Mar 17, 2026 (2y 5m to grant)

Patent 12566953
AUTOMATED PROCESSING OF FEEDBACK DATA TO IDENTIFY REAL-TIME CHANGES
Granted Mar 03, 2026 (2y 5m to grant)

Patent 12561563
AUTOMATED PROCESSING OF FEEDBACK DATA TO IDENTIFY REAL-TIME CHANGES
Granted Feb 24, 2026 (2y 5m to grant)

Patent 12468939
OBJECT DISCOVERY USING AN AUTOENCODER
Granted Nov 11, 2025 (2y 5m to grant)

Patent 12446600
TWO-STAGE SAMPLING FOR ACCELERATED DEFORMULATION GENERATION
Granted Oct 21, 2025 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 62%
With Interview: 99% (+83.3%)
Median Time to Grant: 3y 11m
PTA Risk: High
Based on 13 resolved cases by this examiner. Grant probability derived from career allow rate.
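The 62% headline figure is consistent with the examiner panel's 8 granted of 13 resolved cases; a one-line check (the figures are taken from the page, the derivation itself is an assumption about how the tool computes it):

```python
# Figures shown in the examiner panel: 8 granted out of 13 resolved cases.
granted, resolved = 8, 13
career_allow_rate = granted / resolved
label = f"{career_allow_rate:.0%}"  # prints "62%"
print(label)
```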
