Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claims 1-3, 5-11, 13-17 and 19-20 are pending. Claims 1, 7, and 15 are independent and are amended. Claims 4, 12, and 18 are canceled and the amendments to the independent Claims include the language of the canceled Claims with no change in scope.
This Application was published as U.S. 20250284728.
Apparent priority: 5 March 2024.
Applicant’s amendments and arguments have been considered and are either unpersuasive or moot in view of the new grounds of rejection which, where presented, were necessitated by the amendments to the Claims.
This Action is Final.
Amendments and Arguments
Applicant’s arguments are directed to the amended language which does not represent any change in scope with respect to the previous Claim 4.
Applicant’s arguments are NOT persuasive as they are directly contradicted by the substance of the Dependent Claims.
Claim 1 is amended as follows and the other independent Claims are amended similarly:
1. A computer-implemented method comprising:
causing a target large language model (LLM) to generate, from a first input to the target LLM, a first output, the first input comprising natural language text input to the target LLM, the first output comprising natural language text output from the target LLM;
perturbing a portion of the first input, the perturbing resulting in a perturbed input, wherein a size of the portion is controlled by a perturbation size parameter and wherein perturbing the portion of the first input comprises replacing the portion with a replacement portion, the replacement portion generated by a replacement LLM;
causing the target LLM to generate a first perturbed output from the perturbed input;
scalarizing the first perturbed output, the scalarizing generating a scalar representing a difference between the first output and the first perturbed output;
aggregating, into an importance score corresponding to the portion, the scalar and a set of additional scalars, each additional scalar in the set of additional scalars representing a difference between the first output and an additional perturbed output, the additional perturbed output generated by the target LLM from an additional perturbation of the portion;
explaining, responsive to determining that the importance score is the highest importance score in a set of importance scores, the first output using the portion; and
training, using the portion and the importance score, an importance scoring model, the importance scoring model comprising an artificial neural network.
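For illustration only, the steps recited in amended Claim 1 (other than the final training step, which is omitted) can be sketched as follows. This is a minimal sketch under stated assumptions and is neither the Applicant's implementation nor a construction of the claim language: `target_llm` and `replacement_llm` are hypothetical toy stand-ins for real models, the scalarizer is assumed to be a Jaccard distance over output words, and the aggregation is assumed to be a simple mean.

```python
from statistics import mean

# Hypothetical replacement vocabulary used by the toy "replacement LLM".
FILLERS = ["watched", "saw", "rented"]


def target_llm(text: str) -> str:
    """Toy stand-in for the target LLM (assumption): returns a text label."""
    words = text.lower().split()
    return "negative review" if "hate" in words else "positive review"


def replacement_llm(portion: list[str], variant: int) -> list[str]:
    """Toy stand-in for the replacement LLM (assumption): same-length filler."""
    return [FILLERS[variant % len(FILLERS)]] * len(portion)


def scalarize(first_output: str, perturbed_output: str) -> float:
    """Scalar difference between two outputs (assumed: Jaccard distance)."""
    a, b = set(first_output.split()), set(perturbed_output.split())
    return 1.0 - len(a & b) / len(a | b)


def importance_scores(text: str, size: int, n_variants: int = 3):
    """Perturb each portion of `size` tokens several times and aggregate."""
    tokens = text.split()
    first_output = target_llm(text)
    scores = []
    for start in range(0, len(tokens), size):
        portion = tokens[start:start + size]
        scalars = []
        for variant in range(n_variants):
            perturbed = (tokens[:start]
                         + replacement_llm(portion, variant)
                         + tokens[start + size:])
            perturbed_output = target_llm(" ".join(perturbed))
            scalars.append(scalarize(first_output, perturbed_output))
        # Aggregate the scalars into one importance score (assumed: mean).
        scores.append((" ".join(portion), mean(scalars)))
    return first_output, scores


first_output, scores = importance_scores("I hate this movie", size=1)
# The portion with the highest importance score is used as the explanation.
best_portion, best_score = max(scores, key=lambda item: item[1])
print(f"{first_output!r} is explained by {best_portion!r}")
```

In this sketch the `size` argument plays the role of the perturbation size parameter: a larger value perturbs longer spans of the input at a time.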
Amended Claims include material previously in Claim 4 which was rejected as follows:
Regarding Claim 4, Aggarwal teaches removing portions of the input but not replacing them with another portion.
Zhu teaches:
4. The computer-implemented method of claim 1,
wherein perturbing the portion of the first input comprises replacing the portion with a replacement portion, the replacement portion generated by a replacement LLM. [Zhu, the masking in Zhu is performed by “replacement”: “[0041] Masking is then used to identify datapoints in the training data D that are significant. For instance, in step 206 a datapoint(s) in the training data D is/are masked, i.e., masked datapoints are replaced with [MASK]. Next, it is determined whether or not the masking has had an impact on the decisions made by the machine learning model {circumflex over (θ)}….”]
Rationale as provided for Claim 1. Zhu is another reference which generates explainability based on changing/perturbing the input data and methods of perturbation of Zhu can be substituted for those of Aggarwal.
The Interview Summary of 12 March 2025 indicates:
(Image: media_image1.png)
The current amendments do not include any “further elaboration of the replacement LLM” and remain at the level of canceled Claim 4.
Applicant argues that the combination of Aggarwal and Zhu fails to teach or suggest the above language.
(Images: media_image2.png through media_image6.png)
Response at 11.
In Reply: Aggarwal was not mapped to the particular language.
Zhu was applied to the particular language of causing perturbation by replacing a portion.
Zhu is titled: “Explaining Neural Models by Interpretable Sample-Based Explanations” and is therefore directed to explainable AI, and in particular this is what Zhu claims to do: “[0007] The present invention provides sample-based model explanation techniques using arbitrary spans of training data at any granularity as an explanation with increased interpretability. … using the masking to explain which of the one or more datapoints in the training data D are significant. Namely, the one or more datapoints in the training data D that, when masked, change the decision of the machine learning model {circumflex over (θ)} are significant.”
The limitation at issue provides: “perturbing a portion of the first input, the perturbing resulting in a perturbed input, wherein a size of the portion is controlled by a perturbation size parameter and wherein perturbing the portion of the first input comprises replacing the portion with a replacement portion, the replacement portion generated by a replacement LLM.”
Each portion of the limitation is addressed below:
“perturbing a portion of the first input, the perturbing resulting in a perturbed input,”: Zhu teaches the perturbation of a portion by selecting “arbitrary spans of training data” like the ones shown in Figure 7 and masking them. Masking removes the data and does teach the “perturbation” of the Claim. When a span is removed by masking and the meaning is not changed, that means that the span/datapoint/D was not important to the end result. This way the important inputs are identified. This is one definition of explainability: find out which inputs were key to the output and which inputs did not matter.
“wherein a size of the portion is controlled by a perturbation size parameter”: The size selected by Zhu appears to be sentence length and it also teaches: “[0037] As highlighted above, a desirable feature for an explanation method is interpretability. Both IF and TracIn methods use the entire training example as an explanation. Explanations with a finer-grained unit, e.g., phrases, may be easier to interpret in many applications where the texts are lengthy.”
“wherein perturbing the portion of the first input comprises replacing the portion with a replacement portion,”: masking of Zhu is replacing a portion with a mask/replacement portion. However, masking is also a type of removal and if the “replacement portion” of the Claim is interpreted as something that is not removal, then the mask of Zhu would not teach this portion. Note the following two Dependent Claims 2-3 (and their counterparts 10-11 and 16-17) that contradict the Applicant’s arguments:
(Image: media_image7.png)
As a matter of fact, according to Claim 2, even the removal of Aggarwal teaches the perturbation of the Claim, and adding the Mask of Zhu was unnecessary to the rejection.
“the replacement portion generated by a replacement LLM.”: This portion of the limitation is taught by Zhu in Figure 2 among many other places. A machine learning model is used to make the decisions regarding masking/perturbations and to determine which portions are important/key/significant.
Accordingly, Zhu teaches the limitation of previous Claim 4 and in fact Aggarwal’s removal of certain portions also teaches this limitation but the Examiner did not realize this at the time of making the rejection and is not permitted to modify her rejection now. However, please note that in view of definitions provided for “perturbation” in Claim 2 and counterparts, Aggarwal’s removal teaches the replacement portion of the Claim and the mask of Zhu was not necessary. Another issue is that while the rejection was conservative and added Zhu which refers to an LLM expressly, Aggarwal pertains to Language Models in the context of Machine Learning and arguably did include the LLM of the Claim.
To distinguish the Removal of Aggarwal and Masking of Zhu from the Perturbation of Claim 1, Applicant needs to cancel Claims 2-3, 10-11, and 16-17 and expressly exclude Removal and Masking from the perturbations of Claim 1 or define the Replacement Portions with particularity and in a way that excludes removal and masking. As is, there is express admission on the face of the Claims that Removal and Masking are types of perturbation.
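The three kinds of perturbation discussed above (removal as in Aggarwal, masking as in Zhu, and replacement with a generated portion as in Claim 1) can be contrasted in a minimal sketch. The function names and the `generate` callable are hypothetical and are not taken from either reference or from the Application; only the “[MASK]” token comes from Zhu [0041].

```python
from typing import Callable


def perturb_by_removal(tokens: list[str], start: int, size: int) -> list[str]:
    """Removal: the portion is deleted and the input becomes shorter."""
    return tokens[:start] + tokens[start + size:]


def perturb_by_mask(tokens: list[str], start: int, size: int,
                    mask_token: str = "[MASK]") -> list[str]:
    """Masking: each token of the portion is replaced with a mask token."""
    return tokens[:start] + [mask_token] * size + tokens[start + size:]


def perturb_by_replacement(tokens: list[str], start: int, size: int,
                           generate: Callable[[list[str]], list[str]]) -> list[str]:
    """Replacement: the portion is swapped for content produced by `generate`,
    a hypothetical stand-in for a replacement model."""
    return tokens[:start] + generate(tokens[start:start + size]) + tokens[start + size:]


tokens = "I hate this movie".split()
removed = perturb_by_removal(tokens, 1, 1)
masked = perturb_by_mask(tokens, 1, 1)
replaced = perturb_by_replacement(tokens, 1, 1, lambda portion: ["watched"])
print(removed, masked, replaced, sep="\n")
```

All three functions return a perturbed input; they differ only in what, if anything, occupies the position of the perturbed portion.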
Applicant additionally argues:
(Images: media_image8.png and media_image9.png)
Response at 11-12.
In Reply, the “explanation or influence metrics” mentioned by the Applicant teach the “importance scoring” of the Claim. The subject of both references and the instant Claims is Explainability and the goal is to find out a metric/measure/score that shows how important/influential a particular portion of the input was. Thus, the arguments of the Applicant in this respect are not persuasive.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5, 7-11, 13, 15-17 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal (U.S. 20210358478) in view of Zhu (U.S. 20220383096).
Regarding Claim 1, Aggarwal teaches:
1. A computer-implemented method comprising: [Aggarwal, “A method of determining influence of language elements in script to an overall classification of the script by perturbing the dataset representing a conversation….” Abstract.]
causing a target large language model (LLM) to generate, from a first input to the target LLM, a first output, the first input comprising natural language text input to the target LLM, the first output comprising natural language text output from the target LLM; [Aggarwal, the inputs are sentences shown in Figures 1 and 2 and the model is a classifier which generates the significance of a subset of the input sentence / “first input” that is significant to the meaning of the input sentence: “[0017] A method for visualizing a classifier's decisions using a linear programming approximation (LPA) approach is provided. Given a textual input, the classifier returns a prediction probability for the class….”] [The classifier is an LPA and not an LLM and the output is not natural language text. But [0024] of Aggarwal lists a number of alternatives that include LLMs and shows the equivalency of its method with those using LLMs.]
perturbing a portion of the first input, the perturbing resulting in a perturbed input, [Aggarwal precisely performs this perturbation step on portions/subsets of the input sentence: “[0017] … By perturbing the textual input, the importance of certain elements of the text can be determined by analyzing how much the prediction probabilities vary. ….” “[0018] Such sentiment analysis provides an illustration of which phrases, clauses or sentences lead to the outcome, e.g., positive or negative. Phrases, clauses or sentences, e.g., sentiment analysis, are not the only delineation.”] wherein a size of the portion is controlled by a perturbation size parameter [Aggarwal, the type/size of portion is determined according to the goal of the classifier and it could be a sentence or a turn of speech which includes all that was said by a caller: “[0017] … The review can be perturbed (by removing one or more sentences) and seeing how probabilities for the classes of positive or negative review change. If a sentence such as “I hate this movie” is removed from the review, and the prediction probability for a negative review drastically drops, it can be determined that the influence of that particular sentence is great. …” “[0019] As another example, suppose a classifier is trained to determine if and when a conversation between a user and a chatbot should be escalated. … In this case, the conversation can be perturbed to determine the importances of user turns in the conversation, instead of, e.g., sentences….”] and wherein perturbing the portion of the first input comprises replacing the portion with a replacement portion, the replacement portion generated by a replacement LLM;
causing the target LLM to generate a first perturbed output from the perturbed input; [Aggarwal, the goal is to perturb the input and generate an output from the model. It teaches the use of DeepLIFT and LIME in [0024] as background prior art and then moves to its method of LPA. ]
scalarizing the first perturbed output, the scalarizing generating a scalar representing a difference between the first output and the first perturbed output; [Aggarwal uses Linear Programming Approximation (LPA), which resembles a least-squares optimization problem that uses a difference between expected and actual outputs of a model. In the formulation below, the probability of escalation indicates the importance of the perturbation and teaches “generating a scalar representing a difference between the first output and the first perturbed output”:
(Images: media_image10.png and media_image11.png)
Aggarwal also uses the alternative method of least squares, which is based on a difference too:
(Image: media_image12.png)
…
(Image: media_image13.png)
]
aggregating, into an importance score corresponding to the portion, the scalar and a set of additional scalars, each additional scalar in the set of additional scalars representing a difference between the first output and an additional perturbed output, the additional perturbed output generated by the target LLM from an additional perturbation of the portion; [Aggarwal, the “aggregating” of the Claim is taught by the summations
(Image: media_image14.png)
in [0034] and [0036] above. “[0016] … By carefully perturbing the input, classifier prediction probabilities are explained by creating multiple linear programming approximations (LPAs). The solutions of these LPAs determine the importance of textual elements (such as sentences in a document or turns in a conversation) pertaining to the classification task at hand….”]
explaining, responsive to determining that the importance score is the highest importance score in a set of importance scores, the first output using the portion; and [Aggarwal, Figures 1, 2, and 3. All of the figures show some type of perturbation. Figure 1 is about removing phrases or sentences from a review as the perturbation and shows the importance 130 of each of those phrases or sentences. Figure 2 shows the importance 230 of each turn of speech. “[0017] … The sentences 110 of the review 120 could then be highlighted by color intensity for their importances 130 in the classifier's decision, as illustrated in FIG. 1. FIG. 1 illustrates the influence of each sentence in a movie review on a classifier's decision on whether the review is positive (orange) 132 or negative (blue) 134. More influential sentences are darker in color. FIG. 3 illustrates a graphical user interface 300 showing a table illustrating a visual representation 340 according to principles described herein.” “[0062] … FIG. 2 illustrates influence of each user turn 220 in a conversation on a classifier's decision (importance 230) on whether the conversation should escalate (orange) 232 or not escalate (blue) 234. More influential turns are darker in color.”]
training, using the portion and the importance score, an importance scoring model, the importance scoring model comprising an artificial neural network. [Aggarwal, “2. The method of claim 1, further comprising: training a language model according to the perturbations.” The language model is the classifier that is trained; “[0017] … For example, suppose a classifier is trained to determine if a movie review is positive or negative. The review can be perturbed (by removing one or more sentences) and seeing how probabilities for the classes of positive or negative review change….” “[0019] As another example, suppose a classifier is trained to determine if and when a conversation between a user and a chatbot should be escalated. …”]
Aggarwal teaches a number of deep learning models and an LSTM as methods that can be used for interpreting black box models. “[0024] Many open questions still exist, not just in developing new techniques for interpreting black box models, but also in adapting existing methods to work with various models and structures of data. For example, DeepLIFT assigns importance scores to inputs by comparing neuron activations to a reference activation that must be user chosen. DeepLIFT is not yet applicable to RNNs. Layerwise relevance propagation, inspired by backpropagation, can determine which features in an input vector contribute the most to a network's output, but was only very recently adapted to textual input and even more recently to LSTMs. Another method is attention, inspired by the principle that animals often focus on specific parts of visual input to determine adequate responses. Bandanau proposed a neural translation model using attention, translating text from one language to another. Attention can be used to highlight important words and sentences in text. Attention, however, like the other previously mentioned methods above, is not model-agnostic. LIME is model-agnostic, relying solely on the input data and classifier prediction probabilities. By perturbing the input and seeing how predictions change, one can approximate the complex model using a simpler, interpretable linear model. However, users must consider how the perturbations are created, which simple model to train, and what features to use in the simpler model. Anchors (by the same authors of LIME) is also model-agnostic, but instead of highlighting elements of text, creates “if-then” rules that apply.” But, Aggarwal uses Linear Programming Approximation.
Aggarwal teaches removing portions of the input but not replacing them with another portion.
Zhu is also directed to “explanation techniques” and uses BERT which is an LLM and teaches:
causing a target large language model (LLM) to generate, from a first input to the target LLM, a first output, the first input comprising natural language text input to the target LLM, the first output comprising natural language text output from the target LLM; [Zhu, title: “Explaining Neural Models By Interpretable Sample-Based Explanations.” “[0003] Among the vast number of existing techniques for explaining machine learning models, Influence Functions (IF) that use training examples as explanations for model decisions (i.e., sample-based model explainability methods) have recently gained popularity in natural language processing. …. Influence Functions have been applied to explain Bidirectional Encoder Representations from Transformers (BERT)-based text classification and natural language inference models, as well as to aid text generation for data augmentation….” “[0067] … Since the present techniques use BERT family as the base model, the embedding of a training span is obtained by the difference of x and its span-masked version …” The explanation which is output is a span of the input and both are text as shown in Figure 7 and 11. “[0004] However, while useful, Influence Functions may not be entirely sufficient for natural language processing applications. For instance, the majority of existing works use entire training instances as explanations. However, for long natural language texts that are common in many high-impact application domains such as healthcare or finance, it may be difficult, if not impossible, to comprehend an entire instance as an explanation. For example, a model decision may depend only on a specific part of a long training instance.” “[0007] The present invention provides sample-based model explanation techniques using arbitrary spans of training data at any granularity as an explanation with increased interpretability. … using the masking to explain which of the one or more datapoints in the training data D are significant. 
Namely, the one or more datapoints in the training data D that, when masked, change the decision of the machine learning model {circumflex over (θ)} are significant.” The training data are explanation texts and the output of Zhu pinpoints the location of importance in the explanation.]
perturbing a portion of the first input, the perturbing resulting in a perturbed input, [Zhu, training spans like those shown in Figure 7 teach the portions that are perturbed/masked.] wherein a size of the portion is controlled by a perturbation size parameter [Zhu, perturbation size is taught by span lengths that are shown in Figure 7 as sentences, and where finer-grained explainability is desired, phrases are used. See [0037].] and wherein perturbing the portion of the first input comprises replacing the portion with a replacement portion, the replacement portion generated by a replacement LLM; [Zhu, the masking in Zhu is performed by “replacement”: “[0041] Masking is then used to identify datapoints in the training data D that are significant. For instance, in step 206 a datapoint(s) in the training data D is/are masked, i.e., masked datapoints are replaced with [MASK]. Next, it is determined whether or not the masking has had an impact on the decisions made by the machine learning model {circumflex over (θ)}….”]
Aggarwal and Zhu pertain to achieving explainability for models and it would have been obvious to use the BERT/LLM of Zhu in place of the LPA of Aggarwal to achieve explainability according to perturbations of spans of input text as an equivalent method well-suited to textual input and as evidenced by [0024] of Aggarwal that teaches LSTM and NNs as prior art methods thus showing the substitutability of these methods with the LPA of Aggarwal. This combination falls under simple substitution of one known element for another to obtain predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
Regarding Claim 2, Aggarwal teaches:
2. The computer-implemented method of claim 1,
wherein perturbing the portion of the first input comprises removing the portion from the first input. [Aggarwal, “[0005] In accordance with the purpose(s) of this invention, as embodied and broadly described herein, this invention, in one aspect, relates to a method comprising performing a perturbation of the language elements by removing a subset of the language elements; analyzing the resulting classification to determine if the removed subset of the language elements causes a change in classifier outcome; and reporting that the subset is important to the classification if there is a change in the classifier outcome.”]
Regarding Claim 3, Aggarwal teaches removing portions of the input, and one way of removing portions is by use of a mask, but Aggarwal does not mention the word “mask.”
Zhu teaches:
3. The computer-implemented method of claim 1,
wherein perturbing the portion of the first input comprises replacing the portion with a mask token. [Zhu is directed to generating model explainability and used the technique of masking as the method of perturbation. Figure 3, “mask the training span xij 304.” “[0028] As will be described in detail below, machine learning model {circumflex over (θ)} is first trained using training data D, after which masking of a datapoint(s) in a span of the training data (also referred to herein as a training span xij) is performed to determine an importance of the span, namely the influence of training span xij on a test example z′. For instance, it can be evaluated whether a new decision of the machine learning model {circumflex over (θ)} obtained after the masking of the training span xij is the same as a decision of the machine learning model {circumflex over (θ)} obtained prior to the masking. Further, the influence of the training span xij on a test span x′kl can also be measured, as opposed to the entire test sequence.”]
Rationale as provided for Claim 1. Zhu is another reference which generates explainability based on changing/perturbing the input data and methods of perturbation of Zhu can be substituted for those of Aggarwal.
Regarding Claim 5, Aggarwal teaches:
5. The computer-implemented method of claim 1,
wherein aggregating, into the importance score corresponding to the portion, the scalar and the set of additional scalars comprises estimating, using a value of the perturbation size parameter, a linear relationship between members of a set comprising the scalar and the set of additional scalars. [Aggarwal, see [0034] and [0036] cited and provided above with their equations that show that the perturbation P is a parameter and the method is a Linear Programming Approximation as reflected in the linear equations shown in those paragraphs. Also, the constraints include the number of “cumulative perturbations.” “[0051] (1) Cumulative perturbations. For our example, we would include (1) A, (2) A, B, (3) A, B, B, and the original conversation (4) A, B, B, D.” This is one of constraints CON2 added to the LPA and teaches the “value of the perturbation size parameter” of the claim: “[0050] Let us now examine how many and which constraints from CON2 should be added to the LPA for a better approximation. Recall that adding all constraints can be computationally very expensive and may even make the problem infeasible. Thus, an efficient algorithm to create perturbations is provided that, for the purposes addressed herein, produces constraints with a reasonably good approximation….”] (See also the definition of LP from Wikipedia as background (not cited as prior art) in the Conclusion.)
Claim 7 is a computer program product claim with limitations corresponding to the limitations of method Claim 1 and is rejected under similar rationale.
7. A computer program product comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by a processor to cause the processor to perform operations comprising:
(Note: [0035] … A computer readable storage medium, as that term is used in the present disclosure, is not to be construed as storage in the form of transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide, light pulses passing through a fiber optic cable, electrical signals communicated through a wire, and/or other transmission media. …”)
Regarding Claim 8, Aggarwal teaches:
8. The computer program product of claim 7,
wherein the stored program instructions are stored in a computer readable storage device in a data processing system, and [Aggarwal, “[0006] In another aspect, the invention relates to computer readable medium storing instructions for performing a perturbation of the language elements by removing a subset of the language elements; analyzing the resulting classification to determine if the removed subset of the language elements causes a change in classifier outcome; and reporting that the subset is important to the classification if there is a change in the classifier outcome.”]
wherein the stored program instructions are transferred over a network from a remote data processing system.
Aggarwal pertains to analyzing chat data which requires a network and remote devices but this is not express.
Zhu teaches:
wherein the stored program instructions are transferred over a network from a remote data processing system. [Zhu. Figures 12 and 13 show the setup of the various computers in communication over a network and Figure 14 shows the different layers of processing. “[0097] Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service. This cloud model may include at least five characteristics, at least three service models, and at least four deployment models.”]
Aggarwal and Zhu pertain to explainability determinations and when the resources used are data and processing power intensive it does make sense to use a centralized location for the processing and receive the results. This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
Claim 10 is a computer program product claim with limitations corresponding to the limitations of method Claim 2 and is rejected under similar rationale.
Claim 11 is a computer program product claim with limitations corresponding to the limitations of method Claim 3 and is rejected under similar rationale.
Claim 13 is a computer program product claim with limitations corresponding to the limitations of method Claim 5 and is rejected under similar rationale.
Claim 15 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale.
15. A computer system comprising a processor and one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by the processor to cause the processor to perform operations comprising:
Claim 16 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale.
Claim 17 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale.
Claim 19 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale.
Claims 6, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal and Zhu and further in view of Ruggero (U.S. 20230281362).
Regarding Claim 6, Aggarwal teaches:
6. The computer-implemented method of claim 1,
wherein aggregating, into the importance score corresponding to the portion, the scalar and the set of additional scalars comprises computing a weighted average of differences between members of a set comprising the scalar and the set of additional scalars. [Aggarwal, see [0034] and [0036] cited and provided above with their equations that show a “weighted difference.”]
Aggarwal minimizes a weighted difference and does not compute an average.
Zhu has a SAG measure ([0067]) which is based on an average of distances/differences, but a reference that uses linear programming, like Aggarwal, is used and combined instead.
Ruggero teaches:
wherein aggregating, into the importance score corresponding to the portion, the scalar and the set of additional scalars comprises computing a weighted average of differences between members of a set comprising the scalar and the set of additional scalars. [Ruggero pertains to the use of Linear Programming for optimizing parameters of a model and Figure 8, 840 teaches that the loss function is a weighted average of differences that is optimized by linear programming: “[0167] At block 840, loss detector 735 calculates a loss function using the predicted results and known results. The independent variables of the loss function can be learnable parameters, and the dependent variable of the loss function can be a loss indicating a difference between predicted and actual results. Thus, the loss function can be used to estimate how well a model configured with current values for the learnable parameters is performing relative to a performance associated with different parameter values. In some instances, a predicted result includes multiple predicted results (corresponding to different outputs of the model), and the loss can be defined to be a weighted average based on the difference between the multiple predicted results and the corresponding actual results.” “[0074] One or more modules may be configured to use linear programming 545 to identify a set of compound quantities that correspond to balancing fluxes identified in reactions represented in the stoichiometry matrix….”]
Aggarwal/Zhu and Ruggero pertain to the use of linear programming optimization methods for the same purpose of determining explainability for a model by perturbing inputs to the model, and it would have been obvious to include the express teaching of the use of a weighted average of differences in the optimization formula from Ruggero as part of the linear programming optimization of the combination. This combination falls under combining prior art elements according to known methods to yield predictable results or simple substitution of one known element for another to obtain predictable results. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
Claim 14 is a computer program product claim with limitations corresponding to the limitations of method Claim 6 and is rejected under similar rationale.
Claim 20 is a system claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal and Zhu and further in view of Barsness (U.S. 9459757).
Regarding Claim 9, Aggarwal does not teach metering the data and charging for it. Neither does Zhu.
Barsness teaches:
9. The computer program product of claim 7, wherein the stored program instructions are stored in a computer readable storage device in a server data processing system, and wherein the stored program instructions are downloaded in response to a request over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system, [Barsness is directed to placement of the processing tool in a network. Figure 1 shows the different computing nodes in communication over a network 120 with each node having a different role such as management, processing, or development. “A method, system, and computer program product for selectively associating one or more processing elements, or portions thereof, to one or more compute nodes. The method, system, and computer program product can include presenting a stream computing application, presenting metrics associated with at least one compute node and at least one processing element (or portion thereof), receiving input from a graphical display, associating one or more processing elements (or portions thereof) to one or more compute nodes, and updating the graphical display and the metrics to reflect the selective associations.” Abstract. “Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.” 11:52-57.]
further comprising:
program instructions to meter use of the program instructions associated with the request; and [Barsness, “Embodiments of the present invention may also be delivered as part of a service engagement with a client corporation, nonprofit organization, government entity, internal organizational structure, or the like. These embodiments may include configuring a computer system to perform, and deploying software, hardware, and web services that implement, some or all of the methods described herein. These embodiments may also include analyzing the client's operations, creating recommendations responsive to the analysis, building systems that implement portions of the recommendations, integrating the systems into existing processes and infrastructure, metering use of the systems, allocating expenses to users of the systems, and billing for use of the systems.” 13:15-25.]
program instructions to generate an invoice based on the metered use. [Barsness meters the use to generate an invoice for the resources used. “1…. ; metering the use of the graphical user interface display; and generating an invoice based on the metered use.”]
Aggarwal/Zhu and Barsness pertain to the use of networked resources, and it would have been obvious to add the metering and invoicing of Barsness, which is directed to the purchase and use of distributed computing resources, to the system of the combination, which includes distributed computing configurations, to enable it to use external resources available for a fee. This combination falls under combining prior art elements according to known methods to yield predictable results or use of known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Ratner (U.S. 20200372309): “An example system includes a processor to receive an input and a model trained to classify inputs. The processor is to iteratively generate a perturbed input that optimizes a saliency metric including a classification term, a sparsity term, and a smoothness term, while keeping parameters of the model constant. The processor is to also detect that a predefined number of iterations is exceeded or a convergence of values of the perturbed input. The processor is to further generate a saliency mask based on a perturbation of the perturbed input in response to detecting the predefined number of iterations is exceeded or the convergence.” Abstract.
XAI and Counterfactuals with LIME, SHAP, Deeplift
Published May 24, 2024
4. Algorithms for Generating Counterfactual Explanations:
- Optimization-based methods: Frame the generation of counterfactual explanations as an optimization problem.
- Perturbation-based methods: Generate counterfactual explanations by iteratively perturbing input features.
- Case-based reasoning: Find similar instances in the training data that meet the desired outcome.
- Generative models: Create realistic counterfactual instances by learning the underlying data distribution.
- Each approach offers distinct advantages and implementation considerations.
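The perturbation-based approach listed above can be sketched as follows. This is a toy illustration under stated assumptions: `find_counterfactual`, the greedy single-feature search, the step size, and the linear `toy_score` model are all hypothetical and are not drawn from the cited article or from any reference of record.

```python
from typing import Callable, List, Optional, Sequence


def find_counterfactual(
    score_fn: Callable[[Sequence[float]], float],
    x: Sequence[float],
    threshold: float,
    step: float = 0.5,
    max_iters: int = 100,
) -> Optional[List[float]]:
    """Greedily perturb one feature at a time until the predicted label flips.

    The model is treated as a black box: only score_fn is called.
    Returns the perturbed input, or None if no flip is found.
    """
    original_label = score_fn(x) >= threshold
    candidate = list(x)
    for _ in range(max_iters):
        current = score_fn(candidate)
        if (current >= threshold) != original_label:
            return candidate  # label flipped: this is a counterfactual
        best, best_score = None, current
        for i in range(len(candidate)):
            for delta in (step, -step):
                trial = list(candidate)
                trial[i] += delta
                s = score_fn(trial)
                # Move the score toward the other side of the threshold.
                improves = s < best_score if original_label else s > best_score
                if improves:
                    best, best_score = trial, s
        if best is None:
            return None  # no single-feature perturbation helps
        candidate = best
    return None


def toy_score(v: Sequence[float]) -> float:
    """Hypothetical linear model used only for this example."""
    return 1.0 * v[0] + 3.0 * v[1]


counterfactual = find_counterfactual(toy_score, [0.0, 0.0], threshold=2.0)
```

In this toy run the search only ever changes the second feature, because it moves the score fastest toward the threshold.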
Conclusion:
- XAI methods play a vital role in enhancing trust, transparency, and interpretability in machine learning models.
- LIME, DeepLIFT, and other XAI models offer valuable insights into model behavior.
- Algorithms for generating counterfactual explanations provide actionable insights into individual predictions.
- Leveraging these techniques enables practitioners to better understand, interpret, and trust complex machine learning models.
More: https://neptune.ai/blog/explainability-auditability-ml-definitions-techniques-tools
Linear programming - Wikipedia
Linear programming (LP), also called linear optimization, is a method to achieve the best outcome (such as maximum profit or lowest cost) in a mathematical model whose requirements and objective are represented by linear relationships. Linear programming is a special case of mathematical programming (also known as mathematical optimization).
More formally, linear programming is a technique for the optimization of a linear objective function, subject to linear equality and linear inequality constraints. Its feasible region is a convex polytope, which is a set defined as the intersection of finitely many half spaces, each of which is defined by a linear inequality. Its objective function is a real-valued affine (linear) function defined on this polytope. A linear programming algorithm finds a point in the polytope where this function has the largest (or smallest) value if such a point exists.
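The description above can be made concrete with a small two-variable sketch. It relies on the known property that, for a bounded feasible polytope, an optimum of a linear objective is attained at a vertex, and it simply enumerates vertices. The function `solve_lp_2d` and the example problem are illustrative assumptions; this brute-force search is not the simplex method and is not the method of any cited reference.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Constraint = Tuple[float, float, float]  # (a, b, r) meaning a*x + b*y <= r


def solve_lp_2d(
    c: Tuple[float, float],
    constraints: List[Constraint],
    eps: float = 1e-9,
) -> Optional[Tuple[Tuple[float, float], float]]:
    """Maximize c[0]*x + c[1]*y over the polytope given by the constraints.

    Assumes the feasible region is bounded. Returns ((x, y), value) for the
    best feasible vertex, or None if no feasible vertex exists.
    """
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel boundary lines never intersect in a vertex
        # Intersection of the two boundary lines (Cramer's rule).
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        # Keep the point only if it satisfies every inequality.
        if all(a * x + b * y <= r + eps for a, b, r in constraints):
            value = c[0] * x + c[1] * y
            if best is None or value > best[1]:
                best = ((x, y), value)
    return best


# Hypothetical problem: maximize 3x + 2y
# subject to x + y <= 4, x <= 3, x >= 0, y >= 0.
result = solve_lp_2d(
    (3.0, 2.0),
    [(1.0, 1.0, 4.0), (1.0, 0.0, 3.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)],
)
```

For this example the feasible vertices are (0, 0), (3, 0), (3, 1) and (0, 4), and the objective is largest at (3, 1).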
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached 9 to 5, M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir, can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Fariba Sirjani/
Primary Examiner, Art Unit 2659