Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
This action is in response to the amendments filed 08/29/2025. Claims 1, 9-13, and 15-17 have been amended; claims 1-5, 7-13, and 15-20 are currently pending.
Response to Arguments
Applicant’s arguments regarding the 101 rejection have been fully considered, but they are not persuasive. Applicant argues that the claims recite a “real-world practical application of technology” that provides a technological improvement, “at least since the presently amended claims are directed to identifying, using the optimum counterfactual explanation, a different sequence of input for the plurality of inputs to change an output of the AI regression model for sequentially identifying boundaries of the AI regression model, such that more accurate boundaries of the AI regression model may be identified. . .” Examiner respectfully disagrees and notes that Applicant’s alleged improvement is directed to “identifying” a different sequence of inputs to change an output of an AI regression model using an optimum counterfactual explanation, which was interpreted as a judicial exception directed to a mental step. The claims do not distinguish the way the claimed invention makes this identification from the way a person could identify this sequence in their mind. MPEP 2106.05(a) states “the judicial exception alone cannot provide the improvement” but notes that improvements can be provided by additional elements, potentially in combination with judicial exceptions; however, Applicant has not explained which additional elements provide the technological improvement that would integrate the claimed judicial exceptions into a practical application. Therefore, the 101 rejection is maintained and has been updated to include the amended limitations and to clarify the reasoning given for the limitations that were not amended.
Applicant’s arguments regarding the prior art rejection have been fully considered but are moot because of the new ground(s) of rejection. Applicant argues that the prior art, particularly the Karimi reference, fails to teach “identifying, by at least the processor and using the optimum counterfactual explanation, a different sequence of input for the plurality of inputs to change an output of the AI regression model for sequentially identifying boundaries of the AI regression model”. Examiner respectfully disagrees and notes that the phrase “for sequentially identifying boundaries of the AI regression model” is interpreted as the intended use or necessary result of identifying a different input sequence and does not provide additional patentable weight to this limitation. Examiner also notes that at least section I of Karimi teaches that different inputs associated with different counterfactuals can be determined in order to determine which inputs can change the output of a model and can identify an optimum counterfactual “such that no counterfactual provably exists at a smaller distance” and can satisfy any constraints. Lastly, Examiner notes that the McGrath reference has been brought in to teach limitations directed to generating counterfactuals based on inputs received from a user device, and that the Mahajan reference is no longer relied upon. The prior art rejections have been updated to include the amended limitations and to clarify the reasoning given for the limitations that were not amended.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-5, 7-13, and 15-20 are rejected under 35 U.S.C. 101. Claims 1-5 and 7-8 are directed to a method, claims 9-13 and 15-16 are directed to a system, and claims 17-20 are directed to a non-transitory computer readable storage medium; therefore, claims 1-5, 7-13, and 15-20 fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter). However, claims 1-5, 7-13, and 15-20 fall within the judicial exception of an abstract idea, specifically the abstract ideas of “Mental Processes” (including observation, evaluation, and opinion) and “Mathematical Concepts” (including mathematical calculations and relationships).
Claim 1:
Claim 1 is directed to a method; therefore, the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Claim 1 recites the following abstract ideas:
defining, based on a plurality of exponential polynomial functions of the input value designated by the client device, a set of candidate counterfactual potential functions (defining a set of candidate counterfactual potential functions based on a plurality of exponential-polynomial functions of a user designated input value could be interpreted as a mental step directed to observation, evaluation – a person could define a set of candidate counterfactual potential functions in their mind based on observed user-designated exponential-polynomial functions. The broadest reasonable interpretation of defining a set of candidate counterfactual potential functions based on a plurality of exponential-polynomial functions also includes interpretation as a mathematical relationship in light of at least paragraphs [0106]-[0107] of Applicant’s specification);
optimizing the defined set of candidate counterfactual potential functions with respect to the input value designated by the client device for generating an optimum counterfactual explanation among a plurality of counterfactual explanations (the broadest reasonable interpretation of optimizing a set of counterfactual potential functions includes interpretation as a mathematical calculation in light of at least paragraphs [0090]-[0092] of Applicant’s specification. Examiner notes that optimizing a set of candidate counterfactual potential functions could also be interpreted as a mental step directed to evaluating the potential functions with respect to an observed input. Examiner also notes that wherein the input is received “for generating a counterfactual explanation among a plurality of counterfactual explanations” is interpreted as the intended use or outcome of receiving the input and optimizing the potential functions),
wherein the optimum counterfactual explanation corresponds to a neighboring data point in an input space as the query instance, but results in a different output to the query instance, wherein the optimum counterfactual explanation corresponds to a result of performing a differential continuous mapping between output values of the AI regression model and a real line over a predetermined set of real numbers, wherein the optimum counterfactual explanation corresponds to a maximum value of one of the set of candidate counterfactual potential functions for the designated input value received from the client device (mental step directed to evaluation, judgement – a person could determine an optimum counterfactual explanation corresponding to a neighboring data point with a different output from an observed query instance in their mind; perform a differential continuous mapping between observed output values of the AI regression model and a real line over a predetermined set of real numbers in their mind, potentially assisted by pen and paper; and determine an optimum counterfactual potential function corresponding to a maximum value of a candidate counterfactual potential function for an observed designated client input value in their mind);
and wherein the optimizing includes: performing a differentiable continuous mapping between a respective output value of the AI regression model and a real line over a predetermined subset of real numbers so that the input corresponds to a maximum value among the set of candidate counterfactual potential functions (the broadest reasonable interpretation of performing a differentiable continuous mapping between output values of an AI regression model and a real line over a predetermined subset of real numbers so that the input corresponds to a maximum value among the set of candidate counterfactual potential functions includes interpretation as a mathematical calculation in light of at least paragraph [0106] of Applicant’s specification. Examiner notes that “so that the input corresponds to a maximum value among the set of candidate counterfactual potential functions” is interpreted as the intended use or necessary result of performing the differentiable continuous mapping),
setting, among the set of candidate counterfactual potential functions, a candidate counterfactual potential function corresponding to the maximum value as a counterfactual explanation (mental step directed to evaluation, judgement – a person could set a candidate counterfactual potential function corresponding to a maximum value as a counterfactual explanation in their mind. Examiner notes that setting this function corresponding to a maximum value as the explanation could also be interpreted as the output of the differentiable continuous mapping in the previous limitation that could be interpreted as a mathematical calculation);
bounding an output of the AI regression model having a continuous output using the set counterfactual explanation (the broadest reasonable interpretation of bounding an output of the model using the set counterfactual explanation includes interpretation as a mathematical relationship determining the acceptable range for an output of the model in light of at least paragraph [0106] of Applicant’s specification. Examiner also notes that bounding an output could be interpreted as a mental step, as a person could use a determined or observed set counterfactual explanation to determine the acceptable boundaries of an output of the model in their mind), wherein the bounding includes:
inspecting the output of the AI regression model using the set counterfactual explanation (mental step directed to observation, evaluation – a person could inspect in their mind an observed output of an AI regression model using an observed or determined counterfactual explanation);
wherein the counterfactual explanation corresponds to an explanation of the AI regression model, which is expressed based on respective distances from the query instance which results in different model outputs than a model output for the query instance itself (the broadest reasonable interpretation of generating a counterfactual explanation based on a value of an obtained mathematical expression corresponding to a maximum value of a counterfactual potential function includes interpretation as a mathematical calculation as well as a mental step directed to observation and evaluation, as a person could generate a counterfactual explanation in their mind corresponding to a maximum value of an observed candidate counterfactual potential function and based on at least one observed value of an obtained mathematical expression. Examiner notes that performing the differential continuous mapping between output values of the AI regression model and a real line does not require use of the AI regression model itself, only the output values from the model. Wherein the counterfactual explanation is expressed based on respective distances from a query instance different than the actual model output is interpreted as the intended use or intended result of generating the counterfactual explanation);
wherein the AI regression model adapts an iterative scheme, such that a subsequent input, among the plurality of inputs, is selectively fed to the AI regression model as a candidate counterfactual is chosen by maximizing an expected improvement of at least one of the candidate counterfactual potential functions (maximizing an expected improvement of a candidate counterfactual potential function to choose a subsequent input to an AI regression model could be interpreted as a mental step directed to evaluation – as a person could maximize an expected improvement associated with an observed candidate counterfactual potential function in their mind. The broadest reasonable interpretation of maximizing an expected improvement of a candidate counterfactual potential function also includes interpretation as a mathematical calculation in light of paragraphs [0118] – [0123] of Applicant’s specification);
and wherein the expected improvement is determined to be maximized when a convergence point is identified for limiting the number of iterations to be performed to provide a more efficient search of the optimum counterfactual explanation (mental step directed to observation, evaluation – a person could determine, in their mind, that a calculated or determined expected improvement is maximized by identifying, in their mind, a convergence point that limits the number of iterations to be performed. Examiner notes that “to provide a more efficient search of the optimum counterfactual explanation” is interpreted as the intended use or outcome of identifying a convergence point and does not further limit the claim);
identifying, using the optimum counterfactual explanation, a different sequence of input for the plurality of inputs to change an output of the AI regression model for sequentially identifying boundaries of the AI regression model (mental step directed to observation, evaluation – a person could identify a different sequence of inputs to change an output of an AI regression model using an observed or determined optimum counterfactual explanation. Examiner notes that “for sequentially identifying boundaries of the AI regression model” is interpreted as the intended use or outcome of identifying the different sequence of inputs and does not further limit the claim).
Claim 1 recites the following additional elements:
at least one processor, a distributed network, first and second servers, a client device, a network interface; obtaining, by the at least one processor and via a distributed network, the AI regression model from a first server and a first value that corresponds to a query instance from a client device via a network interface, wherein the AI regression model is configured to have a continuous output and executed over a number of iterations until a candidate counterfactual potential function of the AI regression model is maximized; receiving, by the AI regression model and from a second server, a plurality of inputs corresponding to a list of features pertaining to a dataset for generating an output corresponding to a predicted probability for the dataset; receiving, by the at least one processor from the client device, an input value designated by the client device for generating the counterfactual explanation, the input value being different from the obtained query instance; and providing the optimum counterfactual explanation that indicates how a data input should be minimally different for the AI regression model to change the output from that of the query instance.
The processor, distributed network, first and second servers, client device, and network interface are interpreted as generic computer components used to implement the claimed abstract ideas. Obtaining an AI regression model and a first query instance value from a first server is interpreted as receiving data over a network. Wherein the AI regression model is configured to have a continuous output and execute over a number of iterations to maximize a candidate counterfactual potential function is interpreted as the intended use or intended result of the obtained AI regression model. Receiving a plurality of inputs corresponding to a list of features by an AI regression model and from a second server is interpreted as receiving data over a network. Wherein the received features pertaining to a dataset are for generating an output corresponding to a predicted probability for a dataset is interpreted as the intended use or intended result of receiving the list of features. Receiving, by the at least one processor from the client device, an input value designated by the client device for generating the counterfactual explanation, the input value being different from the obtained query instance, is interpreted as receiving data over a network. Providing the optimum counterfactual explanation that indicates how a data input should be minimally different for the AI regression model to change the output from that of the query instance is interpreted as transmitting data over a network. These elements do not integrate the abstract idea into a practical application or amount to significantly more than the abstract idea (see MPEP 2106.05(d)).
Claim 9 is a system claim and its limitation is included in claim 1. The only difference is that claim 9 requires a system, which is interpreted as generic computer components used to merely apply the claimed abstract ideas identified in the analysis of claim 1 (see MPEP 2106.05(f)). Therefore, claim 9 is rejected for the same reasons as claim 1.
Claim 17 is a non-transitory computer readable storage medium claim and its limitation is included in claim 1. The only difference is that claim 17 requires a non-transitory computer readable storage medium, which is interpreted as generic computer components used to merely apply the claimed abstract ideas identified in the analysis of claim 1 (see MPEP 2106.05(f)). Therefore, claim 17 is rejected for the same reasons as claim 1.
The independent claims are not patent eligible.
Dependent claims 2-5, 7-8, 10-13, 15-16, and 18-20 when analyzed as a whole are held to be patent ineligible under 35 U.S.C. 101 because the additional recited limitations fail to establish that the claims are not directed to an abstract idea, as they recite further embellishment of the judicial exception.
Claim 2 recites wherein the generating of the counterfactual explanation further comprises: selecting, from among a predetermined set of possible input data points, a first data point to be used as an input to the obtained AI regression model; computing a corresponding output value of the obtained AI regression model based on the selected first data point; determining whether the computed corresponding output value corresponds to a predetermined optimum potential value; and when the computed corresponding output value is determined as corresponding to the predetermined optimum potential value, generating the counterfactual explanation based on the selected first data point.
Selecting a first data point to be used as input to an AI regression model could be interpreted as an abstract idea directed to a mental step directed to observation and judgement, as a person could select an observed data point from a predetermined set of possible data points in their mind. Selecting a data point to be used as input from a predetermined set could also be interpreted as an additional element directed to selecting a particular type of data to be manipulated, which does not integrate the claimed abstract ideas into a practical application or amount to significantly more than the claimed abstract ideas (see MPEP 2106.05(g)). Computing a corresponding output value of an obtained AI regression model based on a selected data point is interpreted as an abstract idea directed to a mathematical calculation executed by an AI regression model, which is interpreted as well-understood, routine, conventional activity in light of Utsumi et al (US 20190370673 A1), paragraph [0071] of which recites “The well-known method is, for example, a method using linearity, such as a linear regression model of a multiple regression model, and a generalized linear model of a logistic regression, a method using autoregression such as autoregressive with exogenous (ARX) model, a method using a reduction estimator such as Ridge regression, Lasso regression or ElasticNet, a method using a dimension degenerator such as a partial least-squares method or principal component regression, or a nonparametric method of a nonlinear model using polynomials, support vector regression, a regression tree, Gaussian process regression, a neural network, or the like”.
Determining whether a computed output value corresponds to a predetermined optimum potential value is interpreted as an abstract idea directed to a mental step directed to evaluation, as a person could determine whether a computed output value corresponds to a predetermined optimum value in their mind. Generating a counterfactual explanation based on a selected data point could be interpreted as an abstract idea directed to a mathematical calculation as well as an abstract idea directed to a mental step directed to observation and evaluation, as a person could generate a counterfactual explanation in their mind based on an observed selected data point.
Claim 3 recites wherein when the computed corresponding output value is determined as not corresponding to the predetermined optimum potential value, the method further comprises: selecting a next data point from among the predetermined set of possible input data points to be used as an input to the obtained AI regression model; computing a next corresponding output value of the obtained AI regression model based on the selected next data point; determining whether the next computed corresponding output value corresponds to the predetermined optimum potential value; when the next computed corresponding output value is determined as corresponding to the predetermined optimum potential value, generating the counterfactual explanation based on the most recently selected next data point; and when the next computed corresponding output value is determined as not corresponding to the predetermined optimum potential value, repeating the selecting, computing, and determining steps for additional next data points until the computed corresponding value is determined as corresponding to the predetermined optimum potential value.
Selecting a next data point to be used as input to an AI regression model could be interpreted as an abstract idea directed to a mental step directed to observation and judgement, as a person could select an observed data point from a predetermined set of possible data points in their mind. Selecting a data point to be used as input from a predetermined set could also be interpreted as an additional element directed to selecting a particular type of data to be manipulated, which does not integrate the claimed abstract ideas into a practical application or amount to significantly more than the claimed abstract ideas (see MPEP 2106.05(g)). Computing a corresponding output value of an obtained AI regression model based on a selected data point is interpreted as an abstract idea directed to a mathematical calculation executed by an AI regression model, which is interpreted as well-understood, routine, conventional activity in light of Utsumi et al (US 20190370673 A1), paragraph [0071] of which recites “The well-known method is, for example, a method using linearity, such as a linear regression model of a multiple regression model, and a generalized linear model of a logistic regression, a method using autoregression such as autoregressive with exogenous (ARX) model, a method using a reduction estimator such as Ridge regression, Lasso regression or ElasticNet, a method using a dimension degenerator such as a partial least-squares method or principal component regression, or a nonparametric method of a nonlinear model using polynomials, support vector regression, a regression tree, Gaussian process regression, a neural network, or the like”.
Determining whether a computed output value corresponds to a predetermined optimum potential value is interpreted as a mental step directed to evaluation, as a person could determine whether a computed output value corresponds to a predetermined optimum value in their mind. Generating a counterfactual explanation based on a selected data point could be interpreted as an abstract idea directed to a mathematical calculation as well as an abstract idea directed to a mental step directed to observation and evaluation, as a person could generate a counterfactual explanation in their mind based on an observed selected data point. Wherein when the computed corresponding output value is determined as not corresponding to the predetermined optimum potential value, repeating the selecting, computing, and determining steps for additional next data points until the computed corresponding value is determined as corresponding to the predetermined optimum potential value is interpreted as repeating claimed abstract ideas as explained above.
Claim 4 recites receiving, by the at least one processor from a user, an input value designated by the user for generating the counterfactual explanation, wherein the defining of the set of candidate counterfactual potential functions comprises performing the differential continuous mapping between the respective output values of the obtained AI regression model and the real line over the predetermined subset of real numbers such that the input value designated by the user corresponds to the maximum value of the defined set of candidate counterfactual potential functions, and wherein the set of candidate counterfactual potential functions favor an output of the AI regression model that is a specified distance away from an output for the query instance, the specified distance corresponding to the input value designated by the user.
Receiving an input value from a user is interpreted as an additional element directed to sending and receiving data over a network, which does not integrate the claimed abstract ideas into a practical application or amount to significantly more than the claimed abstract ideas (see MPEP 2106.05(d)(II)). Performing the differential continuous mapping between the respective output values of the obtained AI regression model and the real line over the predetermined subset of real numbers such that the input value designated by the user corresponds to the maximum value of the defined set of candidate counterfactual potential functions is interpreted as an abstract idea directed to a mathematical calculation. Wherein the set of candidate potential functions favor an output of the AI regression model that corresponds to a specified distance designated by the user is interpreted as the intended use or intended result of defining the set of candidate counterfactual potential functions.
Claim 5 recites wherein the defining of the set of candidate counterfactual potential functions further comprises: determining a plurality of candidate counterfactual potential functions; optimizing the determined plurality of candidate counterfactual potential functions with respect to the input value designated by the user; and generating the counterfactual explanation based on a result of the optimizing. Determining a set of candidate counterfactual potential functions, optimizing a plurality of candidate counterfactual potential functions, and generating a counterfactual explanation based on a result of the optimization are all interpreted as abstract ideas directed to mental steps, as a person could determine a set of candidate counterfactual potential functions, optimize a plurality of candidate counterfactual potential functions, and generate a counterfactual explanation in their mind.
Claim 7 recites wherein the determining of the plurality of candidate counterfactual potential functions comprises defining a plurality of exponential- polynomial functions of the input value designated by the user. Defining a plurality of exponential-polynomial functions of an input value to determine a plurality of candidate counterfactual potential functions is interpreted as an abstract idea directed to defining a mathematical relationship.
Claim 8 recites wherein the AI regression model includes at least one from among a neural network model, a logistic regression model, and a random forest model. Wherein a regression model includes at least one from among a neural network model, a logistic regression model, and a random forest model is interpreted as well-understood, routine, conventional activity in the art in light of Utsumi et al (US 20190370673 A1), paragraph [0071] of which recites “The well-known method is, for example, a method using linearity, such as a linear regression model of a multiple regression model, and a generalized linear model of a logistic regression, a method using autoregression such as autoregressive with exogenous (ARX) model, a method using a reduction estimator such as Ridge regression, Lasso regression or ElasticNet, a method using a dimension degenerator such as a partial least-squares method or principal component regression, or a nonparametric method of a nonlinear model using polynomials, support vector regression, a regression tree, Gaussian process regression, a neural network, or the like”.
Claim 10 is a system claim and its limitation is included in claim 2. Claim 10 is rejected for the same reasons as claim 2.
Claim 11 is a system claim and its limitation is included in claim 3. Claim 11 is rejected for the same reasons as claim 3.
Claim 12 is a system claim and its limitation is included in claim 4. Claim 12 is rejected for the same reasons as claim 4.
Claim 13 is a system claim and its limitation is included in claim 5. Claim 13 is rejected for the same reasons as claim 5.
Claim 15 is a system claim and its limitation is included in claim 7. Claim 15 is rejected for the same reasons as claim 7.
Claim 16 is a system claim and its limitation is included in claim 8. Claim 16 is rejected for the same reasons as claim 8.
Claim 18 is a non-transitory computer readable storage medium claim and its limitation is included in claim 2. Claim 18 is rejected for the same reasons as claim 2.
Claim 19 is a non-transitory computer readable storage medium claim and its limitation is included in claim 3. Claim 19 is rejected for the same reasons as claim 3.
Claim 20 is a non-transitory computer readable storage medium claim and its limitation is included in claim 4. Claim 20 is rejected for the same reasons as claim 4.
Viewed as a whole, these additional claim elements do not provide meaningful limitations to transform the abstract idea into a patent eligible application of the abstract idea such that the claims amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 7-13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Karimi et al* (“Model Agnostic Counterfactual Explanations for Consequential Decisions”, herein Karimi), in view of McGrath et al** (US 12254388 B2, herein McGrath), in further view of Snoek et al* (“Practical Bayesian Optimization of Machine Learning Algorithms”, herein Snoek), in further view of Zhu (“Machine Teaching for Bayesian Learners in the Exponential Family”, herein Zhu).
*a copy of this document was provided with the IDS dated 10/13/2021
** this document was cited in the Notice of References Cited dated 07/24/2024
Regarding claim 1, Karimi teaches a method for generating a counterfactual explanation for an artificial intelligence (AI) regression model, the method being implemented by at least one processor, the method comprising:
obtaining, by the at least one processor [and via a distributed network], the AI regression model [from a first server] and a first value that corresponds to a query instance [from a client device via a network interface] (section 3 para. 1 recites “This section defines a logical representation of counterfactual explanations for predictive models, which are functions mapping input feature vectors x ϵ X into decisions y ϵ {0, 1}. Given a predictive model f : X → {0, 1}, we can define the set of counterfactual explanations for a (factual) input xˆ ϵ X as CFf (xˆ) = {x ϵ X | f(x) ≠ f (xˆ)}. In words, CFf (xˆ) contains all the inputs x for which the model f returns a prediction different from f (xˆ)”. The footnote for section 3 para. 1 recites “While here we assume binary predictor models, i.e., classifiers, our approach generalizes to regression problems where y ϵ R and more generally any other output domain” (i.e., obtaining a mathematical expression that corresponds to an AI regression model and a first value corresponding to a query instance. Examiner’s note: the broadest reasonable interpretation of “query instance” includes the factual input from Karimi in light of paragraph [0082] of Applicant’s specification, which defines a query instance as “a member of the input space whose corresponding model output is to be explained”)),
wherein the AI regression model is configured to have a continuous output and executed over a number of iterations [until a candidate counterfactual potential function of the AI regression model is maximized] (section 4.2 para. 6 recites “one might be interested in generating a (small) set of diverse counterfactual explanations for the same instance ˆx. To this end, we iteratively call Algorithm 1 with a constraints formula φv that includes diversity clauses to ensure that the newly generated explanation is substantially different from all the previous ones”. Algorithm 1 depicts an iterative method that generates counterfactual candidates while distance δmax – distance δmin > accuracy ϵ (i.e., the model has a continuous output and executes over a number of iterations until a stopping condition is met));
receiving, by the AI regression model [and from a second server], a plurality of inputs corresponding to a list of features pertaining to a dataset for generating an output corresponding to a predicted probability for the dataset (section 3 para. 1 recites “This section defines a logical representation of counterfactual explanations for predictive models, which are functions mapping input feature vectors x ϵ X into decisions y ϵ {0, 1}. Given a predictive model f : X → {0, 1}, we can define the set of counterfactual explanations for a (factual) input xˆ ϵ X as CFf (xˆ) = {x ϵ X | f(x) ≠ f (xˆ)}. In words, CFf (xˆ) contains all the inputs x for which the model f returns a prediction different from f (xˆ)” (i.e., the model receives a plurality of inputs corresponding to a list of features in a dataset. Examiner notes that wherein the dataset is “for generating an output corresponding to the predicted probability for the dataset” is interpreted as the intended use or result of using the dataset and does not provide additional patentable weight to this limitation));
defining, by the at least one processor [and based on a plurality of exponential-polynomial functions of the input value designated by the client device], a set of candidate counterfactual potential functions (fig. 1 and section 1 para. 5 recite “in MACE (i.e., Model-Agnostic Counterfactual Explanations) we map the nearest counterfactual problem into a sequence of satisfiability (SAT) problems, by expressing both the predictive model and the distance function (as well as the plausibility and diversity constraints) as logic formulae. Each of these satisfiability problems aims to verify if there exists a counterfactual explanation at a distance smaller than a given threshold, and can be solved using standard SMT (satisfiability modulo theories) solvers”. Section 3 para. 2 recites “given a factual input xˆ with f(xˆ) = yˆ and φf we define the counterfactual formula as (EQ1). Intuitively, the formula on the right hand side of (EQ1) says that “x is a counterfactual for xˆ if either f(xˆ) = 0 and f(x) = 1, or f(xˆ) = 1 and f(x) = 0”. It is thus clear from the definition that an input x satisfies φCFf (xˆ) if and only if x ϵ CFf(xˆ)”. Section 4.1 recites “We remark here our approach to find nearest counterfactuals is agnostic to the details of the model and distance being used; the only requirement is that they must be expressible in a fairly general programming language. As a consequence, we can handle a wide variety of predictive models, including both differentiable – such as, logistic regression and multilayer perceptron – and non-differentiable predictive models e.g., decision trees and random forest– as well as a wide variety of distance functions” (i.e., determining a candidate counterfactual using a differential continuous mapping between output values of the mathematical expression and a real line over a subset of real numbers));
optimizing the defined set of candidate counterfactual potential functions with respect to the input value [designated by the client device] for generating an optimum counterfactual explanation among a plurality of counterfactual explanations (section 4.1 para. 1 recites “Our goal now is to leverage the representation of CFf (ˆx) (i.e., the counterfactual functions) in terms of a logic formula to solve (EQ2). To this end, we map the optimization problem in (EQ2) into a sequence of satisfiability problems, which can be verified or refuted by standard SMT solvers” (i.e., generating a counterfactual explanation by optimizing a candidate counterfactual potential function)),
wherein the optimum counterfactual explanation corresponds to a neighboring data point in an input space as the query instance, but results in a different output to the query instance, wherein the optimum counterfactual explanation corresponds to a result of performing a differential continuous mapping between output values of the AI regression model and a real line over a predetermined set of real numbers (section 3 para. 1 recites “This section defines a logical representation of counterfactual explanations for predictive models, which are functions mapping input feature vectors x ϵ X into decisions y ϵ {0, 1}. Given a predictive model f : X → {0, 1}, we can define the set of counterfactual explanations for a (factual) input ˆx ϵ X as CFf(ˆx) = {x ϵ X | f(x) ≠ f(ˆx)}. In words, CFf(ˆx) contains all the inputs x for which the model f returns a prediction different from f(ˆx)”. Section 4 para. 1 recites “Based on the counterfactual space CFf(xˆ) defined in the previous section, we would like to produce counterfactual explanations for the output of a model f on a given input xˆ by trying to find a nearest counterfactual, which is defined as: (EQ2). For the time being, we assume that a notion of distance between instances, d, is given” (i.e., the optimum counterfactual explanation corresponds to a neighbor value to the input, but results in a different output and is based on a value obtained from the differential continuous mapping of the model));
and wherein the optimizing includes: performing a differentiable continuous mapping between a respective output value of the AI regression model and a real line over a predetermined subset of real numbers [so that the input corresponds to a maximum value among the set of candidate counterfactual potential functions] (section 4 para. 1 recites “Based on the counterfactual space CFf(xˆ) defined in the previous section, we would like to produce counterfactual explanations for the output of a model f on a given input xˆ by trying to find a nearest counterfactual, which is defined as: (EQ2). For the time being, we assume that a notion of distance between instances, d, is given” (i.e., generating a counterfactual explanation based on a value obtained from the mathematical expression, or the model)),
setting, among the set of candidate counterfactual potential functions, a candidate counterfactual potential function [corresponding to the maximum value as a counterfactual explanation] (figure 1 of Karimi shows setting, or outputting, the counterfactual x^c and Algorithm 1 of Karimi returns, or sets the output as x^ϵ);
bounding an output of the AI regression model having a continuous output using the set counterfactual explanation (section 4.1 para. 3 recites “the bound δmin returned by Algorithm 1 provides a certificate that any solution x^* to (EQ 2) must satisfy d(x^*, x^) > δmin” (i.e., bounding the output, or set counterfactual x^c returned by Algorithm 1)), wherein the bounding includes:
inspecting, by the at least one processor, the output of the AI regression model using the set counterfactual explanation; and providing, by the at least one processor, the optimum counterfactual explanation that indicates how a data input should be minimally different for the AI regression model to change the output from that of the query instance (section 1 para. 3 recites “we focus on answering the second question, or equivalently, on generating counterfactual explanations. Of specific importance is the problem of finding the nearest counterfactual explanation – i.e., identifying the set of features resulting in the desired prediction while remaining at minimum distance from the original set of features describing the individual”. Section 4.2 para. 6 recites “one might be interested in generating a (small) set of diverse counterfactual explanations for the same instance ˆx. To this end, we iteratively call Algorithm 1 with a constraints formula φv that includes diversity clauses to ensure that the newly generated explanation is substantially different from all the previous ones. We can encode diversity by forcing that the distance between every pair of counterfactual explanations is greater than a given value. For example, we can. . . restrict repetitive counterfactuals by enforcing subsequent counterfactuals to have 0-norm distance at least 1 from all previous counterfactuals” (i.e., inspecting a given counterfactual explanation and providing the counterfactual explanation indicating how to minimally change the output to ensure different counterfactuals from additional iterations of the model));
wherein the counterfactual explanation corresponds to an explanation of the output of the AI regression model, which is expressed based on respective distances from the query instance which results in different model outputs than a model output for the query itself (section 1 para. 4 recites “Moreover, we rely on a binary search strategy on the distance threshold to find an approximation to the nearest (plausible) counterfactual with an arbitrary degree of accuracy, and a lower bound on distance such that no counterfactual provably exists at a smaller distance. Finally, once nearest counterfactuals are found, diversity constraints may be added to the satisfiability problems to find alternative counterfactuals. The overall architecture of MACE is illustrated in Figure 1” (i.e., the counterfactual explanation is expressed based on a distance between the counterfactual and the actual model output));
wherein the AI regression model adapts an iterative scheme such that a subsequent input, among the plurality of inputs, is selectively fed to the AI regression model as a candidate counterfactual is chosen [by maximizing an expected improvement of at least one of the candidate counterfactual potential functions] (section 4.2 para. 6 recites “one might be interested in generating a (small) set of diverse counterfactual explanations for the same instance ˆx. To this end, we iteratively call Algorithm 1 with a constraints formula φv that includes diversity clauses to ensure that the newly generated explanation is substantially different from all the previous ones”. Algorithm 1 depicts an iterative method that generates counterfactual candidates while distance δmax – distance δmin > accuracy ϵ (i.e., the model executes over a number of iterations until a stopping condition is met));
identifying, by the at least one processor and using the optimum counterfactual explanation, a different sequence of input for the plurality of inputs to change an output of the AI regression model for sequentially identifying boundaries of the AI regression model (section 1 para. 4 recites “in MACE we map the nearest counterfactual problem into a sequence of satisfiability (SAT) problems, by expressing both the predictive model and the distance function (as well as the plausibility and diversity constraints) as logic formulae. Each of these satisfiability problems aims to verify if there exists a counterfactual explanation at a distance smaller than a given threshold, and can be solved using standard SMT (satisfiability modulo theories) solvers. Moreover, we rely on a binary search strategy on the distance threshold to find an approximation to the nearest (plausible) counterfactual with an arbitrary degree of accuracy, and a lower bound on distance such that no counterfactual provably exists at a smaller distance” (i.e., identifying a sequence of counterfactuals to determine which inputs will change the output of the AI regression model. Examiner notes that “for sequentially identifying boundaries of the AI regression model” is interpreted as the intended use or outcome of identifying the different sequence of inputs and does not further limit the claim)).
However, while one of ordinary skill would recognize that the operations taught by Karimi are implemented via computer, Karimi does not explicitly teach a distributed network comprising multiple servers, a client device, a network interface; and receiving, by the at least one processor from the client device, an input value designated by the client device for generating the counterfactual explanation, the input value being different from the obtained query instance.
McGrath teaches a distributed network comprising multiple servers, a client device, a network interface (col. 11 lines 63-67 recite “As further shown in FIG. 4, environment 400 may include a network 420, and/or a user device 430. Devices and/or elements of environment 400 may interconnect via wired connections and/or wireless connections” (i.e., a distributed network comprising multiple devices). Col. 12 lines 17-26 recite “Computing hardware 403 includes hardware and corresponding resources from one or more computing devices. For example, computing hardware 403 may include hardware from a single computing device (e.g., a single server) or from multiple computing devices (e.g., multiple servers), such as multiple computing devices in one or more data centers” (i.e., the distributed network can include multiple servers/devices). Col. 13 lines 11-17 recite “The user device 430 includes one or more devices capable of receiving, generating, storing, processing, and/or providing information associated with user information and/or a qualification model to obtain one or more counterfactual explanations, as described elsewhere herein. The user device 430 may include a communication device and/or a computing device” (i.e., a client device capable of communicating with one or more servers). Col. 14 lines 21-24 recite “communication component 570 may include a receiver, a transmitter, a transceiver, a modem, a network interface card, an antenna, and/or the like” (i.e., a network interface));
and receiving, by the at least one processor from the client device, an input value designated by the client device for generating the counterfactual explanation, the input value being different from the obtained query instance (col. 7 lines 38-51 recite “As further shown in FIG. 1C, and by reference number 145, the automated analysis system may iterate one or more of the processes described in connection with example 100. For example, the automated analysis system, during a training period, may iteratively receive pairs of user information and a prediction output of the qualification model, iteratively select one or more relevant counterfactual explanations (e.g., using the iteratively updated labels and/or retrained generator model, clustering model, or classification model), iteratively provide the selected counterfactual explanations for feedback, iteratively update labels based on iteratively received feedback data, and iteratively retrain the clustering model and/or the classification model according to the feedback data. In some implementations, each iteration may be associated with a different counterfactual explanation associated with a prediction output and/or a different analysis of different user information associated with different users” (i.e., receiving an input value from the user separate from the input space whose corresponding model output is to be explained, or query instance)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by generating counterfactual explanations using the method taught by Karimi with the user input values taught by McGrath. Karimi and McGrath are both directed to methods of generating counterfactual explanations for the output of an artificial intelligence model. While Karimi does not explicitly teach obtaining input values from a user, one of ordinary skill would recognize that the input values in Karimi would need to be obtained from a source, and that the method from Karimi could be modified to use the user input from McGrath as this source.
However, while the combination of Karimi and McGrath teaches an iterative scheme (see at least section 4.2 para. 6 and Algorithm 1 of Karimi), the combination of Karimi and McGrath does not explicitly teach a candidate potential function corresponding to the maximum value, an iterative scheme such that a subsequent input fed to the model as a candidate is chosen by maximizing an expected improvement of at least one of the candidate potential functions, wherein the expected improvement is determined to be maximized when a convergence point is identified for limiting the number of iterations to be performed to provide a more efficient search [of the optimum counterfactual explanation].
Snoek teaches wherein an optimum [counterfactual] explanation corresponds to a maximum value of one of the set of candidate [counterfactual] potential functions for a designated input value [received from the client device], a candidate potential function corresponding to the maximum value (section 2.2 para. 2 recites the expected improvement acquisition function which outputs a value corresponding to a maximum value of a candidate potential function: “aEI(x ; {xn, yn}, θ) = σ(x ; {xn, yn}, θ)(γ(x)Φ(γ(x)) + N(γ(x) ; 0, 1))”. This methodology is applied to a regression model in figure 3b (i.e., an optimum candidate potential function corresponding to a maximum value));
an iterative scheme such that a subsequent input fed to the AI regression model as a candidate is chosen by maximizing an expected improvement of at least one of the candidate potential functions, wherein the expected improvement is determined to be maximized when a convergence point is identified for limiting the number of iterations to be performed to provide a more efficient search [of the optimum counterfactual explanation] (section 1 para. 1 recites “To pick the hyperparameters of the next experiment (i.e. the next iteration), one can optimize the expected improvement (EI) over the current best result or the Gaussian process upper confidence bound (UCB). EI and UCB have been shown to be efficient in the number of function evaluations required to find the global optimum of many multimodal black-box functions”. Section 2.2 para. 2 recites the expected improvement acquisition function which outputs a value corresponding to a maximum value of a candidate potential function: “aEI(x ; {xn, yn}, θ) = σ(x ; {xn, yn}, θ)(γ(x)Φ(γ(x)) + N(γ(x) ; 0, 1))”. This methodology is applied to a regression model in figure 3b. Section 4.2 para. 2 recites “We. . . used 100 topics and η = α = 0.01 in our experiments in order to emulate their analysis and repeated exactly the grid search reported in the paper. Each online LDA (i.e., Latent Dirichlet Allocation) evaluation generally took between five to ten hours to converge, thus the grid search requires approximately 60 to 120 processor days to complete. The only difference was the randomly sampled collection of articles in the data set and the choice of the vocabulary. We ran each evaluation for 10 hours or until convergence”. Section 4.3 para. 2 recites “We explore 25 settings of the parameter C, on a log scale from 10^-1 to 10^6, 14 settings of α, on a log scale from 0.1 to 5 and the model convergence tolerance, ϵ ∈ {10^-4, 10^-3, 10^-2, 10^-1}” (i.e., choosing a subsequent input, or next hyperparameter, for a model using an expected improvement of a potential function and determining when the iterative scheme has converged, or reached an optimum value. Examiner notes that “to provide a more efficient search of the optimum counterfactual explanation” is interpreted as the intended use or outcome of identifying a convergence point and does not further limit the claim));
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by modifying the optimization algorithm for finding a nearest counterfactual from Karimi (as modified by McGrath) to implement the expected improvement function from Snoek to determine a next candidate potential function to input to the regression model, rather than the stopping condition for Algorithm 1 from Karimi. Karimi teaches using a binary search strategy in at least section 4.1 to find a best counterfactual, but notes in paragraph 3 of section 4.1 that “our approach to find nearest counterfactuals is agnostic to the details of the model and distance being used; the only requirement is that they must be expressible in a fairly general programming language”. Snoek teaches in section 1 that the expected improvement has been “shown to be efficient in the number of function evaluations required to find the global optimum of many multimodal black-box functions”. As such, one of ordinary skill in the art would understand how to modify the optimization method from Karimi using the optimization acquisition function from Snoek to return a result that corresponds to a maximum value of a candidate potential function.
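For illustration only, the expected improvement acquisition function of Snoek cited above can be sketched numerically; the candidate list and the posterior mean/deviation values below are hypothetical, not taken from the reference:

```python
import math

def expected_improvement(mu: float, sigma: float, y_best: float) -> float:
    """Snoek's EI acquisition (minimization form):
    aEI = sigma * (gamma * Phi(gamma) + N(gamma; 0, 1)),
    with gamma = (y_best - mu) / sigma."""
    if sigma <= 0.0:
        return 0.0
    gamma = (y_best - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(gamma / math.sqrt(2.0)))            # standard normal CDF
    pdf = math.exp(-0.5 * gamma * gamma) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    return sigma * (gamma * Phi + pdf)

# The next input fed to the model is the candidate maximizing EI
candidates = [(0.9, 0.05), (0.5, 0.3), (0.7, 0.1)]  # hypothetical (mean, std) pairs
next_input = max(candidates, key=lambda c: expected_improvement(c[0], c[1], y_best=0.6))
```

The design point illustrated is that EI trades off a low posterior mean against high posterior uncertainty, which is why it is "efficient in the number of function evaluations" as quoted from section 1 of Snoek.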
However, the combination of Karimi, McGrath, and Snoek does not explicitly teach a plurality of exponential-polynomial functions.
Zhu teaches a plurality of exponential-polynomial functions (section 4 para. 1-2 recite “we further restrict ourselves to a subset of Bayesian learners whose prior and likelihood are in the exponential family and are conjugate. For this subset of Bayesian learners, finding the optimal teaching set D naturally decomposes into two steps: In the first step one solves a convex optimization problem to find the optimal aggregate sufficient statistics for D. In the second step one “unpacks” the aggregate sufficient statistics into actual teaching examples. We present an approximate algorithm for doing so. We recall that an exponential family distribution takes the form p(x | θ) = h(x) exp (θ^T T(x) − A(θ)) where T(x) ∈ R^D is the D-dimensional sufficient statistics of x, θ ∈ R^D is the natural parameter, A(θ) is the log partition function, and h(x) modifies the base measure. For a set D = {x1, . . . , xn}, the likelihood function under the exponential family takes a similar form p(D | θ) = (∏_{i=1}^{n} h(xi)) exp (θ^T s − nA(θ)), where we define (EQ5) to be the aggregate sufficient statistics over D. The corresponding conjugate prior is the exponential family distribution with natural parameters (λ1, λ2) ∈ R^D × R: p(θ | λ1, λ2) = h0(θ) exp (λ1^T θ − λ2 A(θ) − A0(λ1, λ2)). The posterior distribution is p(θ | D, λ1, λ2) = h0(θ) exp ((λ1 + s)^T θ − (λ2 + n)A(θ) − A0(λ1 + s, λ2 + n))” (i.e., a family, or plurality, of exponential-polynomial functions)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by modifying the optimization method used to determine candidate counterfactual potential functions taught by Karimi (as modified by McGrath and Snoek) with the Bayesian exponential-polynomial functions taught by Zhu to improve the optimization function used to identify candidate counterfactual potential functions. Karimi notes in paragraph 3 of section 4.1 that “our approach to find nearest counterfactuals is agnostic to the details of the model and distance being used; the only requirement is that they must be expressible in a fairly general programming language”. As the Bayesian optimization methods from Zhu are expressed in equations that can be translated into a general programming language, one of ordinary skill in the art would understand how to modify the optimization method from Karimi using the Bayesian exponential-polynomial functions taught by Zhu to determine a plurality of candidate counterfactual potential functions.
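For illustration only, the conjugate posterior update of Zhu quoted above, in which the aggregate sufficient statistic s and the count n are added to the prior's natural parameters, can be sketched as follows; the Bernoulli sufficient statistic T(x) = x and the scalar parameters are hypothetical choices for this sketch:

```python
def conjugate_posterior(lam1, lam2, data, T=lambda x: x):
    """Posterior natural parameters for a conjugate exponential-family
    pair, per Zhu section 4: (lam1 + s, lam2 + n), where
    s = sum_i T(x_i) is the aggregate sufficient statistic and n = |D|.
    Scalar (D = 1) case for simplicity; T(x) = x is the Bernoulli choice.
    """
    s = sum(T(x) for x in data)
    n = len(data)
    return lam1 + s, lam2 + n
```

For example, a prior with natural parameters (1.0, 2.0) updated on the data set [1, 0, 1] yields (3.0, 5.0), since s = 2 and n = 3.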
Regarding claim 2, the combination of Karimi, McGrath, Snoek, and Zhu teaches the method of claim 1, wherein the generating of the counterfactual explanation further comprises: selecting, from among a predetermined set of possible input data points, a first data point to be used as an input to the obtained AI regression model (Karimi section 3 para. 1 recites “This section defines a logical representation of counterfactual explanations for predictive models, which are functions mapping input feature vectors x ϵ X into decisions y ϵ {0, 1}”. Karimi section 4 para. 1 recites “Based on the counterfactual space CFf(xˆ) defined in the previous section, we would like to produce counterfactual explanations for the output of a model f on a given input xˆ by trying to find a nearest counterfactual, which is defined as: (EQ2). For the time being, we assume that a notion of distance between instances, d, is given”. The footnote for section 3 para. 1 recites “While here we assume binary predictor models, i.e., classifiers, our approach generalizes to regression problems where y ϵ R and more generally any other output domain” (i.e., generating a counterfactual explanation for a given, or selected, input value from a set of input values));
computing a corresponding output value of the obtained AI regression model based on the selected first data point (Karimi section 4.1 para. 1 recites “Our goal now is to leverage the representation of CFf(xˆ) in terms of a logic formula to solve (EQ2). To this end, we map the optimization problem in (EQ2) into a sequence of satisfiability problems, which can be verified or refuted by standard SMT solvers. We do so by first converting the expression d(x, xˆ) ≤ δ, where δ ϵ [0, 1], into a logic formula φd, xˆ (x, δ), which is valid if and only if d(x, xˆ) ≤ δ. We assume here that the distance d function is expressed by a program in the same language that we used to represent the models in Section 3”. The footnote for section 3 para. 1 recites “While here we assume binary predictor models, i.e., classifiers, our approach generalizes to regression problems where y ϵ R and more generally any other output domain” (i.e., computing an output from a regression model based on the selected input));
determining whether the computed corresponding output value corresponds to a predetermined optimum potential value (Karimi section 4.1 para. 1 recites “Then, both the counterfactual formula φCFf (xˆ)(x) and the distance formula φd,ˆx(x, δ) are combined into the logic formula: φxˆ,δ(x) = φCFf(xˆ)(x) ^ φd, xˆ(x, δ) , which is satisfiable if and only if there exists a counterfactual such that x ϵ CFf(xˆ) such that d(x, xˆ) ≤ δ. To check whether the above formula is satisfiable we use the satisfiability oracle SAT (ψ(x)), which returns either an instance x such that ψ(x) is valid, or “unsatisfiable” if no such x exists” (i.e., determining whether the computed output corresponds to an optimum value));
and when the computed corresponding output value is determined as corresponding to the predetermined optimum potential value, generating the counterfactual explanation based on the selected first data point (Karimi section 4.1 para. 2 recites “while the oracle SAT allows us to verify if there exist counterfactual explanations at distance smaller or equal than a given threshold δ, solving optimization (EQ2) requires finding a nearest counterfactual. To do so, we apply a binary search strategy on the distance threshold δ ϵ [0, 1] that allows us to find approximately nearest counterfactuals with a pre-specified degree of accuracy. This is implemented in Algorithm 1, which for an accuracy parameter є > 0 makes at most O(log(1/є) calls to SAT and returns a counterfactual xˆє ϵ CFf(xˆ) such that d(xˆϵ, xˆ) ≤ d(xˆ*, xˆ) + є, where xˆ* is some solution of the optimization problem in (EQ2)” (i.e., generating a counterfactual explanation when a computed output is determined to correspond to an optimum output)).
Regarding claim 3, the combination of Karimi, McGrath, Snoek, and Zhu teaches the method of claim 2, wherein when the computed corresponding output value is determined as not corresponding to the predetermined optimum potential value, the method further comprises: selecting a next data point from among the predetermined set of possible input data points to be used as an input to the obtained AI regression model (Karimi fig. 2b and section 3 para. 1 recite “This section defines a logical representation of counterfactual explanations for predictive models, which are functions mapping input feature vectors x ϵ X into decisions y ϵ {0, 1}”. Karimi section 4 para. 1 recites “Based on the counterfactual space CFf(xˆ) defined in the previous section, we would like to produce counterfactual explanations for the output of a model f on a given input xˆ by trying to find a nearest counterfactual, which is defined as: (EQ2). For the time being, we assume that a notion of distance between instances, d, is given”. The footnote for section 3 para. 1 recites “While here we assume binary predictor models, i.e., classifiers, our approach generalizes to regression problems where y ϵ R and more generally any other output domain” (i.e., generating a counterfactual explanation for a given, or selected, input value from a set of input values in an iterative process shown by at least figure 2b of Karimi));
computing a next corresponding output value of the obtained AI regression model based on the selected next data point (Karimi section 4.1 para. 1 recites “Our goal now is to leverage the representation of CFf(xˆ) in terms of a logic formula to solve (EQ2). To this end, we map the optimization problem in (EQ2) into a sequence of satisfiability problems, which can be verified or refuted by standard SMT solvers. We do so by first converting the expression d(x, xˆ) ≤ δ, where δ ϵ [0, 1], into a logic formula φd, xˆ (x, δ), which is valid if and only if d(x, xˆ) ≤ δ. We assume here that the distance d function is expressed by a program in the same language that we used to represent the models in Section 3”. The footnote for section 3 para. 1 recites “While here we assume binary predictor models, i.e., classifiers, our approach generalizes to regression problems where y ϵ R and more generally any other output domain” (i.e., computing an output from a regression model based on the selected input in an iterative process shown by at least figure 2b of Karimi));
determining whether the next computed corresponding output value corresponds to the predetermined optimum potential value (Karimi section 4.1 para. 1 recites “Then, both the counterfactual formula φCFf(x̂)(x) and the distance formula φd,x̂(x, δ) are combined into the logic formula: φx̂,δ(x) = φCFf(x̂)(x) ∧ φd,x̂(x, δ), which is satisfiable if and only if there exists a counterfactual x ∈ CFf(x̂) such that d(x, x̂) ≤ δ. To check whether the above formula is satisfiable we use the satisfiability oracle SAT(ψ(x)), which returns either an instance x such that ψ(x) is valid, or “unsatisfiable” if no such x exists” (i.e., determining whether the computed output corresponds to an optimum value in an iterative process shown by at least figure 2b of Karimi));
when the next computed corresponding output value is determined as corresponding to the predetermined optimum potential value, generating the counterfactual explanation based on the most recently selected next data point (Karimi section 4.1 para. 2 recites “while the oracle SAT allows us to verify if there exist counterfactual explanations at distance smaller or equal than a given threshold δ, solving optimization (EQ2) requires finding a nearest counterfactual. To do so, we apply a binary search strategy on the distance threshold δ ∈ [0, 1] that allows us to find approximately nearest counterfactuals with a pre-specified degree of accuracy. This is implemented in Algorithm 1, which for an accuracy parameter ε > 0 makes at most O(log(1/ε)) calls to SAT and returns a counterfactual x̂ε ∈ CFf(x̂) such that d(x̂ε, x̂) ≤ d(x̂*, x̂) + ε, where x̂* is some solution of the optimization problem in (EQ2)” (i.e., generating a counterfactual explanation when a computed output is determined to correspond to an optimum output in an iterative process shown by at least figure 2b of Karimi));
and when the next computed corresponding output value is determined as not corresponding to the predetermined optimum potential value, repeating the selecting, computing, and determining steps for additional next data points until the computed corresponding value is determined as corresponding to the predetermined optimum potential value (Karimi section 4.1 para. 1 recites “Then, both the counterfactual formula φCFf(x̂)(x) and the distance formula φd,x̂(x, δ) are combined into the logic formula: φx̂,δ(x) = φCFf(x̂)(x) ∧ φd,x̂(x, δ), which is satisfiable if and only if there exists a counterfactual x ∈ CFf(x̂) such that d(x, x̂) ≤ δ. To check whether the above formula is satisfiable we use the satisfiability oracle SAT(ψ(x)), which returns either an instance x such that ψ(x) is valid, or “unsatisfiable” if no such x exists”. Karimi section 4.1 para. 2 recites “while the oracle SAT allows us to verify if there exist counterfactual explanations at distance smaller or equal than a given threshold δ, solving optimization (EQ2) requires finding a nearest counterfactual. To do so, we apply a binary search strategy on the distance threshold δ ∈ [0, 1] that allows us to find approximately nearest counterfactuals with a pre-specified degree of accuracy. This is implemented in Algorithm 1, which for an accuracy parameter ε > 0 makes at most O(log(1/ε)) calls to SAT and returns a counterfactual x̂ε ∈ CFf(x̂) such that d(x̂ε, x̂) ≤ d(x̂*, x̂) + ε, where x̂* is some solution of the optimization problem in (EQ2)” (i.e., repeating the steps of determining whether the computed output corresponds to an optimum value and generating a counterfactual explanation when a computed output is determined to correspond to an optimum output in an iterative process shown by at least figure 2b of Karimi)).
Regarding claim 4, the combination of Karimi, McGrath, Snoek, and Zhu teaches the method of claim 1, further comprising: wherein the defining of the set of candidate counterfactual potential functions comprises performing the differential continuous mapping between the respective output values of the obtained AI regression model and the real line over the predetermined subset of real numbers such that the input value [designated by the user] corresponds to the maximum value of the defined set of candidate counterfactual potential functions (Karimi fig. 1 and section 1 para. 5 recite “in MACE (i.e., Model-Agnostic Counterfactual Explanations) we map the nearest counterfactual problem into a sequence of satisfiability (SAT) problems, by expressing both the predictive model and the distance function (as well as the plausibility and diversity constraints) as logic formulae. Each of these satisfiability problems aims to verify if there exists a counterfactual explanation at a distance smaller than a given threshold, and can be solved using standard SMT (satisfiability modulo theories) solvers”. Section 3 para. 2 recites “given a factual input x̂ with f(x̂) = ŷ and φf we define the counterfactual formula as (EQ1). Intuitively, the formula on the right hand side of (EQ1) says that “x is a counterfactual for x̂ if either f(x̂) = 0 and f(x) = 1, or f(x̂) = 1 and f(x) = 0”. It is thus clear from the definition that an input x satisfies φCFf(x̂) if and only if x ∈ CFf(x̂)”. Section 4.1 recites “We remark here our approach to find nearest counterfactuals is agnostic to the details of the model and distance being used; the only requirement is that they must be expressible in a fairly general programming language. As a consequence, we can handle a wide variety of predictive models, including both differentiable – such as logistic regression and multilayer perceptron – and non-differentiable predictive models – e.g., decision trees and random forest – as well as a wide variety of distance functions”. The footnote for section 3 para. 1 recites “While here we assume binary predictor models, i.e., classifiers, our approach generalizes to regression problems where y ∈ R and more generally any other output domain” (i.e., determining a candidate counterfactual using a differential continuous mapping between output values of the regression model and a real line over a subset of real numbers)),
wherein the set of candidate counterfactual potential functions favor an output of the AI regression model that is a specified distance away from an output for the query instance, the specified distance corresponding to the input value designated by the user (Karimi section 1 para. 4 recites “Moreover, we rely on a binary search strategy on the distance threshold to find an approximation to the nearest (plausible) counterfactual with an arbitrary degree of accuracy, and a lower bound on distance such that no counterfactual provably exists at a smaller distance. Finally, once nearest counterfactuals are found, diversity constraints may be added to the satisfiability problems to find alternative counterfactuals. The overall architecture of MACE is illustrated in Figure 1” (i.e., the counterfactual explanation is expressed based on a distance between the counterfactual and the actual model output. Examiner notes that this citation is provided to show how Karimi teaches computing distances between counterfactuals and actual model outputs, but wherein a set of candidate counterfactual potential functions favor an output that is a specified distance away is interpreted as the intended use or result of generating the candidate counterfactual potential functions and does not provide additional patentable weight to this claim limitation)).
receiving, by the at least one processor from a user, an input value designated by the user for generating the counterfactual explanation (McGrath col. 7 lines 38-51 recite “As further shown in FIG. 1C, and by reference number 145, the automated analysis system may iterate one or more of the processes described in connection with example 100. For example, the automated analysis system, during a training period, may iteratively receive pairs of user information and a prediction output of the qualification model, iteratively select one or more relevant counterfactual explanations (e.g., using the iteratively updated labels and/or retrained generator model, clustering model, or classification model), iteratively provide the selected counterfactual explanations for feedback, iteratively update labels based on iteratively received feedback data, and iteratively retrain the clustering model and/or the classification model according to the feedback data. In some implementations, each iteration may be associated with a different counterfactual explanation associated with a prediction output and/or a different analysis of different user information associated with different users” (i.e., receiving an input value from a user for generating a counterfactual explanation)).
Regarding claim 5, the combination of Karimi, McGrath, Snoek, and Zhu teaches the method of claim 4, wherein the defining of the set of candidate counterfactual potential functions further comprises: determining a plurality of candidate counterfactual potential functions (Karimi fig. 1 and section 1 para. 5 recite “in MACE (i.e., Model-Agnostic Counterfactual Explanations) we map the nearest counterfactual problem into a sequence of satisfiability (SAT) problems, by expressing both the predictive model and the distance function (as well as the plausibility and diversity constraints) as logic formulae. Each of these satisfiability problems aims to verify if there exists a counterfactual explanation at a distance smaller than a given threshold, and can be solved using standard SMT (satisfiability modulo theories) solvers”. Karimi section 3 para. 2 recites “given a factual input x̂ with f(x̂) = ŷ and φf we define the counterfactual formula as (EQ1). Intuitively, the formula on the right hand side of (EQ1) says that “x is a counterfactual for x̂ if either f(x̂) = 0 and f(x) = 1, or f(x̂) = 1 and f(x) = 0”. It is thus clear from the definition that an input x satisfies φCFf(x̂) if and only if x ∈ CFf(x̂)”. Karimi section 4.1 recites “We remark here our approach to find nearest counterfactuals is agnostic to the details of the model and distance being used; the only requirement is that they must be expressible in a fairly general programming language. As a consequence, we can handle a wide variety of predictive models, including both differentiable – such as logistic regression and multilayer perceptron – and non-differentiable predictive models – e.g., decision trees and random forest – as well as a wide variety of distance functions” (i.e., determining candidate counterfactuals)).
optimizing the determined plurality of candidate counterfactual potential functions with respect to the input value designated by the user and generating the counterfactual explanation based on a result of the optimizing (Karimi section 4 para. 1 recites “Based on the counterfactual space CFf(x̂) defined in the previous section, we would like to produce counterfactual explanations for the output of a model f on a given input x̂ by trying to find a nearest counterfactual, which is defined as: (EQ2). For the time being, we assume that a notion of distance between instances, d, is given”. McGrath col. 7 lines 38-51 recite “As further shown in FIG. 1C, and by reference number 145, the automated analysis system may iterate one or more of the processes described in connection with example 100. For example, the automated analysis system, during a training period, may iteratively receive pairs of user information and a prediction output of the qualification model, iteratively select one or more relevant counterfactual explanations (e.g., using the iteratively updated labels and/or retrained generator model, clustering model, or classification model), iteratively provide the selected counterfactual explanations for feedback, iteratively update labels based on iteratively received feedback data, and iteratively retrain the clustering model and/or the classification model according to the feedback data. In some implementations, each iteration may be associated with a different counterfactual explanation associated with a prediction output and/or a different analysis of different user information associated with different users” (i.e., determining and optimizing candidate counterfactual explanations based on an input value received from a user, and generating at least one counterfactual explanation)).
Regarding claim 7, the combination of Karimi, McGrath, Snoek, and Zhu teaches the method of claim 5, wherein the determining of the plurality of candidate counterfactual potential functions comprises defining a plurality of exponential-polynomial functions (Zhu section 4 para. 1-2 recite “we further restrict ourselves to a subset of Bayesian learners whose prior and likelihood are in the exponential family and are conjugate. For this subset of Bayesian learners, finding the optimal teaching set D naturally decomposes into two steps: In the first step one solves a convex optimization problem to find the optimal aggregate sufficient statistics for D. In the second step one “unpacks” the aggregate sufficient statistics into actual teaching examples. We present an approximate algorithm for doing so. We recall that an exponential family distribution takes the form p(x | θ) = h(x) exp(θᵀT(x) − A(θ)), where T(x) ∈ R^D is the D-dimensional sufficient statistics of x, θ ∈ R^D is the natural parameter, A(θ) is the log partition function, and h(x) modifies the base measure. For a set D = {x1, . . . , xn}, the likelihood function under the exponential family takes a similar form p(D | θ) = (∏_{i=1}^{n} h(xi)) exp(θᵀs − nA(θ)), where we define (EQ5) to be the aggregate sufficient statistics over D. The corresponding conjugate prior is the exponential family distribution with natural parameters (λ1, λ2) ∈ R^D × R: p(θ | λ1, λ2) = h0(θ) exp(λ1ᵀθ − λ2A(θ) − A0(λ1, λ2)). The posterior distribution is p(θ | D, λ1, λ2) = h0(θ) exp((λ1 + s)ᵀθ − (λ2 + n)A(θ) − A0(λ1 + s, λ2 + n))” (i.e., a family, or plurality, of exponential-polynomial functions)) of the input based on the input value designated by the user (McGrath col. 7 lines 38-51 recite “As further shown in FIG. 1C, and by reference number 145, the automated analysis system may iterate one or more of the processes described in connection with example 100.
For example, the automated analysis system, during a training period, may iteratively receive pairs of user information and a prediction output of the qualification model, iteratively select one or more relevant counterfactual explanations (e.g., using the iteratively updated labels and/or retrained generator model, clustering model, or classification model), iteratively provide the selected counterfactual explanations for feedback, iteratively update labels based on iteratively received feedback data, and iteratively retrain the clustering model and/or the classification model according to the feedback data. In some implementations, each iteration may be associated with a different counterfactual explanation associated with a prediction output and/or a different analysis of different user information associated with different users” (i.e., determining counterfactual explanations based on an input value received from a user)).
Regarding claim 8, the combination of Karimi, McGrath, Snoek, and Zhu teaches the method of claim 1, wherein the AI regression model includes at least one from among a neural network model, a logistic regression model, and a random forest model (Karimi section 4.1 para. 3 recites “our approach to find nearest counterfactuals is agnostic to the details of the model and distance being used; the only requirement is that they must be expressible in a fairly general programming language. As a consequence, we can handle a wide variety of predictive models, including both differentiable – such as logistic regression and multilayer perceptron – and non-differentiable predictive models – e.g., decision trees and random forest – as well as a wide variety of distance functions” (i.e., the regression model includes at least a logistic regression model)).
Claim 9 is a system claim and its limitations are included in claim 1. The only difference is that claim 9 requires a system. Therefore, claim 9 is rejected for the same reasons as claim 1.
Claim 10 is a system claim and its limitations are included in claim 2. Claim 10 is rejected for the same reasons as claim 2.
Claim 11 is a system claim and its limitations are included in claim 3. Claim 11 is rejected for the same reasons as claim 3.
Claim 12 is a system claim and its limitations are included in claim 4. Claim 12 is rejected for the same reasons as claim 4.
Claim 13 is a system claim and its limitations are included in claim 5. Claim 13 is rejected for the same reasons as claim 5.
Claim 15 is a system claim and its limitations are included in claim 7. Claim 15 is rejected for the same reasons as claim 7.
Claim 16 is a system claim and its limitations are included in claim 8. Claim 16 is rejected for the same reasons as claim 8.
Claim 17 is a non-transitory computer readable storage medium claim and its limitations are included in claim 1. The only difference is that claim 17 requires a non-transitory computer readable storage medium. Therefore, claim 17 is rejected for the same reasons as claim 1.
Claim 18 is a non-transitory computer readable storage medium claim and its limitations are included in claim 2. Claim 18 is rejected for the same reasons as claim 2.
Claim 19 is a non-transitory computer readable storage medium claim and its limitations are included in claim 3. Claim 19 is rejected for the same reasons as claim 3.
Claim 20 is a non-transitory computer readable storage medium claim and its limitations are included in claim 4. Claim 20 is rejected for the same reasons as claim 4.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20220114481 A1 (Yang et al) teaches a two-stage model-agnostic approach for generating counterfactual explanations via counterfactual feature selection and counterfactual feature optimization.
US 20200293834 A1 (Ghosh et al) teaches a method that uses a genetic algorithm to allow the generation of counterfactuals for both linear and nonlinear models.
US 20180314965 A1 (Dodson et al) teaches a method for identifying which of the set of categorical attributes of data instances causes a change in the anomaly scores using a counterfactual analysis.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEAH M FEITL whose telephone number is (571) 272-8350. The examiner can normally be reached on M-F 0900-1700 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo can be reached on (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/L.M.F./ Examiner, Art Unit 2147
/JAMES T TSAI/Primary Examiner, Art Unit 2147