Prosecution Insights
Last updated: April 19, 2026
Application No. 18/337,571

SYSTEMS AND METHODS FOR GENERATING EXPLAINABLE PREDICTIONS

Status: Non-Final OA (§101, §102)
Filed: Jun 20, 2023
Examiner: BOROWSKI, MICHAEL
Art Unit: 3624
Tech Center: 3600 (Transportation & Electronic Commerce)
Assignee: Odaia Intelligence Inc.
OA Round: 3 (Non-Final)
Grant Probability: 0% (At Risk)
OA Rounds: 3-4
To Grant: 3y 0m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 12 resolved; -52.0% vs TC avg). This examiner grants only 0% of cases.
Interview Lift: +0.0% (minimal lift, with vs. without an interview, across resolved cases with an interview)
Avg Prosecution: 3y 0m typical timeline (55 applications currently pending)
Total Applications: 67 across all art units (career history)

Statute-Specific Performance

§101: 57.9% (+17.9% vs TC avg)
§103: 33.8% (-6.2% vs TC avg)
§102: 4.0% (-36.0% vs TC avg)
§112: 4.3% (-35.7% vs TC avg)

Tech Center averages are estimates. Based on career data from 12 resolved cases.

Office Action

Grounds of rejection: §101, §102
DETAILED ACTION

Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

2. The Amendment filed on January 15, 2026 has been entered. The examiner acknowledges the amendments to claims 1 and 13.

Rejections under 35 U.S.C. § 101: Applicant argues that there is no grouping of the judicial exception related to making predictions. Examiner disagrees. The Cambridge Dictionary defines a prediction as "a statement about what you think will happen in the future." The words "you" and "think" both have connotations of a mental process, and going deeper, a reasonable prediction process would likely include: evidence or data to provide context; knowledge and experience from the real world that is foundational to anticipating what could happen next; reasoning and logic, both having roots in observation, analysis and evaluation, judgement based on experience, and opinion; and finally an overall review of what has been assembled, i.e., further evaluation, to decide whether any conclusions reached are reasonable. The Examiner finds that these steps are all rooted in mental processes or in organizing human activity by following rules and instructions. How useful would data or information be in making a prediction if there were no rules governing the collection of either? Can the future be predicted in the absence of a frame of reference based on the past (that frame of reference being formed from observation, evaluation, judgement, and opinion)? The Examiner accordingly disagrees with Applicant's conclusion that no grouping of the abstract idea judicial exception relates to "making predictions."

Applicant argues that the claims do not recite concepts including advertising, marketing, sales, or following rules or instructions. Examiner notes that amended claim 1 does claim to generate an explainable prediction. Since the claim does not appear to limit the scope of predictions, does Applicant argue that predictions developed by the invention should be limited to certain topical areas? The Examiner does not think so, and as such, the interpretation is that marketing, sales, and many other concepts should be included in the prediction realm; those enumerated groupings are therefore reasonably included in the analysis.

Applicant additionally argues that the claims integrate the exception into a practical application by improving the functioning of a computer or improving a technology or technical field. Examiner does not see evidence of improving the functioning of a computer. The discussion of generating, at the processor, an attribution model from a communication channel presents no more information than developing a correlation from a single data source and using that as a prediction basis. It reads as mathematical computations performed on a processor delivering output to a display device. There is no evidence of the invention integrating with any other machine in the technology area. The additional elements appear to be additional components of the prediction development process that receive input from the invention and return output back to the processor of the invention. There is no indication of the invention taking any action or performing a function beyond providing results to a display. In view of the above, the Examiner finds the arguments in favor of a practical application not compelling, and the rejections under 35 U.S.C. § 101 will not be withdrawn.

Rejections under 35 U.S.C. § 102: Applicant argues that the prior art (DeCaprio) fails to suggest a window size. Examiner notes that the prior art discloses "selecting a subset of data elements" (the purpose of a window size); "apply a filtering operation associated with the feature to the collection of data elements" (targeting specific temporal aspects of the data); and "a time stamp associated with a data element," which provides that specific aspect, time. It is not an unreasonably broad interpretation of the prior art to capture the invention's definition of a window size for causal sequences.

Applicant argues differences between the invention and the prior art concerning an incremental lift-based algorithm. Examiner sees both the algorithm and the explainability features of the prior art as generating confidence in the prediction results. The prior art discloses "providing an output comprising the prediction for the entity and the evidence data characterizing data elements from the collection of data elements that explain the prediction for the entity," while the algorithm calculates a ratio of responses, thus characterizing a gain in prescriptions based on receiving or not receiving the channel. This appears to be evidence data that explains the prediction for the entity.

Applicant argues that an attribution model is different from explainability from a machine learning model in the prior art. The Examiner notes in [321] of Applicant's specification that the attribution model 3706 may generate an explainability prediction comprising a prediction rationale based on the prediction objective and an attribution model. Applicant claims the attribution model is generated from a feature of an activity to provide a prediction, as being different from the prior art, which discloses generating respective evidence data for each high-impact feature based on the proper subset of the collection of data elements that are relevant to the high-impact feature [1:35-45]. The Examiner also notes Applicant's disclosure of one or more machine learning models, including a neural network, at [170]. No additional information on the role of the machine learning models is provided or disclosed in the diagrams. In the absence of additional information clarifying substantial differences between the prior art and the invention, the Examiner notes that it is not an unreasonably broad interpretation of the prior art's description of capabilities to envision the invention's attribution model. In view of the lack of additional detail concerning technical features of the claimed invention or the disclosed machine learning models, the Examiner finds the arguments not compelling, and the rejections under 35 U.S.C. § 102 will not be withdrawn.

Claim Rejections – 35 U.S.C. § 101

35 U.S.C. § 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to non-statutory subject matter. Claims 1-20 are directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without providing significantly more.

Step 1

Step 1 of the subject matter eligibility analysis, per MPEP § 2106.03, requires the claims to be a process, machine, manufacture, or composition of matter.
Claims 1-20 are directed to a process (method) and a machine (system), which are statutory categories of invention.

Step 2A

Claims 1-20 are directed to abstract ideas, as explained below. Prong One of the Step 2A analysis requires identifying the specific limitation(s) in the claim under examination that the examiner believes recite an abstract idea, and determining whether the identified limitation(s) fall within at least one of the groupings of abstract ideas: mathematical concepts, mental processes, and certain methods of organizing human activity.

Step 2A-Prong 1

The claims recite the following limitations that are directed to abstract ideas, which can be summarized as being directed to a method (the abstract idea) of making predictions, providing explanations for the predictions, and supporting both with research and data.

Claim 1 discloses a method for providing explainable predictions, comprising:

- receiving a prediction objective from a user (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing, sales; observation, evaluation, judgement, opinion);
- providing at least one data set from at least one data source (following rules or instructions; observation, evaluation, judgement, opinion);
- determining at least one activity from the at least one data set, the at least one activity comprising at least one feature of the corresponding data set (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing, sales; observation, evaluation, judgement, opinion);
- determining a window size for determining causal sequences in the at least one activity as input to an incremental lift-based algorithm, the window size selected based on a score (following rules or instructions, advertising, marketing, sales; observation, evaluation, judgement, opinion);
- generating, using the output of the incremental lift-based algorithm, at least one attribution model, wherein the at least one attribution model is generated from the at least one feature of the at least one activity (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing, sales; observation, evaluation, judgement, opinion), the at least one attribution model operative to provide a prediction and at least one prediction rationale corresponding to the prediction (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing, sales; observation, evaluation, judgement, opinion); and
- generating an explainable prediction comprising the prediction and the at least one prediction rationale corresponding to the prediction, the at least one prediction rationale determined based on the prediction objective received from the user and the at least one attribution model (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing, sales; observation, evaluation, judgement, opinion).
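For orientation on the lift step recited in claim 1, here is a minimal sketch, assuming a toy data layout in which each subject is a (days-from-channel-touch-to-outcome, converted) pair, with None when the subject never received the touch. Neither the application nor DeCaprio discloses this code; the score-by-maximum-lift window selection is an assumption for illustration only.

```python
# Illustrative only: incremental lift over a candidate window, with the
# window size "selected based on a score" (here, the lift itself).

def incremental_lift(subjects, window_days):
    # Exposed = touched within the window; compare conversion rates.
    exposed = [conv for gap, conv in subjects if gap is not None and gap <= window_days]
    unexposed = [conv for gap, conv in subjects if gap is None or gap > window_days]
    if not exposed or not unexposed:
        return float("nan")
    base_rate = sum(unexposed) / len(unexposed)
    if base_rate == 0:
        return float("nan")
    return (sum(exposed) / len(exposed)) / base_rate

subjects = [(3, 1), (10, 0), (5, 1), (None, 0), (None, 1), (2, 1), (30, 0)]
candidate_windows = [7, 14, 30]
best = max(candidate_windows, key=lambda w: incremental_lift(subjects, w))
print(best, round(incremental_lift(subjects, best), 2))  # -> 7 4.0
```

A lift above 1.0 means subjects touched within the window converted more often than untouched subjects, which is the "gain ... based on receiving or not receiving the channel" characterized in the response to arguments above.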
Additional limitations on the method include:

- determining an activity label, a time-series label based on data, and an initiating subject, potentially a healthcare provider (economic principles and practices, calculating costs, following rules or instructions; observation, evaluation, judgement, opinion - claim 2);
- wherein the label includes a static label based on data, comprising a trend, frequency, market driver, or loyalty label; a prediction outcome comprising one of market share, sales volume, and patient count; and a metric of the prediction, a numerical value corresponding to an increase, decrease, or neutral value of the prediction outcome (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing or sales; observation, evaluation, judgement, opinion - claim 3);
- wherein the attribution model comprises: determining time-indexed activity sequences from the prediction outcome; identifying a matching activity sub-sequence from all activity sequences, including a preceding sequence of actions based on the activity label; and formulating an attribution model based on the activity sub-sequence (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing or sales; observation, evaluation, judgement, opinion - claim 4);
- wherein the preceding sequence of actions is a variable length activity window (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing or sales; observation, evaluation, judgement, opinion - claim 5);
- wherein identifying the sub-sequence comprises determining candidate subsequences, each based on the activity label and the sequence of actions, wherein the metric is a lift metric associated with the candidate subsequences and one sub-sequence is selected based on the lift metric (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing or sales; observation, evaluation, judgement, opinion - claim 6);
- generating a binary classification model based on a sub-sequence and lift metric (observation, evaluation, judgement, opinion), wherein the attribution model from the feature of the activity comprises the attribution model based on an output of an SPMF algorithm, the binary classification model, and the trend model, and wherein the attribution model is either a Shapley model or a Markov model (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing or sales; observation, evaluation, judgement, opinion - claim 7);
- determining an initiation model for each subject based on an activity of the subject using a regression model, generating a metric for a future time period based on the initiation model for the subject, and making a prediction explanation based on the attribution model, where the predicted metric is a numerical prediction and explanation (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing or sales; observation, evaluation, judgement, opinion - claim 8);
- determining a segment label based on the predicted metric for the future time period and a regression model (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing or sales; observation, evaluation, judgement, opinion - claim 9);
- wherein the segment label is determined based on an odds ratio model or classifier and comprises a rising star label, a grower label, a shrinker label, or a switcher label (following rules or instructions, advertising, marketing or sales; observation, evaluation, judgement, opinion - claim 10);
- wherein determining the segment label includes determining an embedded vector and identifying at least one matching seed based on the vector, with the seed corresponding to a segment label (following rules or instructions; observation, evaluation, judgement, opinion - claim 11); and
- wherein the predicted segment label is a lookalike label for the initiating subject based on the matching seed (economic principles and practices, calculating costs, following rules or instructions, advertising, marketing or sales; observation, evaluation, judgement, opinion - claim 12).

Each of these claimed limitations employs abstract ideas, such as organizing human activity (including fundamental economic principles and practices, calculating costs, following rules or instructions, and advertising, marketing, or sales), as well as mental processes (including observation, evaluation, judgement, and opinion). Claims 13-20 recite similar abstract ideas to those identified with respect to claims 1-12. Thus, the concepts set forth in claims 1-20 recite abstract ideas.

Step 2A-Prong 2

As per MPEP § 2106.04, while claims 1-20 recite additional limitations that are hardware or software elements, such as a computer, the prediction objective relating to at least one communication channel, a memory, at least one data set corresponding to the at least one communication channel, a processor in communication with the memory, an SPMF algorithm, a SHapley Additive exPlanations (SHAP) algorithm, an explanatory algorithm, and an XGBoost model, these limitations are not sufficient to qualify as a practical application, since these elements are invoked as tools to apply the instructions of the abstract ideas in a specific technological environment. The mere application of an abstract idea in a particular technological environment, and merely limiting the use of an abstract idea to a particular technological field, do not integrate an abstract idea into a practical application (MPEP § 2106.05(f) & (h)). Evaluated individually, the additional elements do not integrate the identified abstract ideas into a practical application. Evaluating the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. The claims do not amount to a "practical application" of the abstract idea because they do not: (1) recite any improvement to another technology or technical field; (2) recite any improvement to the functioning of the computer itself; (3) apply the judicial exception with, or by use of, a particular machine; (4) effect a transformation or reduction of a particular article to a different state or thing; or (5) provide other meaningful limitations beyond generally linking the use of the judicial exception to a particular technological environment. Accordingly, claims 1-20 are directed to abstract ideas.

Step 2B

Claims 1-20 do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and as an ordered combination, do not amount to significantly more than the abstract idea.
The analysis above describes the additional elements recited in claims 1-20 beyond those identified as being directed to an abstract idea, as well as why the identified judicial exception(s) are not integrated into a practical application. These findings are hereby incorporated into the analysis of the additional elements when considered both individually and in combination. For the reasons provided in the analysis in Step 2A, Prong Two, the additional elements, evaluated individually, do not amount to significantly more than a judicial exception. Evaluating the claim limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. In addition to the factors discussed regarding Step 2A, Prong Two, there is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely amount to instructions to implement the identified abstract ideas on a computer. Therefore, since there are no limitations in claims 1-20 that transform the exception into a patent-eligible application such that the claims amount to significantly more than the exception itself, the claims are directed to non-statutory subject matter and are rejected under 35 U.S.C. § 101.

Claim Rejections – 35 U.S.C. § 102

5. The following is a quotation of the appropriate paragraphs of 35 U.S.C. § 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

6. Claims 1-20 are rejected under 35 U.S.C. § 102(a)(1) as being anticipated by DeCaprio (US 11176471 B1, "Explainable Machine Learning Models"), hereafter DeCaprio.

Regarding Claim 1: Claim 1 discloses a computer-implemented method for providing explainable predictions; DeCaprio teaches an explainable prediction system implemented as computer programs on one or more computers [1:17-19]. The method comprises:

- receiving a prediction objective from a user, the prediction objective relating to at least one communication channel ("optimize an objective function," [9:32-33], and "in some cases, the prediction system 100 (described with reference to FIG. 1) can provide an interface for a user to specify respective filtering and transformation operations that should be used to generate each feature that is provided as an input to the machine learning model to generate predictions," [14:14-19]);

- providing, at a memory, at least one data set from at least one data source, the at least one data set corresponding to the at least one communication channel ("the one or more storage devices store instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: obtaining a collection of data elements characterizing an entity," [3:19-23]);

- determining, at a processor in communication with the memory, at least one activity from the at least one data set, the at least one activity comprising at least one feature of the corresponding data set ("generating a plurality of features that collectively define a feature representation of the entity from the collection of data elements characterizing the entity; processing the feature representation of the entity using a machine learning model to generate a prediction for the entity; generating evidence data characterizing data elements from the collection of data elements that explain the prediction generated by the machine learning model for the entity; and providing an output comprising the prediction for the entity and the evidence data characterizing data elements from the collection of data elements that explain the prediction for the entity," [3:23-34]);

- determining, at the processor, a window size for determining causal sequences in the at least one activity as input to an incremental lift-based algorithm, the window size selected based on a score ("to select a proper subset of the data elements that are relevant to a feature in the feature representation, the feature generation engine can apply a filtering operation associated with the feature to the collection of data elements. The filtering operation can define one or more selection criteria that, if satisfied by a data element, indicate that the data element is relevant to the feature. The selection criteria of a filtering operation can be based on, e.g., a value of a data element, a type of a data element, a time stamp associated with a data element, or a combination thereof," [7:15-24], and "the system can generate evidence data for a high-impact feature by filtering the collection of data elements to select data elements that are relevant to the high-impact feature, e.g., that were processed by the system to generate the high-impact feature," [4:20-24]);
- generating, at the processor, using the output of the incremental lift-based algorithm, at least one attribution model corresponding to the at least one communication channel, wherein the at least one attribution model is generated from the at least one feature of the at least one activity ("a method performed by one or more data processing apparatus," [1:20-21], and "generating evidence data characterizing data elements from the collection of data elements that explain the prediction generated by the machine learning model for the entity," [1:28-31]);

- the at least one attribution model operative to provide a prediction corresponding to the at least one communication channel and at least one prediction rationale corresponding to the prediction ("a method performed by one or more data processing apparatus," [1:20-21], and "evidence data characterizing data elements from the collection of data elements that explain the prediction for the entity," [1:32-34]); and

- generating, at the processor, an explainable prediction comprising the prediction and the at least one prediction rationale corresponding to the prediction ("a method performed by one or more data processing apparatus," [1:20-21], and "evidence data characterizing data elements from the collection of data elements that explain the prediction for the entity," [1:32-34]), the at least one prediction rationale determined based on the prediction objective received from the user and the at least one attribution model ("In some cases, the prediction system 100 (described with reference to FIG. 1) can provide an interface for a user to specify respective filtering and transformation operations that should be used to generate each feature that is provided as an input to the machine learning model to generate predictions. For example, the prediction system 100 can enable the user to select the respective filtering and transformation operations to generate each feature from a predefined set of possible filtering and transformation operations. The explainability system 200 can then, without further user input and based on predefined rules, determine the filtering and summarization operations to generate the evidence data for each feature. (57) For example, a user can specify that a 'number of emergency room visits' feature is generated using a filtering operation to identify the dates of each emergency room visit in the last 12 months and a counting operation to count the number of emergency room visits. In this example, the explainability system 200 can automatically determine the filtering and summarization operations used to generate the evidence data 118 for the 'number of emergency room visits' feature. For example, the explainability system 200 can determine that the evidence data 118 for the feature should be generated by a filtering operation to identify the date and reason for each emergency room visit in the last 12 months, and an aggregation operation to summarize the dates and reasons for the emergency room visits. The aggregation operation can, e.g., identify each unique reason for an emergency room visit, and then determine the corresponding date range and number of occurrences for each unique emergency room visit. (58) The explainability system 200 can provide the high-impact features 114, the explainability scores 116, and the evidence data 118, as part of the explainability data 112, e.g., which is presented to a user of the explainability system 200," [14:14-48]).
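The filter-then-summarize pattern quoted above ("number of emergency room visits": a time-range filter followed by a count, with the filtered subset kept as evidence) can be sketched compactly. This is a hypothetical illustration, not code from either document; the field names are invented.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DataElement:
    kind: str    # type of the data element
    value: str   # value of the data element
    stamp: date  # time stamp of the data element

def feature_er_visits(elements, start, end):
    # Filtering operation: selection criteria on type and time stamp.
    relevant = [e for e in elements
                if e.kind == "er_visit" and start <= e.stamp <= end]
    # Count transformation gives the feature value; the filtered subset
    # doubles as the evidence data that explains the feature.
    return len(relevant), relevant

elements = [
    DataElement("er_visit", "chest pain", date(2025, 3, 2)),
    DataElement("rx_fill", "drug A", date(2025, 4, 9)),
    DataElement("er_visit", "fracture", date(2025, 8, 21)),
]
count, evidence = feature_er_visits(elements, date(2025, 1, 1), date(2025, 12, 31))
print(count, [(e.value, e.stamp.isoformat()) for e in evidence])
```

Returning the evidence subset alongside the feature value mirrors DeCaprio's "output comprising the prediction ... and the evidence data ... that explain the prediction."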
Regarding claim 2: The method of claim 1, wherein the determining the at least one activity further comprises:

- determining at least one activity label corresponding to the at least one activity, the at least one activity label comprising a time-series activity label based on time series data in the at least one data set; DeCaprio teaches ("the system generates a 'consistency label' for each feature in each feature representation based on the set of explainability scores for the feature (508)," [16:59-61]; "the selection criteria of a filtering operation can be based on, e.g., a value of a data element, a type of a data element, a time stamp associated with a data element, or a combination thereof," [7:21-24]; and "in some cases, the specified time range for a filtering operation can be determined based on a current time point. For example, for a feature that counts the number of emergency room visits by a patient in the last 12 months, the feature generation engine 104 can determine the starting time point of the filtering operation to be 12 months before the current time point. In this example, the feature generation engine can determine the ending time point of the filtering operation to be the current time point. In some cases, the specified time range for a filtering operation can be dynamically determined based on the collection of data elements 102. For example, for a feature that counts the number of emergency room visits within 30 days of a previous hospital discharge, the feature generation engine 104 can determine the starting time point of the filtering operation to be the date of a previous hospital discharge. The feature generation engine can dynamically determine the date of the previous hospital discharge from the collection of data elements. In this example, the feature generation engine can determine the ending time point of the filtering operation to be 30 days after the date of the previous hospital discharge," [7:57-8:11]); and

- associating the at least one activity label with an initiating subject, wherein the initiating subject is optionally a healthcare provider; DeCaprio teaches that the feature describes the label and that the subject is taken from an attribute of the data ("the system receives a set of feature representations, where each feature representation represents a respective entity (502). Each entity can be, e.g., a patient, and the feature representation of each entity can be generated from a collection of data elements characterizing the entity, as described with reference to FIG. 1," [16:35-40], and "to select a proper subset of the data elements 102 that are relevant to a feature in the feature representation 106, the feature generation engine 104 can apply a filtering operation associated with the feature to the collection of data elements 102. The filtering operation can define one or more selection criteria that, if satisfied by a data element, indicate that the data element is relevant to the feature. The selection criteria of a filtering operation can be based on, e.g., a value of a data element, a type of a data element, a time stamp associated with a data element, or a combination thereof," [7:15-24]).
Regarding claim 3: The method of claim 2, wherein the at least one activity label comprises:

- a static activity label based on the at least one data set, the static activity label comprising one of a trend label, a frequency label, a market driver label, and a loyalty label; DeCaprio teaches ("the feature generation engine 104 can then apply a model fitting operation to the selected data elements 102 to generate a feature defining a slope of a linear model, e.g., that defines a trend in the cholesterol levels of the patient," [9:1-5]);

- a prediction outcome determined from the prediction objective, the prediction outcome comprising one of market share, sales volume, and patient count ("the system obtains a collection of data elements characterizing an entity (402). The data elements can be represented as any appropriate sort of digital data, e.g., numerical data, alpha-numerical data, textual data, or a combination thereof. The data elements can characterize any appropriate type of entity, e.g., a patient in a healthcare environment," [15:34-39], and "the feature generation engine 104 can generate a feature by applying a 'count' operation to a selected subset of the data elements 102 that are relevant to the feature," [8:35-38]); and

- a metric of the prediction outcome, the metric comprising a numerical value corresponding to an increase value ("a large positive explainability score for a feature can indicate that the feature contributed significantly to increasing the value of the prediction," [11:66-12:2]), a decrease value ("a large negative explainability score for a feature can indicate that the feature contributed significantly to decreasing the value of the prediction," [12:2-7]), or a neutral value of the prediction outcome.

Regarding claim 4: The method of claim 3, wherein the generating the at least one attribution model from the at least one feature of the at least one activity comprises:

- determining a plurality of time-indexed activity sequences associated with the prediction outcome; DeCaprio teaches ("the prediction system 100 can process the data elements 102 to generate any of a variety of predictions 110 for the entity characterized by the data elements. The prediction can be, e.g., a classification prediction, a regression prediction, or any other appropriate type of prediction. A classification prediction can include a respective score for each of one or more classes, where the score for a class defines a likelihood that the entity is included in the class. A regression prediction can include one or more numerical values from a continuous range of possible regression values (e.g., the interval [0,100]) that predict a quantity associated with the entity," [6:16-27], and "the selection criteria of the filtering operation are based on one or more of: values of data elements, types of data elements, and time stamps associated with data elements," [2:47-50]);

- identifying at least one matching activity sub-sequence in the plurality of time-indexed activity sequences, the at least one matching activity sub-sequence including a preceding sequence of actions based on a candidate activity label; DeCaprio teaches ("to select a proper subset of the data elements 102 that are relevant to a feature in the feature representation 106, the feature generation engine 104 can apply a filtering operation associated with the feature to the collection of data elements 102. The filtering operation can define one or more selection criteria that, if satisfied by a data element, indicate that the data element is relevant to the feature. The selection criteria of a filtering operation can be based on, e.g., a value of a data element, a type of a data element, a time stamp associated with a data element, or a combination thereof," [7:15-24]); and

- generating an attribution model based on the at least one matching activity sub-sequence associated with the prediction outcome ("the feature generation engine 104 can generate a feature by applying a model fitting operation to a subset of the data elements 102 that are relevant to the feature. More specifically, the feature generation engine 104 can fit the parameters of a specified model, e.g., a linear model, or a quadratic model, or any other appropriate model, to the subset of data elements 102 that are relevant to the feature. The values of one or more parameters of the fitted model can then define corresponding feature(s) in the feature representation 106," [8:50-59], and "the machine learning model 108 is configured to process the feature representation 106 of an entity, in accordance with a set of parameter values of the machine learning model 108, to generate a prediction 110 for the entity," [9:24-27]).

Regarding claim 5: The method of claim 4, wherein the preceding sequence of actions is a variable length activity window; DeCaprio teaches ("the time stamp associated with a data element can represent, e.g., a time (e.g., a year, month, day, or hour) characterizing the data element. For example, the time stamp associated with a data element can characterize when an event associated with the data element occurred, e.g., if the data element pertains to an emergency room visit by a patient, then the time stamp can characterize when the emergency room visit occurred. A selection criterion of a filtering operation can define that a data element is relevant to a feature only if, e.g., the time stamp associated with the data element is within a specified time range. A specified time range can be defined, e.g., by a starting time point and an ending time point," [7:44-56]).

Regarding claim 6: The method of claim 4, wherein the identifying the at least one matching activity sub-sequence comprises:

- determining a plurality of candidate subsequences in a plurality of time-indexed activity sequences, each of the plurality of candidate subsequences based on the candidate activity label and the preceding sequence of actions; DeCaprio teaches ("identifying one or more of the features in the feature representation as being high-impact features; identifying, for each high-impact feature, a respective proper subset of the collection of data elements as being relevant to the high-impact feature; and generating respective evidence data for each high-impact feature based on the proper subset of the collection of data elements that are relevant to the high-impact feature," [1:38-45]);

- generating a trend model based on the at least one matching activity subsequence ("the feature generation engine 104 can then apply a model fitting operation to the selected data elements 102 to generate a feature defining a slope of a linear model, e.g., that defines a trend in the cholesterol levels of the patient," [9:1-5]);

- wherein the determined metric is a lift metric associated with each of the plurality of candidate subsequences ("the explainability scoring engine 202 is configured to generate a respective explainability score 116 for each feature in the feature representation 106. Generally, the explainability score 116 for a feature is a numerical value that measures an impact of the feature on the prediction generated by the machine learning model by processing the feature representation 106. For example, a large positive explainability score for a feature can indicate that the feature contributed significantly to increasing the value of the prediction. Conversely, a large negative explainability score for a feature can indicate that the feature contributed significantly to decreasing the value of the prediction. An explainability score for a feature that is near zero can indicate that the feature did not contribute significantly to the prediction," [11:60-12:7]); and

- wherein the at least one matching activity sub-sequence is selected based on the lift metric associated with each of the plurality of candidate subsequences ("the normalization engine 204 can normalize the respective explainability score 116 for each high-impact feature 114, e.g., by applying a scaling factor to the explainability score for the feature. The normalization engine 204 can apply a scaling factor to an explainability score, e.g., by setting the explainability score equal to a product of the explainability score and the scaling factor. Applying an appropriate scaling factor to the explainability scores can reduce or remove any dependence of the normalized explainability scores on the units or range of possible values of the prediction generated by the machine learning model. Normalizing the explainability scores can thus standardize their values and facilitate comparison of explainability scores across different machine learning models generating different predictions," [12:37-51]).

Regarding claim 7: The method of claim 6, further comprising:

- generating a binary classification model based on the at least one matching activity sub-sequence and the associated lift metric; DeCaprio teaches ("generating each feature of the feature representation comprises: identifying a respective proper subset of the collection of data elements as being relevant to the feature," [2:33-36], and "the predefined threshold is determined to be a value that, when provided as an input to a classifier as an absolute value of an explainability score for a feature, results in the classifier generating an output that defines a specified predicted likelihood that each explainability score in a set of explainability scores for the feature have a same sign," [1:59-65]);
- wherein the generating the at least one attribution model from the at least one feature of the at least one activity comprises generating the at least one attribution model based on an output of an SPMF algorithm, the binary classification model, and the trend model ("the explainability score 116 for a feature is a numerical value that measures an impact of the feature on the prediction generated by the machine learning model by processing the feature representation 106. For example, a large positive explainability score for a feature can indicate that the feature contributed significantly to increasing the value of the prediction. Conversely, a large negative explainability score for a feature can indicate that the feature contributed significantly to decreasing the value of the prediction. An explainability score for a feature that is near zero can indicate that the feature did not contribute significantly to the prediction," [11:62-12:7]); and

- wherein the attribution model is one of a Shapley model and a Markov model ("the explainability scoring engine 202 can generate the explainability scores 116 for a feature representation 106 using any appropriate technique. For example, the explainability scores 116 can be 'SHAP' values, i.e., 'Shapley additive explanations,' or 'local interpretable model-agnostic explanations,' i.e., 'LIME' values, that can essentially determine the explainability score for features by assessing how the prediction generated by the machine learning model changes as the features are permuted," [12:8-16]).
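The SHAP/LIME passage just quoted describes permutation-style scoring: measure how the prediction moves as features are permuted. A library-free toy sketch of that idea follows; the stand-in model, data, and averaging scheme are assumptions for illustration and are not taken from either document (real SHAP values are computed quite differently).

```python
import random

def permutation_scores(model, rows, target_row, trials=200, seed=0):
    # For each feature, replace its value with values drawn from other rows
    # and record how far the prediction moves on average.
    rng = random.Random(seed)
    base = model(target_row)
    scores = {}
    for feature in target_row:
        deltas = []
        for _ in range(trials):
            perturbed = dict(target_row)
            perturbed[feature] = rng.choice(rows)[feature]  # swap in another row's value
            deltas.append(base - model(perturbed))
        scores[feature] = sum(deltas) / trials  # > 0: feature pushed the prediction up
    return scores

rows = [{"visits": v, "age": a} for v, a in [(0, 40), (3, 62), (7, 71), (1, 55)]]
model = lambda r: 0.1 * r["visits"] + 0.01 * r["age"]  # stand-in model
print(permutation_scores(model, rows, rows[2]))
```

A large positive score plays the role of the quoted "large positive explainability score," indicating a feature that contributed to increasing the prediction.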
Regarding claim 8: The method of claim 7, further comprising:

- determining an initiation model for each of a plurality of initiating subjects, each initiation model based on the at least one activity of the corresponding initiating subject and comprising a regression model; DeCaprio teaches ("the prediction system 100 can process the data elements 102 to generate any of a variety of predictions 110 for the entity characterized by the data elements," [6:16-18]; "the system receives a set of feature representations, where each feature representation represents a respective entity (502). Each entity can be, e.g., a patient, and the feature representation of each entity can be generated from a collection of data elements characterizing the entity, as described with reference to FIG. 1," [16:35-40]; "to select a proper subset of the data elements 102 that are relevant to a feature in the feature representation 106," [7:15-16]; and "the prediction can be, e.g., a classification prediction, a regression prediction, or any other appropriate type of prediction," [6:18-20]);

- generating a predicted metric for a future time period based on the initiation model for the corresponding initiating subject ("the prediction system 100 can process the data elements 102 to generate any of a variety of predictions 110 for the entity characterized by the data elements. The prediction can be, e.g., a classification prediction, a regression prediction, or any other appropriate type of prediction. A classification prediction can include a respective score for each of one or more classes, where the score for a class defines a likelihood that the entity is included in the class. A regression prediction can include one or more numerical values from a continuous range of possible regression values (e.g., the interval [0,100]) that predict a quantity associated with the entity," [6:16-27]);

- using an explanatory algorithm to generate a prediction explanation based on the at least one attribution model ("the explainability scoring engine 202 is configured to generate a respective explainability score 116 for each feature in the feature representation 106. Generally, the explainability score 116 measures an impact of the feature on the prediction generated by the machine learning model by processing the feature representation 106," [11:60-66]);

- wherein the predicted metric comprises a numerical prediction and the prediction explanation ("generally, the explainability score 116 for a feature is a numerical value that measures an impact of the feature on the prediction," [11:62-64]); and

- wherein the explanatory algorithm comprises at least one of a Local Interpretable Model-Agnostic Explanation algorithm or a SHapley Additive exPlanations (SHAP) algorithm ("the explainability scores 116 can be 'SHAP' values, i.e., 'Shapley additive explanations,' or 'local interpretable model-agnostic explanations,' i.e., 'LIME' values, that can essentially determine the explainability score for features by assessing how the prediction generated by the machine learning model changes as the features are permuted," [12:12-16]).

Regarding claim 9: The method of claim 8, further comprising:

- determining a segment label for each corresponding initiating subject based on the predicted metric for the future time period; DeCaprio teaches ("the system generates a 'consistency label' for each feature in each feature representation based on the set of explainability scores for the feature," [16:59-61]); and

- wherein the regression model is one of an ARIMA model or an XGBoost model ("the machine learning model can include, e.g., a linear model, a gradient boosted decision tree model, a neural network model, or a combination thereof. The prediction can be, e.g., a classification prediction, a regression prediction, or any other appropriate type of prediction," [15:51-56]).
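The "consistency label" just cited, together with the logistic classifier quoted in the claim 10 mapping below (of the form p = exp(b + m*x) / (1 + exp(b + m*x))), can be sketched in a few lines. The coefficients and scores here are invented for illustration; only the label rule and the logistic form come from the quoted passages.

```python
import math

def consistency_label(scores):
    # Per the quoted passage: label 1 if every explainability score for a
    # feature has the same sign, otherwise 0.
    return 1 if all(s > 0 for s in scores) or all(s < 0 for s in scores) else 0

def logistic(x, b=-2.0, m=1.5):
    # Quoted form: p = exp(b + m*x) / (1 + exp(b + m*x)).
    # b and m are made-up coefficients for this sketch.
    z = b + m * x
    return math.exp(z) / (1.0 + math.exp(z))

feature_scores = {"er_visits": [0.8, 0.6, 0.9], "age": [0.2, -0.1, 0.3]}
for name, scores in feature_scores.items():
    x = sum(abs(s) for s in scores) / len(scores)  # absolute-score input
    print(name, consistency_label(scores), round(logistic(x), 3))
```

The classifier's input is the absolute value of an explainability score, matching the quoted training setup in which the absolute score predicts the consistency label.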
Regarding claim 10: The method of claim 9, wherein:

- the segment label is determined based on an odds ratio model or a classifier; DeCaprio teaches ("the system trains a classifier that is configured to process an absolute value of an explainability score for a feature to predict the consistency label of the feature (510). The classifier can be, e.g., a logistic regression classifier of the form: p = exp(b + m*x) / (1 + exp(b + m*x))," [17:1-10]); and

- the segment label comprises a rising star label, a grower label, a shrinker label, or a switcher label ("the system generates a 'consistency label' for each feature in each feature representation based on the set of explainability scores for the feature (508). For example, the system can generate a consistency label of '1' (or some other predefined value) for a feature if each explainability score for the feature has the same sign, i.e., if all the explainability scores for the feature are either positive or negative. Otherwise, the system can generate a consistency label of '0' (or some other predefined value) for the feature," [16:59-67]; "the explainability scoring engine 202 is configured to generate a respective explainability score 116 for each feature in the feature representation 106. Generally, the explainability score 116 for a feature is a numerical value that measures an impact of the feature on the prediction generated by the machine learning model by processing the feature representation 106. For example, a large positive explainability score for a feature can indicate that the feature contributed significantly to increasing the value of the prediction. Conversely, a large negative explainability score for a feature can indicate that the feature contributed significantly to decreasing the value of the prediction. An explainability score for a feature that is near zero can indicate that the feature did not contribute significantly to the prediction," [11:60-12:08]; and "the explainability scoring engine 202 can generate the explainability scores 116 for a feature representation 106 using any appropriate technique. For example, the explainability scores 116 can be 'SHAP' values, i.e., 'Shapley additive explanations,' or 'local interpretable model-agnostic explanations,' i.e., 'LIME' values, that can essentially determine the explainability score for features by assessing how the prediction generated by the machine learning model changes as the features are permuted," [12:09-16]).

Regarding claim 11: The method of claim 9, wherein the determining the segment label comprises:

- determining an embedding vector based on data from the at least one data source associated with the initiating subject; DeCaprio teaches ("the feature representation 106 of the entity can be represented as an ordered collection of features, e.g., a vector or matrix of features, where each feature can be represented, e.g., by one or more numerical values," [7:3-14]); and

- generating at least one matching seed, the at least one matching seed based on the embedding vector, the at least one matching seed corresponding to a predicted segment label ("the feature generation engine 104 is configured to process the data elements 102 characterizing the entity to generate a feature representation 106 of the entity. The feature representation 106 of the entity can be represented as an ordered collection of features, e.g., a vector or matrix of features, where each feature can be represented, e.g., by one or more numerical values. The feature representation 106 can include, e.g., 100 features, 1000 features, 5000 features, or any other appropriate number of features. To generate a feature in the feature representation 106, the feature generation engine 104 can select a proper subset of the data elements 102 that are relevant to the feature, and then apply one or more transformation operations to the selected subset of the data elements 102," [7:3-7]).

Regarding claim 12: The method of claim 11, wherein the predicted segment label is a lookalike segment label for the initiating subject based on the at least one matching seed; DeCaprio teaches ("the feature generation engine 104 is configured to process the data elements 102 characterizing the entity to generate a feature representation 106 of the entity. The feature representation 106 of the entity can be represented as an ordered collection of features, e.g., a vector or matrix of features, where each feature can be represented, e.g., by one or more numerical values. The feature representation 106 can include, e.g., 100 features, 1000 features, 5000 features, or any other appropriate number of features. To generate a feature in the feature representation 106, the feature generation engine 104 can select a proper subset of the data elements 102 that are relevant to the feature, and then apply one or more transformation operations to the selected subset of the data elements 102," [7:1-14]).
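The claims 11-12 idea, as characterized in the mappings above, amounts to embedding a subject as a vector, finding the closest "seed" vector, and inheriting that seed's segment label as a lookalike label. A hypothetical sketch, with invented vectors, labels, and the choice of cosine similarity as the matching criterion (neither document specifies one):

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Seed vectors, each tagged with a segment label from the claim language.
seeds = {
    "grower": [0.9, 0.1, 0.3],
    "shrinker": [0.1, 0.8, 0.2],
    "switcher": [0.4, 0.4, 0.7],
}
subject_vec = [0.8, 0.2, 0.35]
lookalike = max(seeds, key=lambda label: cosine(subject_vec, seeds[label]))
print(lookalike)  # the closest seed's segment label becomes the lookalike label
```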
Claims 13-20 are rejected for reasons corresponding to those provided for claims 1-12. In these claims, the addition of a memory storing an attribution model, a network device, and a processor in communication with the memory and the network device does not change the rationale for the rejections under 35 U.S.C. § 102 or the referenced prior art (DeCaprio teaches "a system comprising: one or more computers; and one or more storage devices communicatively coupled to the one or more computers," [3:16-21]).

Conclusion

The prior art made of record and not relied upon, which is considered pertinent to applicant's disclosure or directed to the state of the art, is listed on the enclosed PTO-892.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL BOROWSKI, whose telephone number is (703) 756-1822. The examiner can normally be reached M-F 8-4:30. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Jerry O'Connor, can be reached at (571) 272-6787. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at (866) 217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call (800) 786-9199 (IN USA OR CANADA) or (571) 272-1000.

/MB/
Patent Examiner, Art Unit 3624

/MEHMET YESILDAG/
Primary Examiner, Art Unit 3624

Prosecution Timeline

Jun 20, 2023: Application Filed
Mar 13, 2025: Non-Final Rejection (§101, §102)
Aug 15, 2025: Response Filed
Oct 16, 2025: Final Rejection (§101, §102)
Dec 15, 2025: Response after Non-Final Action
Jan 15, 2026: Request for Continued Examination
Feb 17, 2026: Response after Non-Final Action
Feb 23, 2026: Non-Final Rejection (§101, §102) (current)


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 0%
With Interview: 0% (+0.0%)
Median Time to Grant: 3y 0m
PTA Risk: High

Based on 12 resolved cases by this examiner. Grant probability derived from career allow rate.
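A minimal sketch of the arithmetic behind these projections, assuming the stated method (grant probability taken directly from the career allow rate, with the observed interview lift added). All figures come from the cards above; nothing else is implied about the underlying model.

```python
# Examiner stats from this page: 0 granted out of 12 resolved cases.
granted, resolved = 0, 12
career_allow_rate = granted / resolved        # 0.0 -> "0% Grant Probability"
interview_lift = 0.0                          # "+0.0% Interview Lift"
with_interview = career_allow_rate + interview_lift
print(f"{career_allow_rate:.0%} base, {with_interview:.0%} with interview")
```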
