DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the claims filed 7/27/2023.
Claims 1-20 are presented for examination.
Information Disclosure Statement
The information disclosure statement (IDS) submitted 7/27/2023 has been considered by the examiner.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 2-3 and 7-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 2, 9, and 16 (Subjective Terminology / Term of Degree): The claims recite "defining respective levels of explainability." The phrase "levels of explainability" is a subjective term of degree with no recognized mathematical or structural definition in the art. The claims fail to define what constitutes a "level" objectively (e.g., is it a percentage, a ratio, a boolean flag, or a continuous vector weight?). The lack of an objective metric renders the scope of the claim unclear. (MPEP § 2173.05(b)).
Claims 3, 10, and 17 (Subjective Terminology / Terms of Degree): The claims recite that defining levels of explainability "incentivizes the machine learning model to utilize at least a first feature... to a greater extent than a second feature." The term "incentivizes" is subjective. A machine learning algorithm consists of mathematical weights, penalties, and loss functions; it does not experience "incentives" or "motivations." Because the claim relies on a subjective human concept rather than defining the specific computational constraint (e.g., a mathematical weight or penalty) applied to the features, a Person of Ordinary Skill in the Art (POSITA) cannot ascertain the objective boundaries of an "incentivized" machine learning model. Furthermore, "to a greater extent" is a relative term of degree. The claim fails to provide an objective standard (e.g., a discrete numerical weight threshold or a gradient difference metric) for determining what constitutes a "greater extent" of utilization, rendering the scope unascertainable. (MPEP § 2173.05(b)).
Claims 7, 14, and 20 (Unbounded Terms of Degree): The claims recite training that "balances a performance of the machine learning model and explainability... in favor of the performance or the explainability." The phrases "balances" and "in favor of" are subjective terms of degree. When a term of degree is used in a claim, the specification must provide some standard for measuring that degree. Performance (e.g., F1 score accuracy) and explainability (e.g., SHAP value similarities) are measured in entirely different mathematical units. The claims neither define a baseline from which "balance" is measured nor an objective mathematical threshold (e.g., reciting a specific weight, or stating that the scalar $\lambda > 1$ as detailed in the specification) to establish when the algorithm mathematically crosses over into being "in favor of" one metric over the other. (MPEP § 2173.05(b)).
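For illustration only, the following formulation (the Examiner's own sketch, not a construction of the claims or of the specification) shows the kind of objective standard whose absence renders these claims indefinite: a single recited scalar whose value determines which term the training is "in favor of":
$$\theta^{*} = \text{argmin}_\theta \, \mathcal{L}_{\text{performance}}(\theta) + \lambda \, \Omega_{\text{explainability}}(\theta), \qquad \lambda > 1 \text{ favoring explainability}, \quad 0 < \lambda < 1 \text{ favoring performance}$$
Reciting such a threshold, or any comparable numerical standard, would supply the objective metes and bounds currently missing.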
Claim 8 (Grammatical Omission / Lack of Antecedent Basis): Claim 8 recites "training, by a system operatively coupled to processor..." The term "processor" lacks the indefinite article "a" upon its introduction. This grammatical omission leaves it unclear whether the claim is referring to a generic processor or a specific processor not previously introduced in the claim. Applicant should amend the phrase to read "coupled to a processor." (MPEP § 2173.05(q) & (e)).
Dependent claims 9-14 do not cure the deficiencies of base claim 8 and thus claims 9-14 are also rejected under 35 U.S.C. 112(b) for at least being dependent on the rejected base claim 8.
Claim 15 (Preamble vs. Body Disconnect / Lack of Antecedent Basis): The preamble of Claim 15 recites "A computer program product for incorporating an explainability constraint into a decision tree ensemble..." However, the body of the claim fails to provide antecedent basis for or further reference the "decision tree ensemble," instead directing the processor to broadly "train, by the processor, a machine learning model." It is unclear whether the "machine learning model" trained in the body of the claim is strictly limited to the "decision tree ensemble" recited in the preamble, or if the preamble is merely a non-limiting statement of intended use while the body covers any generic machine learning model (such as a deep neural network). This structural disconnect leaves the metes and bounds of the claim ambiguous. (MPEP § 2173.05(e)).
Dependent claims 16-20 do not cure the deficiencies of base claim 15 and thus claims 16-20 are also rejected under 35 U.S.C. 112(b) for at least being dependent on the rejected base claim 15.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims will follow the 2019 Revised Patent Subject Matter Eligibility Guidance, 84 Fed. Reg. 50 (“2019 PEG”).
Step 1: Statutory Category
The claims are directed to a System (Claims 1-7), a Method (Claims 8-14), and a Computer Program Product (Claims 15-20). (Examiner's Note: While Claim 15 omits the term "non-transitory," the Examiner acknowledges that the specification expressly acts as its own lexicographer in paragraphs [0121] and [0137], disclaiming transitory signals per se. Therefore, Claim 15 is construed as a statutory non-transitory computer-readable storage medium). All claims fall within statutory categories.
Step 2A, Prong 1: Judicial Exception
The claims are directed to an abstract idea. Specifically, Independent Claims 1, 8, and 15 recite "training a machine learning model using an objective function that is modified to incorporate an explainability metric... in addition to a model performance metric."
These limitations recite Mathematical Concepts and Mental Processes (MPEP § 2106.04(a)). Modifying a mathematical "objective function" to jointly compute performance metrics and explainability metrics inherently involves calculating mathematical formulas, algorithms, and numerical relationships. Furthermore, evaluating "explainability" and balancing "feature preferences" to make a model understandable corresponds to human evaluation and logical reasoning, which is a mental process. Translating human cognitive preferences into numerical variables is an abstract concept. Therefore, the claims recite an abstract idea.
Step 2A, Prong 2: Practical Application
The claims do not include additional elements that integrate the abstract idea into a practical application.
The Examiner has evaluated the claims in light of the Ex Parte Desjardins guidance (revising MPEP §§ 2106.04(d) and 2106.05(a)). In Desjardins, claims directed to training an ML model were found to improve the functioning of the computer because they explicitly recited how the parameters were adjusted to overcome a specific technical problem (catastrophic forgetting). The updated MPEP guidance emphasizes that an eligible claim "must include the components or steps of the invention that provide the improvement described in the specification" and explicitly warns against evaluating claims at an impermissibly "high level of generality."
Here, Independent Claims 1, 8, and 15 are drafted at an impermissibly high level of generality. They merely claim the result or idea of training a model using a modified objective function that incorporates an explainability metric. They fail to recite the specific computational steps, algorithms, mathematical equations, or structural components that actually execute this modification (i.e., how the objective function is mathematically formulated or executed to balance these metrics natively). Merely reciting the generic goal of "incorporating" an abstract metric is an instruction to apply an abstract idea (MPEP § 2106.05(f)) and does not reflect a technological improvement within the claims themselves.
The dependent claims similarly fail to integrate the abstract idea into a practical application:
Claims 2-4, 7, 9-11, 14, 16-18, and 20 merely add that "feature preferences" are defined (globally or locally) to "incentivize" or "balance" the model's reliance on specific features. These are simply further statements of the abstract idea, i.e., of the desired mental process or mathematical goal. They do not recite the specific mathematical regularization terms, penalty matrices, or algorithms that actually provide the improvement.
Claims 5-6, 12-13, and 19 add "quantifying explainability... based on... training data and validation data" and "iteratively updates a feature preference vector." While slightly more detailed, these claims still merely recite standard, generic mathematical data-processing steps (comparing arrays and updating vectors) that are themselves further abstract mathematical concepts. They remain at a high level of generality and fail to recite the specific algorithms described in the specification (e.g., the exponential moving average formulation comparing SHAP values between training and validation sets) that actually achieve the technological improvement.
Because the claims as a whole are drafted at a high level of generality and fail to recite the specific technical components or steps that provide the disclosed improvement to machine learning technology, they do not integrate the abstract idea into a practical application.
Step 2B: Inventive Concept
The claims do not recite significantly more than the abstract idea. The claims recite generic computing hardware ("memory," "processor," "system") performing generic computer functions (storing, executing, training). The claims do not recite a specific, unconventional technological implementation, but rather apply the abstract idea using conventional computing components.
EXAMINER’S NOTE TO APPLICANT:
To overcome the 35 U.S.C. § 101 rejection under the Desjardins framework, Applicant is strongly advised to amend the independent claims to incorporate the specific computational steps, variables, and algorithmic formulas disclosed in the specification—such as those defining exactly how the objective function is modified, or how feature preference vectors are iteratively updated based on the quantified difference between training and validation explanations (e.g., SHAP value comparisons)—so that the claims reflect the actual technical improvement and are not drafted at a high level of generality. Furthermore, Applicant must resolve the indefinite, subjective terminology highlighted under the 35 U.S.C. § 112(b) rejection using objective technical language.
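Purely as a non-binding illustration of the level of specificity contemplated above, the following sketch shows one hypothetical form such recited steps could take. All function and variable names below are the Examiner's placeholders, not the specification's; the stand-in attribution arrays would, in practice, come from an explainer such as SHAP:

```python
# Illustrative sketch only: an exponential-moving-average (EMA) update of a
# feature preference vector from the gap between per-feature attributions on
# training data and on validation data. Hypothetical formulation.
import numpy as np

def update_feature_preferences(pref, train_attr, val_attr, decay=0.9):
    """EMA update of a feature preference vector from the per-feature gap
    between mean attributions on training vs. validation data."""
    gap = np.abs(train_attr.mean(axis=0) - val_attr.mean(axis=0))
    return decay * pref + (1.0 - decay) * gap

# Toy usage: 5 features, random stand-in attribution matrices.
rng = np.random.default_rng(0)
pref = np.zeros(5)
for epoch in range(3):                       # stand-in training loop
    train_attr = rng.normal(size=(100, 5))   # placeholder for SHAP values (train)
    val_attr = rng.normal(size=(40, 5))      # placeholder for SHAP values (validation)
    pref = update_feature_preferences(pref, train_attr, val_attr)
```

Claim language reciting computational steps of this specificity, tied to the formulation actually disclosed in the specification, would no longer be drafted at a "high level of generality."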
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-4, 7-11, 14-18, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Erion et al. (hereinafter Erion), “Improving performance of deep learning models with axiomatic attribution priors and expected gradients” (2020).
Erion was disclosed in the IDS dated 7/27/2023.
Regarding independent Claims 1, 8, and 15, Erion discloses a system comprising a memory, a processor, and computer-executable components comprising a training component (Page 14, Section 6.2: "Each model was trained on a single GPU on a desktop workstation with 4 Nvidia 1080 Ti GPUs"; disclosing a computer-implemented framework, executed on processors/memory, that modifies the training objective function of a machine learning model to incorporate an explainability metric alongside a standard task performance metric) that trains a machine learning model using an objective function that is modified to incorporate an explainability metric for the machine learning model in addition to a model performance metric of the machine learning model (Page 2, Section 2.1: "Attribution priors are a flexible framework for encoding domain knowledge." Erion discloses replacing standard regularization with an "attribution prior," defined as a penalty function placed directly on feature attributions. The modified objective function is explicitly formulated at the top of Page 3 and on Page 10 (Methods 1) as: $$\theta = \text{argmin}_\theta \mathcal{L}(\theta; X, y) + \lambda \Omega(\Phi(\theta, X))$$
Here, the loss function $\mathcal{L}(\theta; X, y)$ reads directly on the claimed "model performance metric," and the scalar-valued penalty function of the feature attributions $\Omega(\Phi(\theta, X))$ reads directly on the claimed "explainability metric"). Regarding the preamble of Claim 15 reciting incorporating an explainability constraint into a decision tree ensemble, Erion explicitly notes that its framework is "modular and agnostic to the particular attribution method" (Page 2) and cites its direct application in "explainable AI for trees" (Page 17, Ref [25]). The preamble is a statement of intended use and not structurally limiting; but even if construed as limiting, Erion anticipates its applicability.
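For clarity of the record, the mapping above can be summarized in the following illustrative sketch. This is the Examiner's paraphrase of Erion's modified objective, not Erion's code: the model, data, and penalty are toy placeholders, and Erion's "expected gradients" attribution is approximated here by plain input gradients:

```python
# Sketch of the modified objective: total loss = L(theta; X, y) + lam * Omega(Phi).
import torch

def attribution_prior_loss(model, x, y, lam=0.1):
    x = x.clone().requires_grad_(True)
    out = model(x)
    task_loss = torch.nn.functional.mse_loss(out, y)   # L(theta; X, y): "model performance metric"
    # Phi(theta, X): gradient-based attributions (stand-in for expected gradients).
    grads, = torch.autograd.grad(out.sum(), x, create_graph=True)
    omega = grads.abs().sum()                          # Omega(Phi): toy "explainability metric"
    return task_loss + lam * omega

model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = attribution_prior_loss(model, x, y)
loss.backward()   # gradients flow through both the task loss and the penalty
```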
Regarding dependent Claims 2, 9, and 16, Erion further discloses a definition component (Page 14, Section 6.2: "Each model was trained on a single GPU on a desktop workstation with 4 Nvidia 1080 Ti GPUs"; disclosing a computer-implemented framework, executed on processors/memory, comprising the claimed definition component) that defines feature preferences for the machine learning model, wherein defining the feature preferences comprises defining respective levels of explainability for one or more features of training data of the machine learning model (Erion defines "feature preferences" via attribution priors. On Page 1, Abstract, Erion states that these priors "optimize for a model whose attributions have certain desirable properties—most frequently, that particular features are important or unimportant." Defining specific features as important or unimportant mathematically establishes the "respective levels of explainability" required for those features during training. Erion further enforces this by defining a target adjacency matrix W that "encodes our prior belief about the pairwise similarity of the importances between two features" (Page 12, Section 3.2)).
Regarding dependent Claims 3, 10, and 17, Erion further discloses wherein defining the respective levels of explainability (Page 1, Abstract: attribution priors "optimize for a model whose attributions have certain desirable properties—most frequently, that particular features are important or unimportant," establishing the respective levels of explainability during training) incentivizes the machine learning model to utilize at least a first feature of the training data to a greater extent than a second feature (Erion implements a "sparsity prior" that natively incentivizes this exact behavior. On Page 3, Section 2.1, Erion teaches defining a Gini coefficient penalty on attributions which "encourages a small number of features to have a large percentage of the total attribution while others are near-zero." By algorithmically penalizing reliance on confounding features (the second feature) to drive their attribution toward zero, the objective function inherently incentivizes the model to utilize the unpenalized, relevant features (the first feature) to a greater extent).
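As an illustrative aside (the Examiner's numpy approximation, not Erion's implementation), the Gini-coefficient behavior relied upon above can be seen directly: the statistic is high when attribution is concentrated in a few features and zero when attribution is spread evenly, so rewarding it (equivalently, penalizing its negative) drives the model toward reliance on a small subset of preferred features:

```python
import numpy as np

def gini(attr):
    """Gini coefficient of |attributions|: high when a few features carry most
    of the total attribution, 0 when attribution is spread evenly."""
    a = np.sort(np.abs(attr))
    n = a.size
    idx = np.arange(1, n + 1)
    return np.sum((2 * idx - n - 1) * a) / (n * np.sum(a))

print(gini(np.array([0.01, 0.01, 0.02, 5.0])))  # concentrated -> ~0.74
print(gini(np.array([1.0, 1.0, 1.0, 1.0])))     # uniform -> 0.0
```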
Regarding dependent Claims 4, 11, and 18, Erion further discloses wherein the feature preferences are defined globally for the machine learning model or locally for one or more records of the training data (Erion discloses applying attribution priors at both the global and local levels. On Page 12, Section 3.2, Erion states: "In this case, we choose to penalize global rather than local feature attributions," defining $\bar{\phi}_i$ as the global importance of a feature across all samples. Conversely, on Page 11, Section 3.1, Erion defines $\phi_{i,j}^l$ as the specific "attribution for the i,j-th pixel in the $l$-th training image," reading on defining feature preferences locally for individual data records).
Regarding dependent Claims 7, 14, and 20, Erion further discloses wherein training the machine learning model based on the model performance metric and the explainability metric (Page 1, Abstract, discloses training deep network models with the new method; the modified objective function is explicitly formulated at the top of Page 3 and on Page 10 (Methods 1) as $$\theta = \text{argmin}_\theta \mathcal{L}(\theta; X, y) + \lambda \Omega(\Phi(\theta, X))$$, where the loss function $\mathcal{L}(\theta; X, y)$ reads directly on the claimed "model performance metric" and the scalar-valued penalty function $\Omega(\Phi(\theta, X))$ reads directly on the claimed "explainability metric") balances a performance of the machine learning model and explainability of the machine learning model in favor of the performance or the explainability (Erion controls the mathematical balance between task performance and explainability using a regularization scalar $\lambda$ (Page 2, Section 2.1). On Page 5, Section 2.3, Erion discusses sweeping the $\lambda$ parameter across values $[10^{-20}, 10^{-1}]$ to achieve a deliberate "trade-off between robustness and accuracy." Furthermore, in Supplementary Material Page 7, Figure 5, Erion graphs this exact behavior, showing how shifting the $\lambda$ scalar balances the objective function in favor of model performance (low $\lambda$) or model explainability (high $\lambda$)).
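The mathematical mechanism of this trade-off can be illustrated with a toy objective (the Examiner's sketch, unrelated to Erion's actual models), in which the closed-form optimum moves from the performance-optimal point toward the explainability-optimal point as $\lambda$ grows:

```python
# Toy objective (w - 2)^2 + lam * w^2: the performance term is minimized at
# w = 2, the "explainability" penalty at w = 0. The closed-form argmin,
# w* = 2 / (1 + lam), moves from 2 toward 0 as lam increases.
for lam in [1e-3, 1e-1, 1.0, 10.0]:
    w_star = 2.0 / (1.0 + lam)
    print(f"lambda={lam:g}: w* = {w_star:.3f}")
```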
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 5-6, 12-13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Erion, as applied in the rejection of claims 1, 8, and 15, in view of Han.
Regarding dependent Claims 5 and 12, Erion teaches a computation component (Page 14, Section 6.2: "Each model was trained on a single GPU on a desktop workstation with 4 Nvidia 1080 Ti GPUs"; disclosing a computer-implemented framework, executed on processors/memory, comprising the claimed computation component) that quantifies explainability of the machine learning model based on the objective function (Pages 2-3, Section 2.1: Erion modifies the objective function to incorporate an "attribution prior," defined as a scalar-valued penalty function of the feature attributions, formulated as $\theta = \text{argmin}_\theta \mathcal{L}(\theta; X, y) + \lambda \Omega(\Phi(\theta, X))$, where the term $\Omega(\Phi(\theta, X))$ mathematically quantifies the model's explainability directly within the objective function).
Erion does not expressly teach quantifying explainability of the machine learning model based on training data of the machine learning model and validation data of the machine learning model during a training loop.
However, Han teaches quantifying the explainability consistency of a machine learning model across different data variations to measure explanation generalization during a training loop (Han teaches on Pages 7640-7641, "Proposed ECT Framework", that models often suffer because "consistent model outputs come from inconsistent gradient-based explanations." To solve this, Han introduces Explanation Consistency Training (ECT), which mathematically quantifies an explanation gap between original data and perturbed data via Equation 2: $E(x_u; \mathcal{I}, \theta) = ||\mathcal{I}(x_u;\theta) - \mathcal{I}(\mathcal{A}(x_u);\theta)||_2^2$. Furthermore, Han teaches utilizing a standard hold-out validation dataset during this training process to evaluate model generalization, stating on Page 7643, "Datasets": "1,000 data for validation").
Because Erion and Han both address quantifying explainability during machine learning model training, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Han's teaching of quantifying explainability consistency across different data inputs during a training loop into Erion's system, with a reasonable expectation of success, by adapting Erion's objective function to dynamically compute the mathematical gap between explanations generated on a training dataset and those generated on a hold-out validation dataset, applying the explanation consistency principles of Han. Applying this consistency metric across standard training and validation datasets, and utilizing ubiquitous iterative optimization techniques (e.g., an Exponential Moving Average algorithm) to stabilize the feature preference vector across epochs, would render obvious quantifying explainability of the machine learning model based on the objective function, training data of the machine learning model, and validation data of the machine learning model. This modification would have been motivated by the desire to solve the known technical vulnerability of "explanation overfitting," wherein a model memorizes the "right reasons" for the training data but fails to generalize that reasoning to unseen data distributions (Han, Page 7644, "Overfitting Prevention": "Biases are the distracting factors of true evidence, which fit the training data well while fail to generalize..."), by using Explanation Consistency Training to ensure robust interpretability across data boundaries.
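For the record, the following sketch illustrates the Examiner's reading of Han's Equation 2. All functions here are hypothetical placeholders: Han's $\mathcal{I}$ is a gradient-based explainer and $\mathcal{A}$ a data augmentation, neither of which is reproduced here:

```python
import numpy as np

def explanation_gap(explain, augment, x):
    """Squared L2 gap between the explanation of x and the explanation of its
    augmented counterpart A(x), per the Examiner's reading of Han's Eq. 2."""
    return float(np.sum((explain(x) - explain(augment(x))) ** 2))

# Toy usage with stand-in functions (placeholders, not Han's implementation).
explain = lambda x: 0.5 * x      # placeholder gradient-based explainer I(.)
augment = lambda x: x + 0.1      # placeholder perturbation A(.)
x = np.array([1.0, -2.0, 0.5])
print(explanation_gap(explain, augment, x))   # 3 * 0.05**2 = 0.0075
```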
Regarding dependent Claims 6 and 13, Erion in view of Han further teaches an update component that…the computation component (see Erion Page 14, Section 6.2: "Each model was trained on a single GPU on a desktop workstation with 4 Nvidia 1080 Ti GPUs"; disclosing a computer-implemented framework, executed on processors/memory, comprising the claimed update and computation components) iteratively updates a feature preference vector corresponding to the explainability of the machine learning model based on the quantifying of the explainability of the machine learning model, wherein the updating is iterative (see Han, Page 7641, Equation 2, incorporating the quantified explanation consistency gap into the overall optimization objective function; Han further teaches on Page 7643, "Performance on Benchmarks," and in Figure 4, minimizing this joint loss iteratively using Stochastic Gradient Descent (SGD). Specifically, Han discloses: "All the models are trained for 50 epochs... so it is close to 30,000 batch iterations." By minimizing the joint objective function via gradient descent across these training iterations, the optimizer dynamically calculates the explanation error gradient based on the quantified explanation consistency gap and iteratively updates the model's internal parameters $\theta$. Updating these parameters inherently and iteratively updates the model's local feature attributions [reading directly on the claimed "feature preference vector"] to align the explanations and minimize the quantified gap).
Regarding dependent claim 19, it is a computer program product claim reciting substantially the same subject matter as the combination of method claims 12 and 13. Thus, claim 19 is rejected for reasons similar to those given for claims 12 and 13 in combination.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KUANG FU CHEN whose telephone number is (571)272-1393. The examiner can normally be reached M-F 9:00 am - 5:30 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch can be reached on (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KC CHEN/Primary Patent Examiner, Art Unit 2143