Prosecution Insights
Last updated: April 19, 2026
Application No. 18/217,091

COUNTERFACTUAL PREDICTION AND INTERPRETABLE POLICY LEARNING FROM OBSERVATIONAL DATA USING PRESCRIPTIVE RELU NETWORKS

Office Action: Non-Final (§101, §103)
Filed: Jun 30, 2023
Examiner: LAHAM BAUZO, ALVARO SALIM
Art Unit: 2146
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)

Grant probability: 33% (At Risk)
Projected OA rounds: 1-2
Projected time to grant: 3y 4m
Grant probability with interview: 99%

Examiner Intelligence

Career allow rate: 33% (1 granted / 3 resolved; -21.7% vs Tech Center average)
Interview lift: +100.0% (allowance rate with vs. without an interview, among resolved cases with an interview)
Typical timeline: 3y 4m average prosecution; 27 applications currently pending
Career history: 30 total applications across all art units

Statute-Specific Performance

Statute  Allow Rate  vs. TC Avg
§101     32.4%       -7.6%
§103     44.3%       +4.3%
§102      7.3%       -32.7%
§112     16.0%       -24.0%

Black line = Tech Center average estimate. Based on career data from 3 resolved cases.
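If the per-statute deltas are simple differences between the examiner's allow rate and the Tech Center baseline (an assumption about this dashboard's arithmetic, not a documented formula), the implied baseline can be reconstructed from the table; notably, all four statutes imply the same 40% estimate:

```python
# Reconstruct the implied Tech Center average from each statute's
# examiner allow rate and its reported delta ("vs TC avg").
# Assumes delta = examiner_rate - tc_avg (this dashboard's apparent
# convention, not a documented formula).
rows = {
    "101": (32.4, -7.6),
    "103": (44.3, +4.3),
    "102": (7.3, -32.7),
    "112": (16.0, -24.0),
}

for statute, (rate, delta) in rows.items():
    tc_avg = round(rate - delta, 1)
    print(f"§{statute}: implied TC average = {tc_avg}%")
```

Each row yields 40.0%, consistent with the single "Tech Center average estimate" line noted under the chart.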

Office Action

Grounds: §101, §103

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on June 30, 2023 and February 2, 2026 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: Claims 1-8 are directed to a process. Claims 9-20 are directed to a machine or an article of manufacture.

With respect to claims 1, 9, and 15:

2A Prong 1: The claims recite an abstract idea. Specifically: creating/create […] a prescriptive tree based on the ANN model, wherein each leaf node of the prescriptive tree corresponds to one of the treatment options, and wherein the prescriptive tree is configured to indicate one of the treatment options for a particular set of features of the covariate data. (Mental process – a person can create a prescriptive tree wherein each leaf node corresponds to a treatment option using a pen and paper – see MPEP § 2106.04(a)(2)(III).)

If claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process, but for the recitation of generic computer components, then the claim limitations fall within the mathematical or mental process grouping of abstract ideas. Accordingly, the claims “recite” an abstract idea.
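For orientation only, the limitation at issue — a tree whose leaves each name one of K treatment options and which indicates a treatment for a given set of covariate features — can be sketched as a toy routine. This is an illustrative reading of the claim language, not the applicant's disclosed implementation; the feature names and thresholds are hypothetical.

```python
# Toy prescriptive tree over covariates (hypothetical features/thresholds).
# Internal nodes test a covariate; each leaf holds one of K treatment options.

TREATMENTS = ["dose_low", "dose_medium", "dose_high"]  # K = 3 options

def prescribe(covariates: dict) -> str:
    """Walk the tree and return the treatment option at the reached leaf."""
    if covariates["age"] < 50:
        if covariates["biomarker"] < 1.2:
            return TREATMENTS[0]
        return TREATMENTS[1]
    return TREATMENTS[2]

print(prescribe({"age": 42, "biomarker": 0.9}))  # dose_low
```

A tree this small is indeed something a person could trace with pen and paper, which is the examiner's point; the dispute is whether deriving such a tree from a trained ReLU network is similarly mental.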
2A Prong 2: The additional elements recited in the claims do not integrate the abstract idea into a practical application, individually or in combination. Additional elements:

- (Claim 1) A computer-implemented method, comprising: (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claim 1) training/train […] an artificial neural network (ANN) model using a dataset comprising observational data including treatment data, outcome data, and covariate data, wherein the ANN model includes rectified linear unit (ReLU) activation functions and K number of output nodes corresponding to K number of treatment options; and (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claim 1) by the processor set (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claim 9) A computer program product comprising one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to: (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claim 15) A system comprising: (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claim 15) a processor set, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to: (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Since the claims as a whole, looking at the additional elements individually and in combination, do not contain any other additional elements that are indicative of integration into a practical application, the claims are directed to an abstract idea.

2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

- (Claim 1) A computer-implemented method, comprising: (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claim 1) training/train […] an artificial neural network (ANN) model using a dataset comprising observational data including treatment data, outcome data, and covariate data, wherein the ANN model includes rectified linear unit (ReLU) activation functions and K number of output nodes corresponding to K number of treatment options; and (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claim 1) by the processor set (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claim 9) A computer program product comprising one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to: (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claim 15) A system comprising: (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claim 15) a processor set, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to: (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Considering the additional elements individually and in combination, and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claims are not patent eligible.

With respect to claims 2, 3, 10, and 16:

2A Prong 2: The additional elements recited in the claims do not integrate the abstract idea into a practical application, individually or in combination. Additional elements:

- (Claims 2, 10, and 16) wherein the training the ANN model comprises using a loss function that is based on prescription outcome and prediction error. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claims 3, 10, and 16) wherein the training the ANN model comprises adjusting values of weights of the ANN model using the loss function and gradient descent. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

- (Claims 2, 10, and 16) wherein the training the ANN model comprises using a loss function that is based on prescription outcome and prediction error. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claims 3, 10, and 16) wherein the training the ANN model comprises adjusting values of weights of the ANN model using the loss function and gradient descent. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Since the claims do not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claims are not patent eligible.

With respect to claims 4, 11, and 17:

2A Prong 2: The additional elements recited in the claims do not integrate the abstract idea into a practical application, individually or in combination. Additional elements:

- wherein the prescriptive tree comprises an oblique tree with hyperplane splits created by using multiple weights per neuron in the ANN model. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

- wherein the prescriptive tree comprises an oblique tree with hyperplane splits created by using multiple weights per neuron in the ANN model. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Since the claims do not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claims are not patent eligible.

With respect to claims 5, 12, and 18:

2A Prong 2: The additional elements recited in the claims do not integrate the abstract idea into a practical application, individually or in combination. Additional elements:

- wherein the prescriptive tree comprises an axis-aligned tree created by setting a single weight per neuron in the ANN model. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

- wherein the prescriptive tree comprises an axis-aligned tree created by setting a single weight per neuron in the ANN model. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Since the claims do not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claims are not patent eligible.

With respect to claims 6, 13, and 19:

2A Prong 2: The additional elements recited in the claims do not integrate the abstract idea into a practical application, individually or in combination. Additional elements:

- wherein the ANN model takes a number of non-zero weights connected to each neuron as an input parameter. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

- wherein the ANN model takes a number of non-zero weights connected to each neuron as an input parameter. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Since the claims do not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claims are not patent eligible.

With respect to claim 7:

2A Prong 2: The additional elements recited in the claim do not integrate the abstract idea into a practical application, individually or in combination. Additional elements:

- wherein, at each epoch during the training, the ANN model retains only a subset of weights per neuron. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

- wherein, at each epoch during the training, the ANN model retains only a subset of weights per neuron. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

With respect to claims 8, 14, and 20:

2A Prong 2: The additional elements recited in the claims do not integrate the abstract idea into a practical application, individually or in combination. Additional elements:

- (Claim 8) further comprising incorporating one or more constraints in the ANN model. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claims 14 and 20) wherein the program instructions are executable to incorporate one or more constraints in the ANN model (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

2B: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

- (Claim 8) further comprising incorporating one or more constraints in the ANN model. (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)
- (Claims 14 and 20) wherein the program instructions are executable to incorporate one or more constraints in the ANN model (Mere instructions to apply an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea – see MPEP 2106.05(f).)

Since the claims do not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claims are not patent eligible.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 9, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over BIGGS (US 20220180168 A1) in view of SCHWAB ("Learning to treat, explain and diagnose with neural networks"), hereafter BIGGS and SCHWAB respectively.

Regarding Claim 1:

BIGGS teaches: A computer-implemented method, comprising: (BIGGS [0089] teaches: "The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process (i.e., computer-implemented method), such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.")

training, by a processor set, an artificial neural network (ANN) model using a dataset comprising observational data including treatment data, […] and covariate data (BIGGS [0032] teaches: "the teacher model is a highly complex black box machine learning model, such as a neural network (i.e., artificial neural network (ANN) model). For expository purposes, the terms “predictive model” and “teacher model” are used interchangeably in this specification." BIGGS [0050] teaches: "In one embodiment, a prescriptive tree and a teacher model are deployed for a healthcare setting involving personalized/precision medicine. Both models are trained based on publicly available patient datasets (e.g., Consortium 2009) which contain true patient-specific optimal doses (i.e., treatment data) of a particular medicine, and also include patient-level covariates such as clinical factors, demographic variables, and genetic information (i.e., covariate data). [...] The system 330 is configured to train a teacher model based on the patient datasets (i.e., training [...] using a dataset comprising observational data), resulting in a trained teacher model that predicts success probability of a dosage given a patient's covariates." BIGGS [0021] teaches: "The system comprises at least one processor, and a non-transitory processor-readable memory device storing instructions that when executed by the at least one processor causes the at least one processor to perform operations." BIGGS [0079] teaches: "The computer system includes one or more processors (i.e., a processor set), such as processor 702")

creating, by the processor set, a prescriptive tree based on the ANN model, wherein each leaf node of the prescriptive tree corresponds to one of the treatment options, and wherein the prescriptive tree is configured to indicate one of the treatment options for a particular set of features of the covariate data. (BIGGS [0034] teaches: "In the training phase, the prescriptive model training unit 430 is configured to: (1) receive, as input, training data 410, and (2) train a prescriptive model 435 for segmentation based on the training data 410. In one embodiment, the prescriptive model 435 is trained using a specialized tree algorithm, resulting in a prescriptive tree (i.e., creating [...] a prescriptive tree) including a root node and one or more leaf nodes" BIGGS [0035] teaches: "In one embodiment, a path from the root node of the prescriptive tree to a particular leaf node of the tree specifies a particular segment of a population. In one embodiment, a leaf node of the prescriptive tree is prescribed a policy for a particular segment of a population (i.e., wherein each leaf node of the prescriptive tree corresponds to one of the treatment options) specified by a path from the root node of the tree to the leaf node, wherein the policy is defined by a set of rules/items which produce the same action (i.e., wherein the prescriptive tree is configured to indicate one of the treatment options), and the rules/items have similar covariates (i.e., for a particular set of features of the covariate data). In one embodiment, a set of rules/items that have a similar optimal action, as evaluated by the predictive model 425, are selected to define a leaf node of the prescriptive tree." BIGGS [0036] teaches: "In one embodiment, the integrated segmentation performed is as follows: Each split of the prescriptive tree (e.g., on a feature of a product or a customer) separates data into two data sets. An estimated optimal action for each data set can be determined via the predictive model 425 (i.e., teacher model) (i.e., based on the ANN model) which evaluates an expected outcome at each action, and chooses the optimal action.")

However, BIGGS is not relied upon for teaching, but SCHWAB teaches: […] observational data including outcome data […] (SCHWAB [page 118, section 5.3 Problem Statement] teaches: "As training data, we receive factual samples X and their observed outcomes (i.e., outcome data) y_{n,f}(s_f) after applying a specific observed treatment f at dosage s_f.")

wherein the ANN model includes rectified linear unit (ReLU) activation functions and K number of output nodes corresponding to K number of treatment options; (SCHWAB [page 100, section 4.7.F] teaches: "Each hidden layer was followed by a BN layer, dropout and a ReLU activation (i.e., ANN model includes rectified linear unit (ReLU) activation functions)." SCHWAB [page 118, section 5.3 Problem Statement] teaches: "We consider a setting in which we are given N observed samples X with p pre-treatment covariates x_i, i ∈ {0, …, p-1}. For each sample, the potential outcomes y_{n,t}(s_t) are the response of the n-th sample to a treatment t out of the set of k available treatment options T = {0, …, k-1} applied at a dosage s_t ∈ {s_t ∈ R, a_t > 0 | a_t ≤ s_t ≤ b_t}, where a_t and b_t are the minimum and maximum dosage for treatment t, respectively." SCHWAB [page 120-121, section 5.3 Model Architecture] teaches: "Schwab et al. (2018b) extended the TARNET architecture to the multiple treatment setting by using k separate head networks, one for each treatment option." SCHWAB [page 123, Figure 5.1] teaches: "The dose response network (DRNet) architecture with shared base layers, k intermediary treatment layers, and k*E heads for the multiple treatment setting with an associated dosage parameter s (i.e., K number of output nodes corresponding to K number of treatment options). The shared base layers are trained on all samples, and the treatment layers are only trained on samples from their respective treatment category. Each treatment layer is further subdivided into E head layers (only one set of E=3 head layers for treatment t=0 is shown above). Each head layer is assigned a dosage stratum that subdivides the range of potential dosages [a_t, b_t] into E partitions of equal width (b_t - a_t)/E. The head layers each predict outcomes y(s) for a range of values of the dosage parameter s, and are only trained on samples that fall within their respective dosage stratum.")

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of BIGGS and SCHWAB before them, to include SCHWAB's observed outcomes in BIGGS' dataset for the prescriptive policy generation method. One would have been motivated to make such a combination in order to estimate optimal treatment policies when experimental data is not available (SCHWAB [page 12, section 1.3 Thesis Outline]).

Regarding Claim 9:

The claim recites similar limitations as corresponding claim 1 and is rejected for similar reasons as claim 1 using similar teachings and rationale. Additionally, BIGGS teaches: A computer program product comprising one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to: (BIGGS [0021] teaches: "The system comprises at least one processor, and a non-transitory processor-readable memory device storing instructions that when executed by the at least one processor causes the at least one processor to perform operations." BIGGS [0079] teaches: "The computer system includes one or more processors, such as processor 702." BIGGS [0022] teaches: "One embodiment of the invention provides a computer program product for integrated segmentation and prescriptive policies generation. The computer program product comprises a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to training a first AI model and a second model based on training data.")

Regarding Claim 15:

The claim recites similar limitations as corresponding claim 1 and is rejected for similar reasons as claim 1 using similar teachings and rationale. Additionally, BIGGS teaches: A system comprising: a processor set, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to: (BIGGS [0021] teaches: "The system comprises at least one processor, and a non-transitory processor-readable memory device storing instructions that when executed by the at least one processor causes the at least one processor to perform operations." BIGGS [0079] teaches: "The computer system includes one or more processors, such as processor 702." BIGGS [0022] teaches: "One embodiment of the invention provides a computer program product for integrated segmentation and prescriptive policies generation. The computer program product comprises a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to training a first AI model and a second model based on training data.")

Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over BIGGS in view of SCHWAB as applied above to claim 1, and further in view of BERTSIMAS ("Optimal Prescriptive Trees"), hereafter BERTSIMAS.

Regarding Claim 2:

BIGGS in view of SCHWAB teaches the elements of claim 1 as outlined above. BIGGS in view of SCHWAB is not relied upon for teaching, but BERTSIMAS teaches: wherein the training the ANN model comprises using a loss function that is based on prescription outcome and prediction error.
(BERTSIMAS [page 167, section 2. Review of Optimal Predictive Trees] teaches: "The optimal trees framework is a generic approach for training decision trees according to a loss function of the form […] where T is the decision tree being optimized […]." BERTSIMAS [page 168, section 3. Optimal Prescriptive Trees] teaches: "In this section, we motivate and present the OPT algorithm that trains prescriptive trees to directly minimize the objective presented in Problem (3) using a decision rule that takes the form of a prescriptive tree (that is, a decision tree that, in each leaf, prescribes a common treatment for all samples that are assigned to that leaf of the tree). Our approach is to estimate the counterfactual outcomes using this prescriptive tree during the training process and therefore, jointly optimize the prescription (i.e., prescription outcome) and the prediction error.")

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of BIGGS, SCHWAB, and BERTSIMAS before them, to include BERTSIMAS' joint optimization of prescription and prediction error in BIGGS and SCHWAB's prescriptive policy generation method. One would have been motivated to make such a combination in order for tree predictions to lead to a major improvement of the out-of-sample predictive and prescriptive errors (BERTSIMAS [page 169, section 3.1. Optimal Prescriptive Trees with Constant Predictions]).

Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over BIGGS in view of SCHWAB and BERTSIMAS as applied to claim 2 above, and further in view of LEE ("Oblique Decision Trees from Derivatives of ReLU Networks"), hereafter LEE.

Regarding Claim 3:

BIGGS in view of SCHWAB and BERTSIMAS teaches the elements of claim 2 as outlined above. BIGGS further teaches: The computer-implemented method of claim 2, wherein the training the ANN model comprises adjusting values of weights of the ANN model using the loss function […] (BIGGS [0043] teaches: "the predictive model 425 is a teacher model (i.e., which determines the loss function to facilitate the finetuning/adjusting/updating of the prescriptive model 435).")

BIGGS in view of SCHWAB and BERTSIMAS is not relied upon for teaching, but LEE teaches: […] adjusting values of weights of the ANN model using […] gradient descent. (LEE [page 8, section 3.7] teaches: "Both models are optimized via stochastic gradient descent.")

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of BIGGS, SCHWAB, BERTSIMAS, and LEE before them, to include LEE's gradient descent in BIGGS, SCHWAB, and BERTSIMAS' prescriptive policy generation method. One would have been motivated to make such a combination in order to optimize the model and adopt many tools developed for deep networks while implicitly training decision trees (LEE [page 8, section 3.6]).

Claims 4, 11, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over BIGGS in view of SCHWAB as applied respectively above to claims 1, 9, and 15, and further in view of LEE.

Regarding Claim 4:

BIGGS in view of SCHWAB teaches the elements of claim 1 as outlined above. BIGGS in view of SCHWAB is not relied upon for teaching, but LEE teaches: wherein the prescriptive tree comprises an oblique tree with hyperplane splits created by using multiple weights per neuron in the ANN model. (LEE [page 1, Abstract] teaches: "Indeed, only M neurons suffice to implicitly model an oblique decision tree with 2^M leaf nodes." LEE [page 2, section 2 Related Work] teaches: "for a single oblique split, there can be Σ_{k=0}^{D} (N choose k) different ways to separate N data points in D-dimensional space [...]." LEE [page 3, section 3.1 Notation and Basics] teaches: "The neurons are defined via the weight matrix W_i ∈ R^{N_i × N_{i-1}} (i.e., using multiple weights per neuron) and the bias vector b_i ∈ R^{N_i} in each layer i ∈ [M] ≜ {1, 2, …, M}." LEE [page 4, Figure 1] teaches: "Toy examples for the equivalent representations of the same mappings for different M. Here the locally constant networks have 1 neuron per layer. We show the locally constant networks on the LHS, the raw mappings in the middle, and the equivalent oblique decision trees on the RHS." Examiner's note: each neuron z_i is defined using a weight matrix.)

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of BIGGS, SCHWAB, and LEE before them, to include LEE's oblique tree modeling in BIGGS and SCHWAB's prescriptive policy generation method. One would have been motivated to make such a combination in order to adopt many tools developed for deep networks while implicitly training decision trees (LEE [page 1, Abstract]).

Regarding Claim 11:

BIGGS in view of SCHWAB teaches the elements of claim 9 as outlined above. Additionally, the claim recites similar limitations as corresponding claim 4 and is rejected for similar reasons as claim 4 using similar teachings and rationale.

Regarding Claim 17:

BIGGS in view of SCHWAB teaches the elements of claim 15 as outlined above. Additionally, the claim recites similar limitations as corresponding claim 4 and is rejected for similar reasons as claim 4 using similar teachings and rationale.

Claims 5, 12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over BIGGS in view of SCHWAB as applied respectively above to claims 1, 9, and 15, and further in view of RICHMOND ("Mapping Auto-context Decision Forests to Deep ConvNets for Semantic Segmentation"), hereafter RICHMOND.
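The distinction the rejections draw between claims 4 and 5 reduces to how many non-zero weights a split neuron uses: several non-zero weights give LEE's oblique (hyperplane) split, while a single non-zero weight gives RICHMOND's axis-aligned split. A minimal sketch, with hypothetical weights not taken from any cited reference:

```python
# A ReLU "split neuron" routes a sample by the sign of w·x + b.
# Multiple non-zero weights -> oblique (hyperplane) split;
# a single non-zero weight  -> axis-aligned split on one feature.

def split(x, w, b):
    """Return True if the sample falls on the positive side of w·x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0

x = [1.0, 2.0]

# Oblique split: hyperplane 0.5*x0 + 1.5*x1 - 3 > 0
print(split(x, [0.5, 1.5], -3.0))   # 0.5 + 3.0 - 3.0 = 0.5 > 0 -> True

# Axis-aligned split: single non-zero weight, equivalent to x1 > 2
print(split(x, [0.0, 1.0], -2.0))   # 2.0 - 2.0 = 0, not > 0 -> False
```

Zeroing all but one entry of w collapses the hyperplane test to a threshold on a single covariate, which is the single-weight-per-neuron reading applied to claims 5, 12, and 18 below.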
Regarding Claim 5: BIGGS in view of SCHWAB teaches the elements of claim 1 as outlined above.

BIGGS further teaches: wherein the prescriptive tree comprises an axis-aligned tree […] (BIGGS [0034] teaches: "In one embodiment, the prescriptive model 435 is trained using a specialized tree algorithm, resulting in a prescriptive tree including a root node and one or more leaf nodes." BIGGS [0036] teaches: "In one embodiment, the prescriptive model 435 performs integrated segmentation which comprises constructing a decision tree with a customized/user-defined splitting criterion (e.g., expected revenue maximization) which optimizes a desired outcome for a given action. In one embodiment, the integrated segmentation performed is as follows: Each split of the prescriptive tree (e.g., on a feature of a product or a customer) separates data into two data sets." Examiner's note: Under broadest reasonable interpretation, an axis-aligned tree can be interpreted as a tree that is constructed by a splitting criterion such as, for example, expected revenue maximization, which is a single value resulting in an axis-aligned split of the data.)

BIGGS in view of SCHWAB is not relied upon for teaching, but RICHMOND teaches: […] an axis-aligned tree created by setting a single weight per neuron in the ANN model. (RICHMOND [page 4, section 3 Method] teaches: "This can model axis-aligned split functions with a single non-zero weight per neuron […].")

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of BIGGS, SCHWAB, and RICHMOND before them, to include RICHMOND's axis-aligned split function with a single non-zero weight per neuron in BIGGS and SCHWAB's prescriptive policy generation method.
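Editorial aside (ours, not part of the record): the distinction the rejections draw, RICHMOND's single non-zero weight per neuron versus LEE's oblique hyperplane splits, can be sketched in a few lines. `neuron_split` is a hypothetical helper of ours, not drawn from either reference.

```python
def neuron_split(weights, bias, x):
    """Tree split induced by one ReLU neuron: route a sample left or
    right by the sign of the pre-activation w . x + b."""
    pre_activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return "right" if pre_activation > 0 else "left"

x = [2.0, -1.0, 0.5]

# One non-zero weight -> axis-aligned split, equivalent to "x[0] > 1.5"
print(neuron_split([1.0, 0.0, 0.0], -1.5, x))  # right (x[0] = 2.0 > 1.5)

# Several non-zero weights -> oblique split on the hyperplane
# 0.5*x0 - 2.0*x1 + 1.0*x2 = 1.0
print(neuron_split([0.5, -2.0, 1.0], -1.0, x))  # right (2.5 > 0)
```

The first call depends on a single coordinate only, which is what makes the induced split axis-aligned; the second mixes all three coordinates into one hyperplane test.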
One would have been motivated to make such a combination in order to take advantage of mapping from a stacked Random Forest (RF) to deep ConvNet back to a stacked RF with updated parameters, which leads to superior results on semantic segmentation with limited training samples, compared to alternative strategies (RICHMOND [page 2, section 1. Contributions]).

Regarding Claim 12: BIGGS in view of SCHWAB teaches the elements of claim 9 as outlined above. Additionally, the claim recites similar limitations as corresponding claim 5 and is rejected for similar reasons as claim 5 using similar teachings and rationale.

Regarding Claim 18: BIGGS in view of SCHWAB teaches the elements of claim 15 as outlined above. Additionally, the claim recites similar limitations as corresponding claim 5 and is rejected for similar reasons as claim 5 using similar teachings and rationale.

Claims 6, 8, 13-14, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over BIGGS in view of SCHWAB as applied respectively above to claims 1, 9, and 15, and further in view of LASBY ("Dynamic Sparse Training with Structured Sparsity"), hereafter LASBY.

Regarding Claim 6: BIGGS in view of SCHWAB teaches the elements of claim 1 as outlined above.

BIGGS in view of SCHWAB is not relied upon for teaching, but LASBY teaches: wherein the ANN model takes a number of non-zero weights connected to each neuron as an input parameter. (LASBY [page 3, section 3. Method] teaches: "Constant fan-in represents a special case of N:M sparsity where N is the number of non-zero weights per neuron and M is the dense fan-in for each neuron within a given layer." LASBY [page 4, section 3.1. Sparsity and Output-Norm Variance] teaches: "In contrast, the constant-fan-in type imposes a strong structural constraint." LASBY [page 1, Figure 1] teaches: "A constant fan-in weight matrix has the same number of non-zero elements (here 2) per column allowing condensed representation.")
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of BIGGS, SCHWAB, and LASBY before them, to include LASBY's constant fan-in in BIGGS and SCHWAB's prescriptive policy generation method. One would have been motivated to make such a combination in order to enable a compact representation that is not only parameter- and memory-efficient, but also amenable to real-world acceleration (LASBY [page 2, section 1. Introduction]).

Regarding Claim 8: BIGGS in view of SCHWAB teaches the elements of claim 1 as outlined above.

BIGGS in view of SCHWAB is not relied upon for teaching, but LASBY teaches: incorporating one or more constraints in the ANN model. (LASBY [page 4, section 3.1. Sparsity and Output-Norm Variance] teaches: "In contrast, the constant-fan-in type imposes a strong structural constraint. Therefore we are somewhat surprised to find that, in fact, constant-fan-in sparsity always produces slightly smaller output-norm variance than the other types. The difference is larger when k ≪ n, i.e., for very sparse networks. This indicates that, at the very least, the constant fan-in constraint should not impair SNN (e.g., sparse neural network) training dynamics and performance, motivating our method of maintaining the constant fan-in sparsity constraint within a DST approach.")

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of BIGGS, SCHWAB, and LASBY before them, to include LASBY's constant fan-in in BIGGS and SCHWAB's prescriptive policy generation method. One would have been motivated to make such a combination in order to enable a compact representation that is not only parameter- and memory-efficient, but also amenable to real-world acceleration (LASBY [page 2, section 1. Introduction]).
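Editorial aside (ours, not part of the record): LASBY's constant fan-in constraint, a fixed number of non-zero weights per neuron, can be sketched as a binary sparsity mask. The function below and its name are our illustration, not LASBY's code.

```python
import random

def constant_fan_in_mask(n_out: int, n_in: int, fan_in: int):
    """Binary sparsity mask in which every neuron (row) keeps exactly
    `fan_in` non-zero input weights: the structure LASBY calls
    constant fan-in."""
    mask = []
    for _ in range(n_out):
        kept = set(random.sample(range(n_in), fan_in))
        mask.append([1 if j in kept else 0 for j in range(n_in)])
    return mask

random.seed(0)
mask = constant_fan_in_mask(n_out=4, n_in=8, fan_in=2)

# Every neuron has the same number of non-zeros, so the matrix admits
# the condensed (fan_in-wide dense) representation LASBY's Figure 1 shows.
print(all(sum(row) == 2 for row in mask))  # True
```

Because every row has the same count of non-zeros, the weights can be stored as a dense n_out × fan_in array plus an index array, which is the memory-efficiency argument the motivation statement above relies on.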
Regarding Claim 13: BIGGS in view of SCHWAB teaches the elements of claim 9 as outlined above. Additionally, the claim recites similar limitations as corresponding claim 6 and is rejected for similar reasons as claim 6 using similar teachings and rationale.

Regarding Claim 14: BIGGS in view of SCHWAB teaches the elements of claim 9 as outlined above. Additionally, the claim recites similar limitations as corresponding claim 8 and is rejected for similar reasons as claim 8 using similar teachings and rationale.

Regarding Claim 19: BIGGS in view of SCHWAB teaches the elements of claim 15 as outlined above. Additionally, the claim recites similar limitations as corresponding claim 6 and is rejected for similar reasons as claim 6 using similar teachings and rationale.

Regarding Claim 20: BIGGS in view of SCHWAB teaches the elements of claim 15 as outlined above. Additionally, the claim recites similar limitations as corresponding claim 8 and is rejected for similar reasons as claim 8 using similar teachings and rationale.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over BIGGS in view of SCHWAB as applied above to claim 1, and further in view of SUN (US 20210406654 A1), hereafter SUN.

Regarding Claim 7: BIGGS in view of SCHWAB teaches the elements of claim 1 as outlined above.

BIGGS in view of SCHWAB is not relied upon for teaching, but SUN teaches: wherein, at each epoch during the training, the ANN model retains only a subset of weights per neuron. (SUN [0105] teaches: "For example, if the pruning iteration number is set to one, the modified weights in the 1×1 sparse weight cubes CB and WS are pruned (i.e., retains only a subset of weights per neuron) after every epoch of training (i.e., at each epoch during the training) images." SUN [0107] teaches: "Although the invention has been described in terms of a CNN stage in a neural network, the mechanism is not limited to natural language and vision models.
The same mechanism can be applied to other types of models.")

Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of BIGGS, SCHWAB, and SUN before them, to include SUN's pruning in BIGGS and SCHWAB's prescriptive policy generation method. One would have been motivated to make such a combination in order to use pruning to create sparse weights in order to improve model accuracy without increasing computational cost (SUN [0099]).

Claims 10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over BIGGS in view of SCHWAB as applied respectively above to claims 9 and 15, and further in view of BERTSIMAS and LEE.

Regarding Claim 10: BIGGS in view of SCHWAB teaches the elements of claim 9 as outlined above. Additionally, the claim recites similar limitations as corresponding claims 2 and 3 and is rejected for similar reasons as claims 2 and 3 using similar teachings and rationale.

Regarding Claim 16: BIGGS in view of SCHWAB teaches the elements of claim 15 as outlined above. Additionally, the claim recites similar limitations as corresponding claims 2 and 3 and is rejected for similar reasons as claims 2 and 3 using similar teachings and rationale.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Alvaro S Laham Bauzo, whose telephone number is (571) 272-5650. The examiner can normally be reached Mon-Fri 7:30 AM - 11:00 AM | 1:00 PM - 5:30 PM ET.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Usmaan Saeed, can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/A.S.L./
Examiner, Art Unit 2146

/USMAAN SAEED/
Supervisory Patent Examiner, Art Unit 2146

Prosecution Timeline

Jun 30, 2023
Application Filed
Mar 12, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12475388
MACHINE LEARNING MODEL SEARCH METHOD, RELATED APPARATUS, AND DEVICE
2y 5m to grant Granted Nov 18, 2025
Study what changed to get past this examiner. Based on the 1 most recent grant.


Prosecution Projections

1-2
Expected OA Rounds
33%
Grant Probability
99%
With Interview (+100.0%)
3y 4m
Median Time to Grant
Low
PTA Risk
Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
