DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the Application filed on February 23, 2023. Claims 1-20 are pending in the case. Claims 1, 8, and 15 are the independent claims.
This action is non-final.
Claim Rejections – 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims under pre-AIA 35 U.S.C. 103(a), the examiner presumes that the subject matter of the various claims was commonly owned at the time any inventions covered therein were made absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and invention dates of each claim that was not commonly owned at the time a later invention was made in order for the examiner to consider the applicability of pre-AIA 35 U.S.C. 103(c) and potential pre-AIA 35 U.S.C. 102(e), (f), or (g) prior art under pre-AIA 35 U.S.C. 103(a).
Claims 1, 2, 5, 7-9, 12, 14-16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Rudolph et al. (US 20230259785 A1), in view of Chun-Hao Chang, Sarah Tan, Ben Lengerich, Anna Goldenberg, Rich Caruana. How Interpretable and Trustworthy are GAMs? June 7, 2021. arXiv.org. arXiv:2006.06466v2. (hereinafter Chang), further in view of Carreira-Perpinan (US 20220318641 A1), further in view of Ding et al. (US 20220058531 A1).
With respect to claims 1, 8, and 15, Rudolph teaches a system comprising: one or more processors, the one or more processors configured to perform a method; one or more non-transitory computer-readable storage media storing instructions that are operable, when executed by one or more processors, to cause the one or more processors to perform operations comprising the method (e.g. paragraph 0108, processor configured to read into memory and execute computer-executable instructions residing in non-volatile storage embodying ML algorithms and methodologies of described embodiments); and the method, comprising:
initializing, by one or more processors, a model (e.g. paragraph 0045, describing implementation of model training system using a model 102; model receiving data; i.e. a model is created/initialized, such as in order to perform training of the model); and
training, by the one or more processors, the model to receive tabular data as input and to generate an anomaly score (e.g. paragraph 0037, described approach applied to anomaly detection, including with tabular data; paragraph 0046, Fig. 1B, showing how trained model 102 is used to receive a sample, process it, and compute an anomaly score; paragraph 0065, tabular anomaly detection methods; paragraph 0077, use of tabular data for anomaly detection), wherein in training the model the one or more processors are configured to:
training the model using unlabeled data and a loss function (e.g. paragraph 0033, describing anomaly training approach using dataset of normal samples and minimizing a loss function; paragraph 0036, when updating model parameters, using combination of two losses to optimally exploit learning signal; even unlabeled anomalies provide valuable training signals; paragraph 0045, Fig. 1A, describing model training system; in first step, model receives data set of N samples that includes normal and unidentified (i.e. unlabeled) anomalous data samples; the system processes the data via the model to produce an anomaly score associated with each sample in the data set, and ranks the normal and anomalous data samples according to the anomaly scores; the system then labels a fraction of the samples with the highest scores with an anomaly label and the remaining samples with a normal label; training model includes use of joint loss function); and
training the model using labeled data (e.g. paragraph 0045, Fig. 1A, labels are then passed together with the data samples to update the system which consists of retraining the model using all N samples, the labels, and a joint loss function).
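For context only (not part of the claim mapping), the ranking-and-labeling step of Rudolph's paragraph 0045 summarized above can be sketched as follows; the function name, the fraction value, and the NumPy implementation are hypothetical illustrations, not Rudolph's actual code:

```python
import numpy as np

def pseudo_label_by_score(scores, anomaly_fraction=0.05):
    """Rank samples by anomaly score; label the top fraction anomalous (1)
    and the remaining samples normal (0), per the ranking step described
    in Rudolph's paragraph 0045."""
    scores = np.asarray(scores, dtype=float)
    n_anom = max(1, int(round(anomaly_fraction * len(scores))))
    order = np.argsort(scores)[::-1]   # indices of highest scores first
    labels = np.zeros(len(scores), dtype=int)
    labels[order[:n_anom]] = 1
    return labels
```

The resulting labels would then be passed, together with the samples, to a retraining step using a joint loss function, as the reference describes.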
Rudolph does not explicitly disclose:
the model is a generalized additive model (GAM), the GAM comprising one or more neural decision trees;
training the model to generate an explanation of the anomaly score.
However, Chang teaches:
the model is a generalized additive model (GAM), the GAM comprising one or more neural decision trees (e.g. page 2, second column, final paragraph-page 3, first column, first paragraph, describing implementations for tree-based GAMs);
training the model to generate an explanation of the anomaly score (e.g. page 2 first column, bulleted list, indicating that GAMs are used to learn explanations and perform data anomaly discovery; page 2, second column, final two paragraphs, indicating that GAMs are interpretable because the impact of each feature on the prediction can be visualized; describing tree-based GAMs which sequentially considers each feature as an explanation of the current residual; page 4, first column, first paragraph, indicating that GAMs are commonly used on interpretability tasks such as discovering anomalies in data; page 5 second column, second paragraph, localized data anomalies detected by tree-based GAM methods; page 9 second column, first paragraph, tree-based GAM methods superior for considering bias and data anomaly discovery; i.e. where the tree-based GAM is utilized to predict/identify anomalies in the data and is further able to provide corresponding explanations, such as in the form of a visualization of the impact of various features on the prediction, etc.).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Rudolph and Chang in front of him to have modified the teachings of Rudolph (directed to latent outlier exposure for anomaly detection), to incorporate the teachings of Chang (directed to assessing the interpretability and trustworthiness of GAMs, including tree-based GAMs, in common tasks such as anomaly detection) to implement the model as a tree-based GAM which is trained to generate explanations for the anomaly scores (as taught by Chang). One of ordinary skill would have been motivated to perform such a modification in order to attain the best balance of sparsity, fidelity, and accuracy in the model for anomaly detection as described in Chang (abstract).
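For context only, a minimal sketch of a tree-based GAM of the kind discussed in Chang, in which the anomaly score is a sum of per-feature shape functions and the per-feature contributions serve as the explanation; all class names, parameter values, and the depth-1 tree structure are hypothetical simplifications, not Chang's actual implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SoftTreeShape:
    """Depth-1 soft decision tree used as one GAM shape function."""
    def __init__(self, threshold=0.0, leaf_left=0.0, leaf_right=1.0,
                 temperature=1.0):
        self.threshold = threshold
        self.leaf_left = leaf_left
        self.leaf_right = leaf_right
        self.temperature = temperature

    def __call__(self, x):
        # Probabilistic routing: p in (0, 1) blends the two leaf values.
        p = sigmoid((x - self.threshold) / self.temperature)
        return (1.0 - p) * self.leaf_left + p * self.leaf_right

class TreeGAM:
    """Additive model: the score is the sum of per-feature shape functions."""
    def __init__(self, shapes):
        self.shapes = shapes

    def score(self, row):
        return sum(shape(x) for shape, x in zip(self.shapes, row))

    def explain(self, row):
        # Per-feature contributions serve as the explanation of the score,
        # consistent with Chang's point that each feature's impact on the
        # prediction can be inspected directly.
        return {f"feature_{i}": float(shape(x))
                for i, (shape, x) in enumerate(zip(self.shapes, row))}
```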
Rudolph and Chang do not explicitly disclose the loss function measuring the sparsity of data represented by leaves of the one or more neural decision trees. However, Carreira-Perpinan teaches the loss function measuring the sparsity of data represented by leaves of the one or more neural decision trees (e.g. paragraph 0059, objective function made up of loss and regularization term; paragraph 0223, describing loss function in leaf/node; paragraphs 0398-0401, controlling sparsity of nodes in tree using regularization term in tree objective function which permits control of overall sparsity budget of the tree and distribution of the sparsity budget over the nodes (i.e. leaves) of the tree, using a sparsity penalty; compare with specification of the instant application at paragraph 0023, which indicates that the loss function can also be the objective function).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Rudolph, Chang, and Carreira-Perpinan in front of him to have modified the teachings of Rudolph (directed to latent outlier exposure for anomaly detection) and Chang (directed to assessing the interpretability and trustworthiness of GAMs, including tree-based GAMs, in common tasks such as anomaly detection), to incorporate the teachings of Carreira-Perpinan (directed to optimization of learning decision trees) to utilize in the loss/objective function, one or more measurements of sparsity of data of the leaves/nodes of the decision tree (as taught by Carreira-Perpinan). One of ordinary skill would have been motivated to perform such a modification in order to learn better decision trees than with traditional algorithms as described in Carreira-Perpinan (abstract).
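For context only, the loss/objective structure described in Carreira-Perpinan (a data-fit loss plus a sparsity regularization term distributed over the tree's leaves) can be sketched as follows; the L1 penalty and the `budget_weight` parameter are hypothetical stand-ins for the reference's regularization term and sparsity budget:

```python
import numpy as np

def sparsity_penalty(leaf_params, budget_weight=0.1):
    """L1-style penalty encouraging sparse leaf parameters; budget_weight
    stands in for distributing an overall sparsity budget over the leaves."""
    return budget_weight * sum(np.abs(w).sum() for w in leaf_params)

def joint_objective(task_loss, leaf_params, budget_weight=0.1):
    # Objective = data-fit loss plus sparsity regularization, matching the
    # loss/objective framing discussed above.
    return task_loss + sparsity_penalty(leaf_params, budget_weight)
```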
Rudolph, Chang, and Carreira-Perpinan do not explicitly disclose the trees comprising leaves that are differentiable with respect to weight parameters for the GAM. However, Ding teaches the trees comprising leaves that are differentiable with respect to weight parameters for the GAM (e.g. paragraph 0106, describing a soft decision tree (SDT) in which each node defines a probabilistic decision boundary with the sigmoid function such that the SDT is a differentiable model with non-zero gradients; each node in SDT represented as a weight vector with terms indicating the index of the layer and index of the node, as shown in Fig. 3; compare with specification of the instant application at paragraph 0035, indicating that that differentiable decision trees are differentiable with respect to weight parameters of the GAM).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Rudolph, Chang, Carreira-Perpinan, and Ding in front of him to have modified the teachings of Rudolph (directed to latent outlier exposure for anomaly detection), Chang (directed to assessing the interpretability and trustworthiness of GAMs, including tree-based GAMs, in common tasks such as anomaly detection), and Carreira-Perpinan (directed to optimization of learning decision trees), to incorporate the teachings of Ding (directed to cascading decision trees for explainable reinforcement learning) to utilize, as the decision trees (i.e. within the GAM of Chang), differentiable decision trees with nodes/leaves that are differentiable with respect to weight parameters (as taught by Ding, i.e. where the weight parameters are weight parameters of a GAM, when the decision trees are implemented within a GAM as taught by Chang). One of ordinary skill would have been motivated to perform such a modification in order to aid in providing explainable machine learning architectures, using tree-based approaches which provide improved computational performance and explainability as described in Ding (abstract, paragraph 0009).
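For context only, Ding's sigmoid-based soft decision tree routing can be illustrated as follows; the depth-1 tree and the numeric-gradient check are hypothetical simplifications showing only that the output is differentiable (non-zero gradients) with respect to the node weight vector:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sdt_predict(x, node_w, node_b, leaves):
    """Depth-1 soft decision tree: the sigmoid at the node defines a
    probabilistic decision boundary, so the output is differentiable in
    the node's weight vector."""
    p = sigmoid(np.dot(node_w, x) + node_b)   # probability of right branch
    return (1.0 - p) * leaves[0] + p * leaves[1]

def numeric_grad(f, w, eps=1e-6):
    # Central-difference gradient, used here only to illustrate that the
    # gradients with respect to the weights are non-zero.
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (f(w + d) - f(w - d)) / (2 * eps)
    return g
```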
With respect to claims 7 and 14, Rudolph in view of Chang, further in view of Carreira-Perpinan, further in view of Ding teaches all of the limitations of claims 1 and 8 as previously discussed, and Rudolph further teaches wherein the method further comprises:
receiving, by one or more processors, one or more inputs for the model (e.g. paragraph 0046, Fig. 1B, showing how trained model 102 is used to receive a sample); and
generating, by the one or more processors, for each of the one or more inputs, a respective anomaly score (e.g. paragraph 0037, described approach applied to anomaly detection, including with tabular data; paragraph 0046, Fig. 1B, showing how trained model 102 is used to process the received sample and compute an anomaly score; paragraph 0065, tabular anomaly detection methods; paragraph 0077, use of tabular data for anomaly detection).
As discussed above, Chang further teaches that the model is a GAM (e.g. page 2, second column, final paragraph-page 3, first column, first paragraph, describing implementations for tree-based GAMs), and generating respective one or more explanations for the respective anomaly score (e.g. page 2 first column, bulleted list, indicating that GAMs are used to learn explanations and perform data anomaly discovery; page 2, second column, final two paragraphs, indicating that GAMs are interpretable because the impact of each feature on the prediction can be visualized; describing tree-based GAMs which sequentially considers each feature as an explanation of the current residual; page 4, first column, first paragraph, indicating that GAMs are commonly used on interpretability tasks such as discovering anomalies in data; page 5 second column, second paragraph, localized data anomalies detected by tree-based GAM methods; page 9 second column, first paragraph, tree-based GAM methods superior for considering bias and data anomaly discovery; i.e. where the tree-based GAM is utilized to predict/identify anomalies in the data and is further able to provide corresponding explanations, such as in the form of a visualization of the impact of various features on the prediction, etc.).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Rudolph, Carreira-Perpinan, Ding, and Chang in front of him to have modified the teachings of Rudolph (directed to latent outlier exposure for anomaly detection), Carreira-Perpinan (directed to optimization of learning decision trees), and Ding (directed to cascading decision trees for explainable reinforcement learning), to incorporate the teachings of Chang (directed to assessing the interpretability and trustworthiness of GAMs, including tree-based GAMs, in common tasks such as anomaly detection) to implement the model as a tree-based GAM which is trained to generate explanations for the anomaly scores (as taught by Chang). One of ordinary skill would have been motivated to perform such a modification in order to attain the best balance of sparsity, fidelity, and accuracy in the model for anomaly detection as described in Chang (abstract).
With respect to claims 2, 9, and 16, Rudolph in view of Chang, further in view of Carreira-Perpinan, further in view of Ding teaches all of the limitations of claims 1, 8, and 15 as previously discussed. As previously discussed, Rudolph teaches training the model using the unlabeled data and a loss function (e.g. as cited above with respect to the independent claims). In addition, as noted above, Chang teaches that the model is a GAM (as cited above with respect to the independent claims).
Carreira-Perpinan further teaches wherein training the model using the loss function measuring the sparsity of data represented by leaves of the one or more neural decision trees (e.g. paragraphs 0398-0401, controlling sparsity of nodes in tree using regularization term in tree objective function which permits control of overall sparsity budget of the tree and distribution of the sparsity budget over the nodes (i.e. leaves) of the tree, using a sparsity penalty; compare with specification of the instant application at paragraph 0023, which indicates that the loss function can also be the objective function) comprises:
estimating the sparsity of data currently represented by leaves of the one or more neural decision trees (e.g. paragraph 0125, each node contains its own reduced set; paragraphs 0361-0363, describing inexact node optimization, where solving a subproblem approximately rather than exactly can reduce runtime, such as in node optimization to limit the number of passes over the reduced set, such as by using stochastic optimization and/or trying only a small sample of bias values; paragraph 0385, using sparse optimization algorithms to solve optimization problems at node/leaf; paragraph 0397, size of reduced set at given node changes over iterations, such that effect or regularization term is inversely proportional to the number of instances in the reduced set; paragraphs 0398-0401, controlling sparsity of nodes in tree using regularization term in tree objective function which permits control of overall sparsity budget of the tree and distribution of the sparsity budget over the nodes (i.e. leaves) of the tree, using a sparsity penalty; i.e. for any given iteration the reduced set data currently represented by the node has a certain size, and the node (including its sparsity) can be optimized in an inexact manner, resulting in an estimated sparsity of the data currently represented by the node/leaf); and
updating weight parameter values based on the estimated sparsity (e.g. paragraph 0019, processing nodes of initial decision tree by updating the node’s parameters at each iteration so that the objective function decreases monotonically; paragraph 0058, indicating that node parameters in a tree are analogous to weights in a neural net).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Rudolph, Chang, Ding, and Carreira-Perpinan in front of him to have modified the teachings of Rudolph (directed to latent outlier exposure for anomaly detection), Chang (directed to assessing the interpretability and trustworthiness of GAMs, including tree-based GAMs, in common tasks such as anomaly detection), and Ding (directed to cascading decision trees for explainable reinforcement learning), to incorporate the teachings of Carreira-Perpinan (directed to optimization of learning decision trees) to utilize in the loss/objective function, one or more measurements of sparsity of data of the leaves/nodes of the decision tree (as taught by Carreira-Perpinan). One of ordinary skill would have been motivated to perform such a modification in order to learn better decision trees than with traditional algorithms as described in Carreira-Perpinan (abstract).
With respect to claims 5, 12, and 19, Rudolph in view of Chang, further in view of Carreira-Perpinan, further in view of Ding teaches all of the limitations of claims 2, 9, and 16 as previously discussed, and Carreira-Perpinan further teaches wherein the method further comprises normalizing maximum and minimum values of the sparsity for the leaf (e.g. paragraph 0285 indicates that there is a sparsity hyperparameter; paragraphs 0297-0300 teach normalizing the tree, including by applying transformations to parameters of the tree; paragraph 0385, encouraging parameters of node to contain few nonzero values; paragraph 0400, using sparsity penalty such that the sparsity is uniformly distributed across nodes, i.e. where this uniform distribution of sparsity would define maximum and minimum sparsity values for all nodes/leaves (i.e. such as by indicating that the sparsity value for nodes/leaves must be the same, where this same value is both the maximum and minimum value, thereby normalizing the values of the nodes/leaves to be the uniform sparsity)).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Rudolph, Chang, Ding, and Carreira-Perpinan in front of him to have modified the teachings of Rudolph (directed to latent outlier exposure for anomaly detection), Chang (directed to assessing the interpretability and trustworthiness of GAMs, including tree-based GAMs, in common tasks such as anomaly detection), and Ding (directed to cascading decision trees for explainable reinforcement learning), to incorporate the teachings of Carreira-Perpinan (directed to optimization of learning decision trees) to normalize the measurement of sparsity of data of the leaves/nodes of the decision tree such as by applying transformations to tree parameters (such as a sparsity hyperparameter), or by requiring sparsity to be uniformly distributed across the nodes/leaves, thereby defining a sparsity value to be uniformly applied to all nodes/leaves (as taught by Carreira-Perpinan). One of ordinary skill would have been motivated to perform such a modification in order to learn better decision trees than with traditional algorithms as described in Carreira-Perpinan (abstract).
Claims 3, 4, 10, 11, 17, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Rudolph in view of Chang, further in view of Carreira-Perpinan, further in view of Ding, further in view of Parikshit Gopalan, Vatsal Sharan, Udi Wieder. PIDForest: Anomaly Detection via Partial Identification. December 8, 2019. arXiv.org. arXiv:1912.03582v1. (hereinafter Gopalan).
With respect to claims 3, 10, and 17, Rudolph in view of Chang, further in view of Carreira-Perpinan, further in view of Ding teaches all of the limitations of claims 2, 9, and 16 as previously discussed. Rudolph, Chang, Carreira-Perpinan, and Ding do not explicitly disclose wherein estimating the sparsity of data represented by the leaves of the one or more trees comprises:
sampling a plurality of inputs uniformly from an input space of possible inputs;
counting the sampled inputs represented by a leaf; and
adjusting the count according to a predetermined constant.
However, Gopalan teaches wherein estimating the sparsity of data represented by the leaves of the one or more trees comprises:
sampling a plurality of inputs uniformly from an input space of possible inputs (e.g. page 5, section 3, PIDForest algorithm is a heuristic designed to approximate PIDScore; tree built using sample of data; for each tree, pick random sample of m points, and use that subset to build the tree; each node in the tree corresponds to a subcube and set of points partitioned into these subcubes; picking randomly a point and measuring sparsity of its subcube);
counting the sampled inputs represented by a leaf (e.g. page 2, fifth paragraph (following first numbered list), defining sparsity of dataset in a subcube as being calculated based on the number of points that it contains; page 5, section 3, measuring sparsity of subcube/corresponding node; i.e. for a given subcube/node/leaf, the number of points that it contains is determined/counted); and
adjusting the count according to a predetermined constant (e.g. page 2, fifth paragraph (following first numbered list), defining sparsity of dataset in a subcube as being calculated by further dividing the volume of the subcube by the number of points that it contains; i.e. transforming the count into a sparsity measure by utilizing it along with the volume of the subcube, where this volume appears to be a constant/known value; page 4, section 2.2, multiplying coordinate by a constant; page 5 section 3, finding average sparsity of the subcubes by weighting with the number of points).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Rudolph, Chang, Ding, Carreira-Perpinan, and Gopalan in front of him to have modified the teachings of Rudolph (directed to latent outlier exposure for anomaly detection), Chang (directed to assessing the interpretability and trustworthiness of GAMs, including tree-based GAMs, in common tasks such as anomaly detection), Carreira-Perpinan (directed to optimization of learning decision trees), and Ding (directed to cascading decision trees for explainable reinforcement learning), to incorporate the teachings of Gopalan (directed to detecting anomalies in large datasets using tree/forest based algorithms) to estimate the sparsity of data in a given node/leaf/subcube by utilizing a sample of data from the overall dataset, counting/determining the number of points in the dataset that are represented/present within the node/leaf/subcube, and adjusting/utilizing this determined/counted number of points by using a predetermined constant value such as the volume of the node/leaf/subcube, utilizing a weighting based on the number of points, and/or utilizing some other constant (as taught by Gopalan). One of ordinary skill would have been motivated to perform such a modification in order to achieve favorable performance in comparison to other anomaly detection methods and obtain succinct explanations for why points are labelled as anomalous as described in Gopalan (abstract).
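For context only, Gopalan's volume-over-count sparsity measure, combined with the claimed uniform sampling and constant adjustment, can be sketched as follows; the Monte-Carlo estimate and the `smoothing` constant are hypothetical illustrations, not Gopalan's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_leaf_sparsity(space_low, space_high, leaf_low, leaf_high,
                           n_samples=20000, smoothing=1.0):
    """Sample uniformly from the input space, count samples falling in
    the leaf's region (subcube), adjust the count by a predetermined
    constant, and return a volume-over-count sparsity estimate."""
    dims = len(space_low)
    samples = rng.uniform(space_low, space_high, size=(n_samples, dims))
    inside = np.all((samples >= np.asarray(leaf_low)) &
                    (samples <= np.asarray(leaf_high)), axis=1)
    adjusted_count = inside.sum() + smoothing   # avoids division by zero
    leaf_volume = float(np.prod(np.asarray(leaf_high) -
                                np.asarray(leaf_low)))
    return leaf_volume / adjusted_count
```

On this sketch, a leaf covering a quarter of the unit square would catch roughly a quarter of the uniform samples, so sparser (emptier) leaves yield larger values, consistent with Gopalan's definition of sparsity as volume divided by point count.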
With respect to claims 4, 11, and 18, Rudolph in view of Chang, further in view of Carreira-Perpinan, further in view of Ding teaches all of the limitations of claims 2, 9, and 16 as previously discussed. Rudolph, Chang, Carreira-Perpinan, and Ding do not explicitly disclose wherein the sparsity of data at a leaf is based at least partially on the ratio between the volume of the leaf and the percentage of data represented by the leaf.
However, Gopalan teaches wherein the sparsity of data at a leaf is based at least partially on the ratio between the volume of the leaf and the percentage of data represented by the leaf (e.g. page 2, fifth paragraph (following first numbered list), defining sparsity of dataset in a subcube as being calculated by dividing the volume of the subcube by the number of points that it contains; page 5, section 3, measuring sparsity of subcube/corresponding node; each node in the tree corresponds to a subcube).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Rudolph, Chang, Ding, Carreira-Perpinan, and Gopalan in front of him to have modified the teachings of Rudolph (directed to latent outlier exposure for anomaly detection), Chang (directed to assessing the interpretability and trustworthiness of GAMs, including tree-based GAMs, in common tasks such as anomaly detection), Carreira-Perpinan (directed to optimization of learning decision trees), and Ding (directed to cascading decision trees for explainable reinforcement learning), to incorporate the teachings of Gopalan (directed to detecting anomalies in large datasets using tree/forest based algorithms) to calculate the sparsity of the data at the leaf/node/subcube using a ratio between its volume and the percentage/amount of data represented by it (as taught by Gopalan). One of ordinary skill would have been motivated to perform such a modification in order to achieve favorable performance in comparison to other anomaly detection methods and obtain succinct explanations for why points are labelled as anomalous as described in Gopalan (abstract).
Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Rudolph in view of Chang, further in view of Carreira-Perpinan, further in view of Ding, further in view of Hehn, T.M., Kooij, J.F.P. & Hamprecht, F.A. End-to-End Learning of Decision Trees and Forests. Int J Comput Vis 128, 997–1011 (2020). https://doi.org/10.1007/s11263-019-012. (hereinafter Hehn).
With respect to claims 6, 13, and 20, Rudolph in view of Chang, further in view of Carreira-Perpinan, further in view of Ding teaches all of the limitations of claims 1, 8, and 15 as previously discussed.
Carreira-Perpinan further teaches wherein a neural decision tree of the one or more neural decision trees comprises a function for splitting the neural decision tree having a range between zero and one (e.g. paragraph 0341, splits at nodes may be binary; paragraph 0455, achieving multiway split by series of binary splits).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Rudolph, Chang, Ding, and Carreira-Perpinan in front of him to have modified the teachings of Rudolph (directed to latent outlier exposure for anomaly detection), Chang (directed to assessing the interpretability and trustworthiness of GAMs, including tree-based GAMs, in common tasks such as anomaly detection), and Ding (directed to cascading decision trees for explainable reinforcement learning), to incorporate the teachings of Carreira-Perpinan (directed to optimization of learning decision trees) to utilize a binary splitting function for the tree (as taught by Carreira-Perpinan). One of ordinary skill would have been motivated to perform such a modification in order to learn better decision trees than with traditional algorithms as described in Carreira-Perpinan (abstract).
Rudolph, Chang, Carreira-Perpinan, and Ding do not explicitly disclose wherein training the GAM using the unlabeled data, comprises performing temperature annealing on the function.
However, Hehn teaches wherein a neural decision tree of the one or more neural decision trees comprises a function for splitting the neural decision tree having a range between zero and one, and wherein training the GAM using the unlabeled data, comprises performing temperature annealing on the function (e.g. page 999, second column, section 2.1, in binary decision trees, split functions s : R -> [0, 1] determine the routing of samples through the decision tree, and control whether the splits are probabilistic or deterministic; page 1000 second column, final paragraph, introducing hyperparameter to steer steepness of split function by scaling split feature; page 1001 first column, first paragraph, during training iteratively increasing the hyperparameter, akin to temperature cooling schedule in deterministic annealing).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention having the teachings of Rudolph, Chang, Ding, Carreira-Perpinan, and Hehn in front of him to have modified the teachings of Rudolph (directed to latent outlier exposure for anomaly detection), Chang (directed to assessing the interpretability and trustworthiness of GAMs, including tree-based GAMs, in common tasks such as anomaly detection), Carreira-Perpinan (directed to optimization of learning decision trees), and Ding (directed to cascading decision trees for explainable reinforcement learning), to incorporate the teachings of Hehn (directed to end-to-end learning of decision trees and forests) to utilize a binary splitting function in the decision tree, and to perform the training using unlabeled data (as taught by Rudolph) of the GAM (as taught by Chang, including trees within the GAM), by performing temperature annealing on the splitting function (as taught by Hehn, such as by iteratively increasing a hyperparameter of the split function). One of ordinary skill would have been motivated to perform such a modification in order to achieve on par or superior results for oblique decision trees and forests, with the ability to learn more complex split functions and facilitate interpretability as described in Hehn (abstract).
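For context only, Hehn's scheme of iteratively increasing the steepness of a sigmoid split function s: R -> (0, 1), akin to a temperature cooling schedule, can be sketched as follows; the geometric growth schedule is a hypothetical choice, not Hehn's actual hyperparameter schedule:

```python
import math

def split_prob(x, threshold, steepness):
    """Split function s: R -> (0, 1) routing a sample to the right child;
    higher steepness yields a harder (more deterministic) split."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - threshold)))

def annealing_schedule(epoch, base=1.0, growth=1.5):
    # Iteratively increase the steepness hyperparameter over training,
    # akin to the temperature cooling schedule described in Hehn.
    return base * (growth ** epoch)
```

Early in training the split is soft (probabilities near 0.5, giving useful gradients); as the schedule advances, the same sample is routed nearly deterministically, recovering a hard binary split.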
It is noted that any citation to specific pages, columns, lines, or figures in the prior art references and any interpretation of the references should not be considered to be limiting in any way. “The use of patents as references is not limited to what the patentees describe as their own inventions or to the problems with which they are concerned. They are part of the literature of the art, relevant for all they contain.” In re Heck, 699 F.2d 1331, 1332-33, 216 USPQ 1038, 1039 (Fed. Cir. 1983) (quoting In re Lemelson, 397 F.2d 1006, 1009, 158 USPQ 275, 277 (CCPA 1968)). Further, a reference may be relied upon for all that it would have reasonably suggested to one having ordinary skill in the art, including nonpreferred embodiments. Merck & Co. v. Biocraft Laboratories, 874 F.2d 804, 10 USPQ2d 1843 (Fed. Cir.), cert. denied, 493 U.S. 975 (1989). See also Upsher-Smith Labs. v. Pamlab, LLC, 412 F.3d 1319, 1323, 75 USPQ2d 1213, 1215 (Fed. Cir. 2005); Celeritas Technologies Ltd. v. Rockwell International Corp., 150 F.3d 1354, 1361, 47 USPQ2d 1516, 1522-23 (Fed. Cir. 1998).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEREMY L STANLEY whose telephone number is (469)295-9105. The examiner can normally be reached on Monday-Friday from 9:00 AM to 5:00 PM CST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached at telephone number (571) 270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from Patent Center and the Private Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from Patent Center or Private PAIR. Status information for unpublished applications is available through Patent Center and Private PAIR for authorized users only. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) Form at https://www.uspto.gov/patents/uspto-automated-interview-request-air-form.
/JEREMY L STANLEY/
Primary Examiner, Art Unit 2127