Last updated: May 29, 2026

Application No. 18/003,948

METHOD AND SYSTEM FOR GENERATING AN AI MODEL USING CONSTRAINED DECISION TREE ENSEMBLES

Non-Final OA §103

Filed

Dec 30, 2022

Priority

Jun 30, 2020 — AU 2020902198 +1 more

Examiner

MULLINAX, CLINT LEE

Art Unit

2123

Tech Center

2100 — Computer Architecture & Software

Assignee

Australia and New Zealand Banking Group Limited

OA Round

1 (Non-Final)

This examiner grants 48% of cases after interview

— +38.7% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 126 resolved cases, 2023–2026

Examiner Intelligence

MULLINAX, CLINT LEE View full profile →

Grants 48% of resolved cases

Career Allowance Rate

60 granted / 126 resolved

-7.4% vs TC avg

Strong +39% interview lift

Without

With

+38.7%

Interview Lift

resolved cases with interview

Typical timeline

4y 7m

Avg Prosecution

12 currently pending

Career history

151

Total Applications

across all art units

Statute-Specific Performance

§101

6.3%

-33.7% vs TC avg

§103

85.8%

+45.8% vs TC avg

§102

4.8%

-35.2% vs TC avg

§112

1.9%

-38.1% vs TC avg

Black line = Tech Center average estimate • Based on career data from 126 resolved cases

Office Action

§103

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

DETAILED ACTION
This action is a responsive to the claims filed on 12/30/2022.
Claims 3-9, 11-13, 15-16, 18-23, and 28-29 are pending.
Claims 3-9, 11-13, 15-16, 18-23, and 28-29 are rejected.
Claims 1-2, 10, 14, 17, 24-27.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 3-9, 11-13, 15-16, 18-23, and 28-29 are rejected under 35 U.S.C. 103 as being unpatentable over Steele et al (US Pub 20150379426) hereinafter Steele, in view of Hetherington et al (US Pub 20200302318) hereinafter Hetherington.
Regarding claims 3 and 28-29, Steele teaches a method for generating an artificial intelligence model by applying a decision tree ensemble learning process on a dataset; system for constraining a decision tree ensemble machine learning process to generate an artificial intelligence model for a dataset, the system comprising: a processor; memory storing program code that is accessible and executable by the processor; and wherein, when the processor executed the program code (paragraphs 0149, 0151-0159, Figs. 26 and 33 teach decision trees construction on a dataset; and paragraphs 0197-0199 teach executing the embodiments of the disclosure on a computer system including one or more processors communicatively coupled to one or more memories), the processor is caused to perform the operations comprising:
receiving a dataset comprising at least two variables (paragraphs 0149, Figs. 26 and 33 teach “input data set comprising labeled observation records (i.e., observation records for which the values or “labels” of dependent variables are known)”); 
determining at least one split criteria for each variable within the dataset (paragraphs 0149, 0151-0159, Figs. 26 and 33 teach “An in-memory chunk-level split operation 2604 may be performed to obtain a training set 2610 and a test set 2615” for splitting the data record chunks to be processed by the tree nodes); 
partitioning the dataset based on each determined split criteria (paragraphs 0149, 0151-0159, Figs. 26 and 33 teach “An in-memory chunk-level split operation 2604 may be performed to obtain a training set 2610 and a test set 2615” for splitting the data record chunks and be processed by the tree nodes); 
calculating a measure of directionality for each partition of data (Examiner note: spec page 5 states “Directionality in the context of decision trees may be defined based on a comparison between different split branches at a node, whereby the comparison is between each of the respective branches' ratio of positive events to total events, and a ranking based on the magnitude of each respective branches ratio. A subsequent directionality label is based upon the ranking of each branch and each of the branches position in relation to each other with the split value criteria.” Thus, this is an optional, but not limiting, definition of “directionality”.
Steele, paragraphs 0149, 0151-0159, 0174, Figs. 26 and 33 teach predicates (measure of directionality) within decision tree nodes for processing each training/testing “observation record” (each partition of data) to “determine the path to be taken next towards a leaf node of the tree” (alternate measure of directionality). Paragraph 0179 further teaches calculating “PUM” measurement (alternate measure of directionality) of the node performance/contribution when processing the records (each partition of data) that is used for a subsequent node pruning process.); 
performing a constrained node selection process by selecting a candidate variable and split criteria, wherein the selection is made to keep a consistent directionality for the selected variable based on existing nodes (paragraphs 0179-0183 and Fig. 36 teach “create a partial (or total) order of the nodes of a decision tree based on the PUMs of the nodes, and such an ordering may be used in a tree pruning pass (constrained node selection process) of the training phase” for meeting determined “goals” of the overall decision tree performance, including “accuracy or quality of the prediction” (candidate variable). Further, a “bottom-up approach may be used…in which leaf nodes are analyzed first (split criteria…to keep), and nodes are removed (selection is made to keep a consistent directionality for the selected variable based on existing nodes) if their contribution to the quality/accuracy of the model is below a threshold until the max-nodes constraint 3610 is met” (alternate constrained node selection process by selecting a candidate variable)); 
updating a directionality table at the end of a constrained node selection (paragraphs 0179-0183 and Figs. 36-37 teach calculating a “histogram” of selected node PUM values); and 
reiterating the constrained node selection process for every node selection throughout the decision tree ensemble learning process until an ensemble model is generated (paragraphs 0150 and 0192 teach tree training and pruning process being performed “iteratively” for creating a random forest (ensemble)).

Steele at least implies performing a constrained node selection process by selecting a candidate variable and split criteria, wherein the selection is made to keep a consistent directionality for the selected variable based on existing nodes (see mappings above); however, Hetherington teaches performing a constrained node selection process by selecting a candidate variable and split criteria, wherein the selection is made to keep a consistent directionality for the selected variable based on existing nodes (paragraphs 0042 and 0080-0082 teach “Feature importance is based on quantified information gain of the condition, which depends on the feature, operator, and split value of the condition. The more effective a condition is at separating a group of examples into two subsets of greater homogeneity of classification, the greater is the information gain”; and “Step 206 generates a rule by more or less copying the condition and predominate label of a node into the rule. When multiple labels predominate, a more or less similar rule may be generated for each of the predominate labels” when traversing a specific path through the decision tree (node selection)).
Thus, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Hetherington’s teachings of decision tree rule generation from conditions including split values and information gain into Steele‘s teaching of decision tree training and testing data set partitioning and node PUM histogram determinations in order to reduce the amount of rules and improve performance and interpretability for decision tree ruleset construction (Hetherington, paragraphs 0020, 0023, 0042, and 0080-0082).

Regarding claim 4, the combination of Steele and Hetherington teach all the claim limitations of claim 3 above; and further teach wherein the constrained node selection process comprises: generating groups of split criterions for each of one or more variables of the dataset, creating one or more variable and split criteria combinations (Steele, paragraphs 0179-0183 and Fig. 36 teach “create a partial (or total) order of the nodes of a decision tree based on the PUMs of the nodes, and such an ordering may be used in a tree pruning pass (constrained node selection process) of the training phase” for meeting determined “goals” of the overall decision tree performance, including “accuracy or quality of the prediction” (candidate variable). Further, a “bottom-up approach may be used…in which leaf nodes are analyzed first (generating groups of split criterions…combinations), and nodes are removed if their contribution to the quality/accuracy of the model is below a threshold until the max-nodes constraint 3610 is met” (creating one or more variable and split criteria combinations) when processing the training set samples (each of one or more variables of the dataset)); 
copying the dataset for every variable and split criteria combination (Hetherington, paragraphs 0042 and 0082 teach “Feature importance is based on quantified information gain of the condition, which depends on the feature, operator, and split value of the condition. The more effective a condition is at separating a group of examples into two subsets of greater homogeneity of classification, the greater is the information gain”; and “Step 206 generates a rule by more or less copying the condition and predominate label of a node into the rule. When multiple labels predominate, a more or less similar rule may be generated for each of the predominate labels”); 
partitioning each copied dataset by its associated split criteria for a variable and store resulting partitioned datasets each in a candidate table for each variable and split criteria combination (Steele, paragraphs 0147-0149, 0151-0159, Figs. 26 and 33 teach “An in-memory chunk-level split operation 2604 may be performed to obtain a training set 2610 and a test set 2615” for splitting the data record chunks and be processed by the nodes for forming histograms of the data, and transferring the datasets to memory); 
calculating a measure of homogeneity and directionality for each candidate table (Steele, paragraphs 0179-0183 teach creating histograms of node PUM values when utilizing predicates for processing data (directionality), wherein “a Gini impurity value (homogeneity) may be used as the PUM or as part of the PUM, or an entropy-based measure (homogeneity) of information gain, or some other measure of information gain may be used”); 
storing all candidate tables which pass directionality criterion in a table set (Steele, paragraph 0181 teaches “identifiers of at least some of the nodes belonging to one or more of the buckets of the histogram 3510 may be stored in persistent storage to assist in the pruning phase” having high-value nodes); 
selecting one of the candidate tables of the table set which has the optimal measure of homogeneity (Steele, paragraphs 0181 and Fig. 35 teach the PUM histogram having “High-value” node bucket based on the node’s PUM value); 
storing the associated variable and split criteria combination of the selected candidate table as a chosen candidate for the node (Steele, paragraphs 0179-0183 and Fig. 36 teach a “bottom-up approach may be used…in which leaf nodes are analyzed first (generating groups of split criterions…combinations), and nodes are removed if their contribution to the quality/accuracy of the model is below a threshold until the max-nodes constraint 3610 is met” (creating one or more variable and split criteria combinations) or maintained if above the threshold and place in memory, when processing the training set samples); and 
storing the partitioned data from selected table to use as new datasets for selection of decision nodes or leaf nodes, which branch from the selected node (Steele, paragraphs 0181-0183 teach “identifiers of at least some of the nodes belonging to one or more of the buckets of the histogram 3510 may be stored in persistent storage to assist in the pruning phase” utilized in a greedy pruning technique including “selecting the path that leads to the node with the highest PUM value at each split in the tree”).
Steele and Hetherington are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 5, the combination of Steele and Hetherington teach all the claim limitations of claim 3 above; and further teach wherein updating a directionality table comprises entering directionality information of the selected candidate variable and split value into the directionality table (Steele, paragraphs 0179-0183 and Figs. 36-37 teach calculating a “histogram” of selected node PUM values, including “identifiers of nodes within two levels from a leaf node may be stored for one or more low-value buckets in one implementation, and such a list may be used to identify pruning candidate nodes” within the determined path).

Regarding claim 6, the combination of Steele and Hetherington teach all the claim limitations of claim 3 above; and further teach wherein the directionality table is also updated with cumulative weighted information gain calculation for the associated variable (Steele, paragraphs 0179-0183 and Figs. 36-37 teach calculating a “histogram” of values, wherein “a Gini impurity value may be used as the PUM or as part of the PUM, or an entropy-based measure of information gain, or some other measure of information gain may be used”).

Regarding claim 7, the combination of Steele and Hetherington teach all the claim limitations of claim 3 above; and further teach wherein cumulative weighted information gain for the associated variable is calculated at the end of the learning process (Steele, paragraphs 0179-0183 and Figs. 36-37 teach calculating “a Gini impurity value may be used as the PUM or as part of the PUM, or an entropy-based measure of information gain, or some other measure of information gain may be used”, for pruning after “a training phase”).

Regarding claim 8, the combination of Steele and Hetherington teach all the claim limitations of claim 3 above; and further teach wherein the directionality table is not updated with directionality information for the selected candidate variable when the directionality table already contains directionality information for the selected candidate variable (Hetherington, paragraphs 0042 and 0080-0082 teach the stored decision tree node ruleset may be consolidated when “a more or less similar rule may be generated for each of the predominate labels. Building a rule from a subtree is more or less akin to flattening some of the tree, such that the subtree may be treated as a single consolidated tree node, which may entail accumulating predominate labels and/or concatenating conditions of nodes of the subtree.” Here, the rules are flattened when repeated, thus detected as already existing.).

Regarding claim 9, the combination of Steele and Hetherington teach all the claim limitations of claim 4 above; and further teach wherein candidate tables pass the directionality criterion if they match directionality with entries in the directionality table or if they have no entries in the directionality table (Steele, paragraph 0181 teaches “identifiers of at least some of the nodes belonging to one or more of the buckets of the histogram 3510 may be stored in persistent storage to assist in the pruning phase” having high-value nodes).

Regarding claim 11, the combination of Steele and Hetherington teach all the claim limitations of claim 3 above; and further teach wherein the method is applied to random forest or a gradient boosted trees learning methods (Steele, paragraphs 0175, 0188, 0446 teach the trees in a Random Forest).

Regarding claim 12, the combination of Steele and Hetherington teach all the claim limitations of claim 3 above; and further teach wherein the dataset comprises at least one of a continuous variable and a categorical variable (Steele, paragraphs 0105 and 0173 teach “input data may comprise data records that include variables of any of a variety of data types, such as, for example text, a numeric data type (e.g., real or integer), Boolean, a binary data type, a categorical data type”).

Regarding claim 13, the combination of Steele and Hetherington teach all the claim limitations of claim 4 above; and further teach wherein one or more split values are assigned to a candidate table for a continuous variable (Steele, paragraphs 0179-0183 and Figs. 36-37 teach calculating a “histogram” of selected node PUM values, including “identifiers of nodes within two levels from a leaf node may be stored for one or more low-value buckets in one implementation, and such a list may be used to identify pruning candidate nodes” within the determined path).

Regarding claim 15, the combination of Steele and Hetherington teach all the claim limitations of claim 4 above; and further teach wherein two or more categories are assigned to a candidate table for a categorical variable instead of a one or more split values (Steele, paragraph 0088 teach “Numeric variables may also be binned (categorized into a set of ranges such as quartiles or quintiles); such bins 767 may be used for the construction of histograms that may be displayed to the client”).

Regarding claim 16, the combination of Steele and Hetherington teach all the claim limitations of claim 4 above; and further teach wherein the measure of homogeneity is at least one of entropy and Gini (Steele, paragraph 0179 teaches “a Gini impurity value may be used as the PUM or as part of the PUM, or an entropy-based measure of information gain, or some other measure of information gain may be used”).

Regarding claim 18, the combination of Steele and Hetherington teach all the claim limitations of claim 4 above; and further teach further comprising presenting the user with weighted information gain and directionality information for each variable used in the ensemble at the end of the learning process (Steele, paragraphs 0088 and 0179 teach “a Gini impurity value may be used as the PUM or as part of the PUM, or an entropy-based measure of information gain, or some other measure of information gain may be used” for creating “histograms that may be displayed to the client”).

Regarding claim 19, the combination of Steele and Hetherington teach all the claim limitations of claim 18 above; and further teach wherein the weighted information gain and directionality information for each variable is sorted based on weighted information gain (paragraphs 0179-0181 and Fig. 35 teach the PUM histogram having node buckets based on the node’s PUM value, wherein “a Gini impurity value may be used as the PUM or as part of the PUM, or an entropy-based measure of information gain, or some other measure of information gain may be used”).

Regarding claim 20, the combination of Steele and Hetherington teach all the claim limitations of claim 3 above; and further teach wherein the weighted information gain is calculated per leaf node, whereby each decision node in which the leaf node is dependent upon is factored into the weighted information gain calculation (paragraphs 0179-0181 and Fig. 35 teach the PUM values calculated for each tree node, wherein “a Gini impurity value may be used as the PUM or as part of the PUM, or an entropy-based measure of information gain, or some other measure of information gain may be used”).

Regarding claim 21, the combination of Steele and Hetherington teach all the claim limitations of claim 20 above; and further teach wherein the weighted information gain and directionality information per variable per leaf node is available to be presented or is presented to the user (Steele, paragraphs 0088 and 0179 teach PUM values calculated for each tree node, wherein “a Gini impurity value may be used as the PUM or as part of the PUM, or an entropy-based measure of information gain, or some other measure of information gain may be used” for creating “histograms that may be displayed to the client”).

Regarding claim 22, the combination of Steele and Hetherington teach all the claim limitations of claim 4 above; and further teach wherein if two or more candidate decision nodes selected at a processing stage, whereby each use the same variable and have conflicting directionality, and no directionality is yet determined (Steele, paragraphs 0179-0183 and Fig. 35 teach the PUM histogram having “High-value” node bucket based on the node’s calculated PUM value and determining further node PUM values (no directionality is yet determined); wherein “a Gini impurity value may be used as the PUM or as part of the PUM, or an entropy-based measure of information gain, or some other measure of information gain may be used”), the selected node or nodes of a directionality which best meet a conflict criteria are kept, and the other selected node or nodes of another directionality are rejected (Steele, paragraphs 0179-0183 and Fig. 35 teach the PUM histogram having “High-value” node (best meet a conflict criteria are kept) bucket based on the node’s calculated PUM value; wherein “low value nodes may be deemed better candidates for removal (rejected) from the tree during pruning than the high value nodes” (kept); and further “a Gini impurity value may be used as the PUM or as part of the PUM, or an entropy-based measure of information gain, or some other measure of information gain may be used”).

Regarding claim 23, the combination of Steele and Hetherington teach all the claim limitations of claim 22 above; and further teach wherein the conflict criteria is at least one of: the highest information gain or weighted information gain of a node (Steele, paragraphs 0181 and Fig. 35 teach the PUM histogram having “High-value” node bucket based on the node’s PUM value; wherein “a Gini impurity value may be used as the PUM or as part of the PUM, or an entropy-based measure of information gain, or some other measure of information gain may be used”); 
the highest total information gain or total weighted information gain of nodes grouped by directionality; 
the largest number of observations of a node; 
the largest number of observations grouped by their respective node's directionality the earliest selection time of a node; or 
the largest number of candidate decision nodes grouped by directionality.

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Campbell et al (US Pub 20220189638) teach decision tree generation with node splitting decision rule determinations.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/C.M./Examiner, Art Unit 2123                                                                                                                                                                                                        

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123

Read full office action

Prosecution Timeline

Dec 30, 2022

Application Filed

Jan 14, 2026

Non-Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

16/005,750

Patent 12619424

ROBOTIC SCRIPT GENERATION BASED ON PROCESS VARIATION DETECTION

7y 10m to grant Granted May 05, 2026

18/380,620

Patent 12613706

HARDWARE ACCELERATED MACHINE LEARNING

2y 6m to grant Granted Apr 28, 2026

17/089,974

Patent 12608639

SYSTEM AND METHOD FOR PREDICTIVE VOLUMETRIC AND STRUCTURAL EVALUATION OF STORAGE TANKS

5y 5m to grant Granted Apr 21, 2026

18/375,973

Patent 12561620

Machine Learning-Based URL Categorization System With Noise Elimination

2y 4m to grant Granted Feb 24, 2026

16/726,709

Patent 12554962

CONFIGURABLE PROCESSOR ELEMENT ARRAYS FOR IMPLEMENTING CONVOLUTIONAL NEURAL NETWORKS

6y 1m to grant Granted Feb 17, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.

Typically takes 5-10 seconds — AI-generated, attorney review required before filing

Prosecution Projections

1-2

Expected OA Rounds

48%

Grant Probability

86%

With Interview (+38.7%)

4y 7m (~1y 2m remaining)

Median Time to Grant

Low

PTA Risk

Based on 126 resolved cases by this examiner. Grant probability derived from career allowance rate.