Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Note
The Examiner encourages Applicant to schedule an interview to discuss the issues raised in, for example, the rejections below under 35 U.S.C. § 112, § 101, and § 103, in order to move the application forward toward allowance.
Applicant is strongly requested to identify, in the Remarks, the supporting paragraph(s) for each limitation of any amended or new claim, so that the Examiner may arrive at a clear and definite claim interpretation.
Priority
Acknowledgment is made of Applicant's claim of priority for the present application, filed on 07/19/2022.
Claim Objections
Claims 2-11 and 13-20 are objected to because of the following informalities.
Claim 2 is objected to because of the following informality: it appears that “A method as claimed in Claim 1” (line 1) needs to read “The method as claimed in Claim 1” or something else. Appropriate correction is required. In addition, claims 3-11 are objected to for the same reason.
Claim 3 is objected to because of the following informalities:
it appears that “that is influenced” (line 2) needs to read “that are influenced” or something else. Appropriate correction is required. In addition, claim 14 is objected to for the same reason.
it appears that “the number of decision layers is influenced” (line 5) needs to read “the number of decision layers are influenced” or something else. Appropriate correction is required. In addition, claim 14 is objected to for the same reason.
Claim 6 is objected to because of the following informality: it appears that “the decision nodes” (line 4) needs to read “the plurality of decision nodes” or something else. Appropriate correction is required. In addition, claim 17 is objected to for the same reason.
Claim 13 is objected to because of the following informality: it appears that “A system as claimed in Claim 12” (line 1) needs to read “The system as claimed in Claim 12” or something else. Appropriate correction is required. In addition, claims 14-20 are objected to for the same reason.
Claims 2-11 and 13-20 each recite limitations that raise the informalities set forth above, and their dependent claims are objected to at least based on their direct and/or indirect dependency from the claims listed above. Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 2-11 and 13-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 2 recites the limitation “the proposed base ensemble of binary decision trees” (in step iv). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a proposed base ensemble of binary decision trees”, or something else. For the purposes of examination, “a proposed base ensemble of binary decision trees” is used. In addition, claim 13 is rejected for the same reason.
Claim 2 recites the limitation “the proposed changed ensemble of binary decision trees” (in steps v and vi). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to, since it is not clear whether it indicates “propose a changed ensemble of randomized binary decision trees” or something else. It appears it may need to read “a proposed changed ensemble of binary decision trees”, or something else. For the purposes of examination, “a proposed changed ensemble of binary decision trees” is used. In addition, claim 13 is rejected for the same reason.
Claim 2 recites the limitation “the new base ensemble of binary decision trees” (in step vi). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a new base ensemble of binary decision trees”, or something else. For the purposes of examination, “a new base ensemble of binary decision trees” is used. In addition, claim 13 is rejected for the same reason.
Claim 3 recites the limitation “the randomized binary decision trees” (line 3). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to, since it may indicate “randomized binary decision trees” (claim 2, step ii) or “randomized binary decision trees” (claim 3, line 1), or something else. It appears it may need to read “randomized binary decision trees”, or something else. For the purposes of examination, “randomized binary decision trees” is used. In addition, claim 14 is rejected for the same reason.
Claim 3 recites the limitation “the method” (line 4). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to, since it may indicate “method” (claim 1, line 1), “method” (claim 2, line 1), or “method” (claim 3, line 1), or something else. It appears “A method” (in claims 2 and 3) may need to read “The method”, or something else. For the purposes of examination, “The method” is used for claims 2 and 3.
Claim 3 recites the limitation “the binary decision trees” (line 6). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to, since it may indicate “binary decision trees” (claim 2, step ii) or “binary decision trees” (claim 3, line 1), or something else. It appears it may need to read “binary decision trees”, or something else. For the purposes of examination, “binary decision trees” is used. In addition, claim 14 is rejected for the same reason.
Claim 4 recites the limitation “the computed new generalization error” (line 2). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a computed new generalization error”, or something else. For the purposes of examination, “a computed new generalization error” is used. In addition, claim 15 is rejected for the same reason.
Claim 4 recites the limitation “the designated base generalization error” (line 3). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a designated base generalization error”, or something else. For the purposes of examination, “a designated base generalization error” is used. In addition, claim 15 is rejected for the same reason.
Claim 5 recites the limitation “the proposed changed ensemble of binary decision trees” (line 2). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to, since it is not clear whether it indicates “propose a changed ensemble of randomized binary decision trees” (claim 2, step ii) or something else. It appears it may need to read “a proposed changed ensemble of binary decision trees”, or something else. For the purposes of examination, “a proposed changed ensemble of binary decision trees” is used. In addition, claim 16 is rejected for the same reason.
Claim 5 recites the limitation “the elements of a selected entry” (third-to-last line). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “elements of a selected entry”, or something else. For the purposes of examination, “elements of a selected entry” is used. In addition, claim 16 is rejected for the same reason.
Claim 5 recites the limitation “the probability model” (second-to-last line). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a probability model”, or something else. For the purposes of examination, “a probability model” is used. In addition, claim 16 is rejected for the same reason.
Claim 6 recites the limitation “the given pair of branches” (last line of step (a)). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “given pair of branches”, or something else. For the purposes of examination, “given pair of branches” is used. In addition, claim 17 is rejected for the same reason.
Claim 7 recites the limitation “the plurality of leaf nodes of a given randomized binary decision tree” (line 2). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a plurality of leaf nodes of a given randomized binary decision tree”, or something else. For the purposes of examination, “a plurality of leaf nodes of a given randomized binary decision tree” is used. In addition, claim 18 is rejected for the same reason.
Claim 7 recites the limitation “the samples from the training set” (line 3). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “samples from the training set”, or something else. For the purposes of examination, “samples from the training set” is used. In addition, claim 18 is rejected for the same reason.
Claim 7 recites the limitation “the plurality of leaf nodes” (line 6). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to, since it may indicate “a plurality of leaf nodes” (claim 6, line 5) or “the plurality of leaf nodes” (claim 7, line 2), or something else. It appears it may need to read “a plurality of leaf nodes”, or something else. For the purposes of examination, “a plurality of leaf nodes” is used. In addition, claim 18 is rejected for the same reason.
Claim 8 recites the limitation “the new training set” (line 5). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a new training set”, or something else. For the purposes of examination, “a new training set” is used. In addition, claim 19 is rejected for the same reason.
Claim 9 recites the limitation “the plurality of leaf nodes of a given randomized binary decision tree” (line 2). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a plurality of leaf nodes of a given randomized binary decision tree”, or something else. For the purposes of examination, “a plurality of leaf nodes of a given randomized binary decision tree” is used. In addition, claim 20 is rejected for the same reason.
Claim 9 recites the limitation “the samples from the new training set” (line 3). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “samples from the new training set”, or something else. For the purposes of examination, “samples from the new training set” is used. In addition, claim 20 is rejected for the same reason.
Claim 9 recites the limitation “the holdout set” (in step (b)). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “a holdout set”, or something else. For the purposes of examination, “a holdout set” is used. In addition, claim 20 is rejected for the same reason.
Claim 9 recites the limitation “the plurality of leaf nodes” (line 6). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to, since it may indicate “a plurality of leaf nodes” (claim 6, line 5) or “the plurality of leaf nodes” (claim 9, line 2), or something else. It appears it may need to read “a plurality of leaf nodes”, or something else. For the purposes of examination, “a plurality of leaf nodes” is used. In addition, claim 20 is rejected for the same reason.
Claim 9 recites the limitation “the leaf nodes of each randomized binary decision tree in the proposed changed ensemble thereof” (step (e), line 1). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “leaf nodes of each randomized binary decision tree in the proposed changed ensemble thereof”, or something else. For the purposes of examination, “leaf nodes of each randomized binary decision tree in the proposed changed ensemble thereof” is used. In addition, claim 20 is rejected for the same reason.
Claim 10 recites the limitation “the samples assigned to the leaf nodes of the base ensemble of binary decision trees” (line 2). There is insufficient antecedent basis for this limitation in the claim. It is not clear what it is referring to. It appears it may need to read “samples assigned to leaf nodes of the base ensemble of binary decision trees”, or something else. For the purposes of examination, “samples assigned to leaf nodes of the base ensemble of binary decision trees” is used.
Claims 2-10 and 13-20 each recite limitations that raise the issues of indefiniteness set forth above, and their dependent claims are rejected at least based on their direct and/or indirect dependency from the claims listed above. Appropriate explanation and/or amendment is required.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“… method of …, the method comprising:
…, …, …;
from empirical data of interest for the given machine learning model, estimating a probability distribution;
automatically improving the estimated probability distribution using a generalization error, said improving being …:
modeling the probability distribution …, and
optimizing choice of tree in the decision tree ensemble by minimizing the generalization error,
…”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, the limitations in the context of this claim encompass a user mentally performing the steps with a physical aid (e.g., pencil and paper).
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites the additional elements “A computer-implemented”, “by a processor”, and “using a decision tree ensemble” – using a device and/or a model to process data. The device and the model in each step are recited at a high level of generality (i.e., as a generic computer performing the generic computer function of processing data) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites the additional elements “training a machine learning model”, “the given machine learning model being formed based on a data mapping”, and “the improved estimated probability distribution determining weights or parameters for the given machine learning model resulting in a trained model”. These additional elements are recited at such a high level, without any detail as to how the model is trained, that they amount to only the idea of a solution or outcome; the claim fails to recite details of how a solution to a problem is accomplished and therefore represents no more than mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites the additional element “accessing a given machine learning model” – the act of accessing data. The claim is adding an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of accessing data is recited at a high level of generality (i.e., as a generic act of accessing data) such that it amounts to no more than a mere instruction to apply the exception using a generic act of accessing. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites the additional element “the data mapping associating an input data point to a respective output data point”. This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible. See MPEP 2106.05(f).
The additional elements regarding training are recited at such a high level, without any detail as to how the model is trained, that they amount to only the idea of a solution or outcome; the claim fails to recite details of how a solution to a problem is accomplished and therefore represents no more than mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)). Accordingly, these additional elements do not amount to significantly more than the abstract idea. Thus, the claim is not patent eligible.
As discussed above, the claim recites the additional element of accessing data at a high level of generality, which adds an insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Receiving or transmitting data over a network” and “Storing and retrieving information in memory”. Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea. Thus, the claim is not patent eligible.
As discussed above, the claim also recites a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).
Regarding claim 2
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“… and …; and
…:
(i) determine randomly a base ensemble of binary decision trees; and
wherein automatically improving the estimated probability distribution includes …:
(ii) determine and thereby propose a changed ensemble of randomized binary decision trees;
(iii) randomly sort samples of the plurality of samples … to define a training set and a holdout set complementary to the training set; and
(iv) evaluate using the training and holdout sets from step (iii) a base generalization error of the proposed base ensemble of binary decision trees;
(v) evaluate using the training and holdout sets from step (iii) a new generalization error of the proposed changed ensemble of binary decision trees;
(vi) in response to the new generalization error being less than the base generalization error designate, …, the proposed changed ensemble of binary decision trees as the new base ensemble of binary decision trees; and
…, thereby optimizing the base ensemble of binary decision trees based on generalization error thereof, …”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, the limitations in the context of this claim encompass a user mentally performing the steps with a physical aid (e.g., pencil and paper).
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element “from the empirical data, obtaining a data set including a plurality of samples” – the act of receiving data. The claim is adding an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of receiving data is recited at a high level of generality (i.e., as a generic act of receiving data) such that it amounts to no more than a mere instruction to apply the exception using a generic act of receiving. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites the additional element “storing the data set within a computer memory element” – the act of storing data. The claim is adding an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of storing data is recited at a high level of generality (i.e., as a generic act of storing data) such that it amounts to no more than a mere instruction to apply the exception using a generic act of storing. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites the additional elements “further configuring the processor to” and “within the computer memory element” – using a device to process data. The device is recited at a high level of generality (i.e., as a generic computer performing the generic computer function of processing data) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites the additional element “(vii) repeat steps (ii) - (vi) according to a pre-determined constant number of training iterations” – the act of repeating. The claim is adding an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of repeating is recited at a high level of generality (i.e., as a generic act of repeating) such that it amounts to no more than a mere instruction to apply the exception using a generic act of repeating. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites the additional element “the optimized base ensemble of binary decision trees representing the automatically improved estimated probability distribution”. This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the claim recites the additional element of receiving data at a high level of generality, which adds an insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Receiving or transmitting data over a network”. Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea. Thus, the claim is not patent eligible.
As discussed above, the claim recites the additional element of storing data at a high level of generality, which adds an insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Storing and retrieving information in memory”. Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea. Thus, the claim is not patent eligible.
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible. See MPEP 2106.05(f).
As discussed above, the claim recites the additional element of repeating at a high level of generality, which adds an insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Performing repetitive calculations”. Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea. Thus, the claim is not patent eligible.
As discussed above, the claim also recites a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).
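For illustration of the Examiner's understanding only, the accept/reject search recited in steps (i)-(vii) of claim 2 may be sketched as follows. This is a minimal reading of the claim language, not Applicant's implementation; the Python helper names propose_base, propose_change, and generalization_error are hypothetical placeholders.

import random

def optimize_ensemble(samples, n_iterations, propose_base, propose_change,
                      generalization_error, holdout_fraction=0.3):
    base = propose_base()                                   # step (i): random base ensemble
    for _ in range(n_iterations):                           # step (vii): fixed iteration count
        changed = propose_change(base)                      # step (ii): proposed changed ensemble
        shuffled = random.sample(samples, len(samples))     # step (iii): randomly sort the samples
        cut = int(len(shuffled) * (1 - holdout_fraction))
        train, holdout = shuffled[:cut], shuffled[cut:]     # training set and complementary holdout set
        base_err = generalization_error(base, train, holdout)     # step (iv)
        new_err = generalization_error(changed, train, holdout)   # step (v)
        if new_err < base_err:                              # step (vi): designate the improvement
            base = changed
    return base                                             # the optimized base ensemble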
Regarding claim 3
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 2.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element “wherein respective randomized binary decision trees of the base ensemble thereof have a number of decision layers that is influenced by a predetermined maximum number of samples allowed within leaf nodes of the randomized binary decision trees”. This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).
The claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites the additional element “the method further including configuring the processor to” – using a device to process data. The device is recited at a high level of generality (i.e., as a generic computer performing the generic computer function of processing data) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites the additional element “(viii) repeat steps (ii) - (vi) wherein the number of decision layers is influenced by a reduced maximum number of samples allowed within the leaf nodes of the binary decision trees, such that the proposed changed ensemble of randomized binary decision trees has a greater number of decision layers than the number of decision layers in the base ensemble of randomized binary decision trees” – the act of repeating. The claim is adding an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of repeating is recited at a high level of generality (i.e., as a generic act of repeating) such that it amounts to no more than a mere instruction to apply the exception using a generic act of repeating. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the claim recites a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible. See MPEP 2106.05(f).
As discussed above, the claim recites the additional element of repeating at a high level of generality, which adds an insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Performing repetitive calculations”. Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea. Thus, the claim is not patent eligible.
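For illustration only, the relationship recited in claim 3 between the per-leaf sample cap and the number of decision layers may be sketched as follows; the single-feature trees and Gaussian data are hypothetical. A node splits only while it holds more samples than the cap, so reducing the cap tends to add decision layers, as in step (viii).

import random

def grown_depth(samples, max_leaf_samples, depth=0):
    # Return the depth of a randomized tree grown under the given leaf-size cap.
    if len(samples) <= max_leaf_samples:
        return depth                                        # leaf node reached
    threshold = random.uniform(min(samples), max(samples))  # random split threshold
    left = [s for s in samples if s <= threshold]
    right = [s for s in samples if s > threshold]
    if not left or not right:                               # degenerate split; stop growing
        return depth
    return max(grown_depth(left, max_leaf_samples, depth + 1),
               grown_depth(right, max_leaf_samples, depth + 1))

data = [random.gauss(0.0, 1.0) for _ in range(256)]
print(grown_depth(data, max_leaf_samples=32))  # typically shallower
print(grown_depth(data, max_leaf_samples=8))   # typically deeper, per step (viii)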
Regarding claim 4
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 3.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element “(ix) recursively repeat step (viii) until the computed new generalization error is smaller than the designated base generalization error for a pre-determined number of iterations, thereby increasing the number of decision layers in the base ensemble of randomized binary decision trees until an optimized generalization error is reached” – the act of repeating. The claim is adding an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of repeating is recited at a high level of generality (i.e., as a generic act of repeating) such that it amounts to no more than a mere instruction to apply the exception using a generic act of repeating. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the claim recites the additional element of repeating at a high level of generality, which adds an insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Performing repetitive calculations”. Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea. Thus, the claim is not patent eligible.
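For illustration only, one reading of step (ix) of claim 4 is a loop that keeps deepening the ensemble while the new generalization error beats the designated base error, bounded by a pre-determined number of iterations; the callables evaluate and deepen are hypothetical stand-ins.

def deepen_until_optimized(base, evaluate, deepen, n_iterations):
    base_err = evaluate(base)                   # designated base generalization error
    for _ in range(n_iterations):               # pre-determined number of iterations
        candidate = deepen(base)                # step (viii): reduced per-leaf sample cap
        new_err = evaluate(candidate)           # computed new generalization error
        if new_err >= base_err:                 # stop once no further improvement is found
            break
        base, base_err = candidate, new_err     # designate the deeper ensemble as the base
    return base                                 # ensemble with an optimized generalization error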
Regarding claim 5
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“before respectively designating the proposed changed ensemble of binary decision trees as the base ensemble of binary decision trees, …; …; and
… respectively designate the elements of a selected entry as the base ensemble of binary decision trees, thereby returning the probability model to a previously estimated state for further optimization therefrom”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, the limitations in the context of this claim encompass a user mentally performing the steps with a physical aid (e.g., pencil and paper).
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites the additional element “configuring the processor to” – using a device to process data. The device is recited at a high level of generality (i.e., as a generic computer performing the generic computer function of processing data) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites the additional element “store the base ensemble of binary decision trees as elements of an entry in a historical database within the computer memory element” – the act of storing data. The claim is adding an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of storing data is recited at a high level of generality (i.e., as a generic act of storing data) such that it amounts to no more than a mere instruction to apply the exception using a generic act of storing. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites the additional element “the historical database configured to retain a predetermined number of entries”. This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible. See MPEP 2106.05(f).
As discussed above, the claim recites the additional element of storing data at a high level of generality, which adds an insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Storing and retrieving information in memory”. Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea. Thus, the claim is not patent eligible.
As discussed above, the claim also recites a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).
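For illustration only, the bounded historical database and rollback recited in claim 5 may be sketched as follows; the class and method names are hypothetical.

from collections import deque

class EnsembleHistory:
    def __init__(self, max_entries):
        # the historical database retains a predetermined number of entries
        self._entries = deque(maxlen=max_entries)

    def store(self, ensemble):
        # store the base ensemble as the elements of an entry
        self._entries.append(list(ensemble))

    def restore(self, index=-1):
        # designate the elements of a selected entry as the base ensemble,
        # returning the probability model to a previously estimated state
        return list(self._entries[index])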
Regarding claim 6
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“wherein proposing a base ensemble of randomized binary decision trees includes:
(a) defining a binary decision tree having a root node, a plurality of branches and a plurality of decision nodes corresponding to the plurality of branches, …;
(b) assigning a given training sample from the training set to individual leaf nodes of the plurality of leaf nodes by passing the given training sample from the root node along selected branches determined by the evaluations of the respective inequalities for the given training sample at successive selected branches;
…”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, the limitations in the context of this claim encompass a user mentally performing the steps with a physical aid (e.g., pencil and paper).
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element “the decision nodes including a plurality of intermediate decision nodes and a plurality of leaf nodes, the branches initially radiating from the root node and mutually connected by the intermediate decision nodes, pairs of the branches corresponding to pairs of opposing evaluations of respective inequalities instructive of a comparison between any sample from the training set and a random threshold value assigned to the given pair of branches”. This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).
In particular, the claim recites the additional element “(c) repeating step (b) for each sample in the training set; and (d) repeating steps (b) and (c) until a pre-determined number of decision trees has been met, thereby producing a base ensemble of randomized binary decision trees” – the act of repeating. The claim is adding an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of repeating is recited at a high level of generality (i.e., as a generic act of repeating) such that it amounts to no more than a mere instruction to apply the exception using a generic act of repeating. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the claim recites a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).
As discussed above, the claim recites the additional element of repeating at a high level of generality, which adds an insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Performing repetitive calculations”. Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea. Thus, the claim is not patent eligible.
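For illustration only, steps (a)-(d) of claim 6 may be sketched as follows for one-dimensional samples in [0, 1]; the Node class and the depth cap are hypothetical, with each pair of branches encoding opposing inequalities against a random threshold.

import random

class Node:
    def __init__(self, depth, max_depth, lo=0.0, hi=1.0):
        self.samples = []
        if depth < max_depth:
            # step (a): a random threshold shared by a pair of opposing branches
            self.threshold = random.uniform(lo, hi)
            self.left = Node(depth + 1, max_depth, lo, hi)   # branch for x <= threshold
            self.right = Node(depth + 1, max_depth, lo, hi)  # branch for x > threshold
        else:
            self.threshold = None                            # leaf node

    def assign(self, x):
        # step (b): pass the sample from the root along the branches selected
        # by evaluating the inequality at each successive decision node
        node = self
        while node.threshold is not None:
            node = node.left if x <= node.threshold else node.right
        node.samples.append(x)                               # sample lands in a leaf

def build_base_ensemble(train, n_trees, max_depth=3):
    # steps (c)-(d): repeat for each sample, and until the pre-determined
    # number of decision trees has been met
    trees = [Node(0, max_depth) for _ in range(n_trees)]
    for tree in trees:
        for x in train:
            tree.assign(x)
    return trees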
Regarding claim 7
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“wherein evaluating a base generalization error includes:
…;
(b) assigning a given test sample from the holdout set to individual leaf nodes of the plurality of leaf nodes by passing the given test sample from the root node along selected branches determined by evaluations of the respective inequalities for the given test sample at successive selected branches;
…, and
…”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, the limitations in the context of this claim encompass a user mentally performing the steps with a physical aid (e.g., pencil and paper).
In addition, the limitations of
“(a) computing respective point estimates for each leaf node of the plurality of leaf nodes of a given randomized binary decision tree based on the samples from the training set assigned to the leaf node;
…, and
(e) computing a sum, over each randomized binary decision tree in the base ensemble thereof, of squared differences between a first value and a second value, the first value being a probability of having correctly, according to output values of samples of the holdout set, assigned the samples of the holdout set to individual leaf nodes based on input values of the holdout set, the second value being unity”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations. That is, nothing in the claim element precludes the step from practically being performed based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations, but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element “(c) repeating steps (a) and (b) for each test sample in the holdout set; (d) repeating steps (a) through (c) for each randomized binary decision tree in the base ensemble thereof” – the act of repeating. The claim is adding an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of repeating is recited at a high level of generality (i.e., as a generic act of repeating) such that it amounts to no more than a mere instruction to apply the exception using a generic act of repeating. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the claim recites the additional element of repeating at a high level of generality, which adds an insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Performing repetitive calculations”. Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea. Thus, the claim is not patent eligible.
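For illustration only, one reading of steps (a)-(e) of claim 7 may be sketched as follows; assign_leaf and correct_probability are hypothetical stand-ins for the claimed leaf assignment and the probability of correct assignment.

def point_estimates(leaf_samples):
    # step (a): a per-leaf point estimate, here the mean output of the
    # training samples assigned to each leaf (one possible estimate)
    return {leaf: sum(ys) / len(ys) for leaf, ys in leaf_samples.items() if ys}

def base_generalization_error(ensemble, holdout, assign_leaf, correct_probability):
    total = 0.0
    for tree in ensemble:                                     # step (d): each tree in the ensemble
        leaves = [assign_leaf(tree, x) for x, _ in holdout]   # steps (b)-(c): each test sample
        p = correct_probability(tree, leaves, holdout)        # first value: probability of correct assignment
        total += (p - 1.0) ** 2                               # step (e): squared difference from unity
    return total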
Regarding claim 8
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“wherein proposing a changed ensemble of randomized binary decision trees includes:
(a) defining a change to a randomly selected random threshold value of a given pair of branches;
(b) assigning a given training sample from the new training set to individual leaf nodes of the plurality of leaf nodes by passing the given training sample from the root node along selected branches determined by the evaluations of the respective inequalities for the given training sample at successive selected branches;
…”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, the limitations in the context of this claim encompass a user mentally performing the steps with a physical aid (e.g., pencil and paper).
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element “(c) repeating step (b) for each sample in the training set, and (d) repeating steps (b) and (c) until the pre-determined number of decision trees has been met, thereby producing a changed ensemble of randomized binary decision trees” – the act of repeating. The claim is adding an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of repeating is recited at a high level of generality (i.e., as a generic act of repeating) such that it amounts to no more than a mere instruction to apply the exception using a generic act of repeating. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the claim recites the additional element of repeating at a high level of generality, which adds an insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Performing repetitive calculations”. Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea. Thus, the claim is not patent eligible.
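For illustration only, step (a) of claim 8 may be sketched as a perturbation of one randomly selected random threshold, followed by re-routing the training samples per steps (b)-(d); this reuses the hypothetical Node class sketched for claim 6 and assumes each tree has at least one decision node.

import copy
import random

def internal_nodes(node):
    # collect the decision nodes that carry a threshold
    if node.threshold is None:
        return []
    return [node] + internal_nodes(node.left) + internal_nodes(node.right)

def clear_leaves(node):
    if node.threshold is None:
        node.samples = []
    else:
        clear_leaves(node.left)
        clear_leaves(node.right)

def propose_changed_ensemble(base, train, scale=0.1):
    changed = copy.deepcopy(base)
    target = random.choice(internal_nodes(random.choice(changed)))
    target.threshold += random.gauss(0.0, scale)  # step (a): change one random threshold
    for tree in changed:                          # steps (b)-(d): reassign every sample
        clear_leaves(tree)
        for x in train:
            tree.assign(x)
    return changed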
Regarding claim 9
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“wherein evaluating a new generalization error includes:
…;
(b) assigning a given test sample from the new holdout set to individual leaf nodes of the plurality of leaf nodes by passing the given test sample from the root node along selected branches determined by evaluations of the respective inequalities for the given test sample at successive selected branches;
…, and
…”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. That is, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, the limitations in the context of this claim encompass a user mentally performing the steps with a physical aid (e.g., pencil and paper).
In addition, the limitations of
“wherein evaluating a new generalization error includes:
(a) computing respective point estimates for each leaf node of the plurality of leaf nodes of a given randomized binary decision tree based on the samples from the new training set assigned to the leaf node;
…, and
(e) computing a sum, over the leaf nodes of each randomized binary decision tree in the proposed changed ensemble thereof, of squared differences between a first value and a second value, the first value being a probability of having correctly, according to output values of samples of the new holdout set, assigned the samples of the new holdout set to individual leaf nodes based on input values of the holdout set, the second value being unity”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations. That is, nothing in the claim element precludes the step from practically being performed based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations, but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites additional elements (“(c) repeating steps (a) and (b) for each test sample in the new holdout set; (d) repeating steps (a) through (c) for each randomized binary decision tree in the proposed changed ensemble thereof”) – the act of repeating. The claim adds insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of repeating is recited at a high level of generality (i.e., as a generic function of repeating) such that it amounts to no more than a mere instruction to apply the exception using a generic act of repeating. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the claim recites the additional element(s) of repeating at a high level of generality, adding insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Performing repetitive calculations”. Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea. Thus, the claim is not patent eligible.
Regarding claim 10
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“further including computing a measure of variability based on the samples assigned to the leaf nodes of the base ensemble of binary decision trees”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations. That is, nothing in the claim element precludes the step from practically being performed based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations, but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim does not recite additional elements. Thus, the claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, the claim is not patent eligible.
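For illustration of the record only, the claimed measure of variability (see also claim 11 below) amounts to routine statistics over the samples assigned to the leaf nodes. The following minimal Python sketch is examiner-provided; all names are hypothetical and not drawn from the claims or the cited references:

import statistics

def measure_of_variability(leaf_samples):
    # leaf_samples: one list of assigned sample values per leaf node of the
    # base ensemble of binary decision trees.
    values = [v for leaf in leaf_samples for v in leaf]
    quartiles = statistics.quantiles(values, n=4)  # [Q1, Q2, Q3]
    return {
        "variance": statistics.variance(values),
        "standard deviation": statistics.stdev(values),
        "range": max(values) - min(values),
        "interquartile range": quartiles[2] - quartiles[0],
    }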
Regarding claim 11
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 1.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites an additional element (“wherein the measure of variability is at least one of variance, standard deviation, range, and interquartile range”). This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h). Thus, the claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h). Thus, the claim is not patent eligible.
Regarding claim 12
The claim recites “A machine learning system, the system comprising: a processor and a computer memory element with computer-executable software instructions and a machine learning model stored thereon, the instructions, when loaded by the processor, causing the processor to be configured to:” to perform precisely the method of Claim 1. As performance of an abstract idea on generic computer components (see MPEP 2106.05(f)) and “Storing and retrieving information in memory” (see MPEP 2106.05(g) on Insignificant Extra-Solution Activity, and MPEP 2106.05(d) on Well-Understood, Routine, Conventional Activity) can neither integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself, the claim is rejected for the reasons set forth in the rejection of Claim 1.
Regarding claim 13
The claim is rejected for the reasons set forth in the rejection of Claim 2 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 14
The claim is rejected for the reasons set forth in the rejection of Claim 3 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 15
The claim is rejected for the reasons set forth in the rejection of Claim 4 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 16
The claim is rejected for the reasons set forth in the rejection of Claim 5 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 17
The claim is rejected for the reasons set forth in the rejection of Claim 6 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 18
The claim is rejected for the reasons set forth in the rejection of Claim 7 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 19
The claim is rejected for the reasons set forth in the rejection of Claim 8 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 20
The claim is rejected for the reasons set forth in the rejection of Claim 9 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over CHIPMAN et al. (BART: BAYESIAN ADDITIVE REGRESSION TREES) in view of Hernández et al. (Bayesian Additive Regression Trees using Bayesian model averaging).
Regarding claim 1
(Note: Hereinafter, if a limitation has bold brackets (i.e., [·]) around claim language, the bracketed claim language indicates that it has not yet been taught by the current prior art reference but will be taught by another prior art reference afterwards.)
CHIPMAN teaches
A [computer]-implemented method of training a machine learning model, the method comprising:
(CHIPMAN [sec(s) 1] “To facilitate the use of the BART methods described in this paper, we have provided open-source software implementing BART as a stand-alone package or with an interface to R, along with full documentation and examples. It is available as the BayesTree library in R at http://cran.r-project.org/.” [sec(s) 5] “In this section we demonstrate the application of BART on several examples. We begin in Section 5.1 with a predictive cross-validation performance comparison of BART with competing methods on 42 different real data sets. We next, in Section 5.2, evaluate and illustrate BART’s capabilities on simulated data used by Friedman (1991). Finally, in Section 5.3 we apply the BART probit model to a drug discovery classification problem. All of the BART calculations throughout this section can be reproduced with the BayesTree library at http://cran.r-project.org/.” [sec(s) Abs] “We develop a Bayesian “sum-of-trees” model where each tree is con strained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior.”;)
accessing a given machine learning model, the given machine learning model being formed based on a data mapping, the data mapping associating an input data point to a respective output data point;
(CHIPMAN [sec(s) 2] “With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4) Y = ∑mj=1 g(x; Tj, Mj) + ε, ε ∼ N(0, σ²), where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 1] “We consider the fundamental problem of making inference about an unknown function f that predicts an output Y using a p dimensional vector of inputs x = (x1,...,xp) when (1) Y = f(x) + ε, ε ∼ N(0, σ²). To do this, we consider modeling or at least approximating f(x) = E(Y|x), the mean of Y given x, by a sum of m regression trees f(x) ≈ h(x) ≡ ∑mj=1 gj(x) where each gj denotes a regression tree. Thus, we approximate (1) by a sum-of-trees model (2) Y = h(x) + ε, ε ∼ N(0, σ²).”;)
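For clarity of the record, the sum-of-trees evaluation quoted above, E(Y|x) = ∑mj=1 g(x; Tj, Mj), can be sketched in Python as follows (examiner-provided illustration only; the dictionary-based tree representation is hypothetical and is not CHIPMAN’s implementation):

def g(x, tree):
    # Walk one binary regression tree Tj; return the terminal-node
    # parameter mu (an element of Mj) that the tree assigns to x.
    node = tree
    while "mu" not in node:  # internal node: evaluate its decision rule
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["mu"]

def sum_of_trees(x, trees):
    # h(x) = sum over j of g(x; Tj, Mj), per equations (2) and (4) quoted above.
    return sum(g(x, t) for t in trees)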
from empirical data of interest for the given machine learning model, estimating a probability distribution;
(CHIPMAN [sec(s) 2] “With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4) Y = ∑mj=1 g(x; Tj, Mj) + ε, ε ∼ N(0, σ²), where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 1] “We consider the fundamental problem of making inference about an unknown function f that predicts an output Y using a p dimensional vector of inputs x = (x1,...,xp) when (1) Y = f(x) + ε, ε ∼ N(0, σ²). To do this, we consider modeling or at least approximating f(x) = E(Y|x), the mean of Y given x, by a sum of m regression trees f(x) ≈ h(x) ≡ ∑mj=1 gj(x) where each gj denotes a regression tree. Thus, we approximate (1) by a sum-of-trees model (2) Y = h(x) + ε, ε ∼ N(0, σ²).” [sec(s) 2.2.3] “The essence of our strategy is then to choose μμ and σμ so that N(mμμ, mσ²μ) assigns substantial probability to the interval (ymin,ymax). This can be conveniently done by choosing μμ and σμ so that mμμ − k√m σμ = ymin and mμμ + k√m σμ = ymax for some preselected value of k. For example, k = 2 would yield a 95% prior probability that E(Y|x) is in the interval (ymin,ymax). The strategy above uses an aspect of the observed data, namely, ymin and ymax, to try to ensure that the implicit prior for E(Y|x) is in the right “ballpark.” That is to say, we want it to assign substantial probability to the entire region of plausible values of E(Y|x) while avoiding overconcentration and overdispersion.”;)
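For clarity of the record, the two quoted calibration conditions can be solved in closed form: adding them gives μμ = (ymin + ymax)/(2m), and subtracting them gives σμ = (ymax − ymin)/(2k√m). A minimal Python sketch of this computation (examiner-provided illustration only; names are hypothetical):

import math

def calibrate_prior(ymin, ymax, m, k=2.0):
    # Solve m*mu_mu - k*sqrt(m)*sigma_mu = ymin and
    #       m*mu_mu + k*sqrt(m)*sigma_mu = ymax for (mu_mu, sigma_mu).
    mu_mu = (ymin + ymax) / (2.0 * m)
    sigma_mu = (ymax - ymin) / (2.0 * k * math.sqrt(m))
    return mu_mu, sigma_mu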
automatically improving the estimated probability distribution using a generalization error, said improving being by [a processor]:
(CHIPMAN [sec(s) 2] “With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4) Y = ∑mj=1 g(x; Tj, Mj) + ε, ε ∼ N(0, σ²), where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 1] “We consider the fundamental problem of making inference about an unknown function f that predicts an output Y using a p dimensional vector of inputs x = (x1,...,xp) when (1) Y = f(x) + ε, ε ∼ N(0, σ²). To do this, we consider modeling or at least approximating f(x) = E(Y|x), the mean of Y given x, by a sum of m regression trees f(x) ≈ h(x) ≡ ∑mj=1 gj(x) where each gj denotes a regression tree. Thus, we approximate (1) by a sum-of-trees model (2) Y = h(x) + ε, ε ∼ N(0, σ²).” [sec(s) 2.2.3] “The essence of our strategy is then to choose μμ and σμ so that N(mμμ, mσ²μ) assigns substantial probability to the interval (ymin,ymax). This can be conveniently done by choosing μμ and σμ so that mμμ − k√m σμ = ymin and mμμ + k√m σμ = ymax for some preselected value of k. For example, k = 2 would yield a 95% prior probability that E(Y|x) is in the interval (ymin,ymax). The strategy above uses an aspect of the observed data, namely, ymin and ymax, to try to ensure that the implicit prior for E(Y|x) is in the right “ballpark.” That is to say, we want it to assign substantial probability to the entire region of plausible values of E(Y|x) while avoiding overconcentration and overdispersion.” [sec(s) Abs] “We develop a Bayesian “sum-of-trees” model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior.” [sec(s) 3.1] “At each iteration, each tree may increase or decrease the number of terminal nodes by one, or change one or two decision rules. Each μ will change (or cease to exist or be born), and σ will change.” [sec(s) 5] “We begin in Section 5.1 with a predictive cross-validation performance comparison of BART with competing methods on 42 different real data sets. … All of the BART calculations throughout this section can be reproduced with the BayesTree library at http://cran.r-project.org/.” [sec(s) 5.2.3] “For each value of p, we simulated 100 data sets of n = 100 observations each. As in Section 5.1, we used 5-fold cross-validation to choose tuning parameters. Because f is known here, there was no need to simulate test set data.”; e.g., “cross-validation” read(s) on “generalization error”.)
modeling the probability distribution using a decision tree ensemble, and
(CHIPMAN [sec(s) 2] “With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4) Y = ∑mj=1 g(x; Tj, Mj) + ε, ε ∼ N(0, σ²), where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 1] “We consider the fundamental problem of making inference about an unknown function f that predicts an output Y using a p dimensional vector of inputs x = (x1,...,xp) when (1) Y = f(x) + ε, ε ∼ N(0, σ²). To do this, we consider modeling or at least approximating f(x) = E(Y|x), the mean of Y given x, by a sum of m regression trees f(x) ≈ h(x) ≡ ∑mj=1 gj(x) where each gj denotes a regression tree. Thus, we approximate (1) by a sum-of-trees model (2) Y = h(x) + ε, ε ∼ N(0, σ²).” [sec(s) 2.2.3] “The essence of our strategy is then to choose μμ and σμ so that N(mμμ, mσ²μ) assigns substantial probability to the interval (ymin,ymax). This can be conveniently done by choosing μμ and σμ so that mμμ − k√m σμ = ymin and mμμ + k√m σμ = ymax for some preselected value of k. For example, k = 2 would yield a 95% prior probability that E(Y|x) is in the interval (ymin,ymax). The strategy above uses an aspect of the observed data, namely, ymin and ymax, to try to ensure that the implicit prior for E(Y|x) is in the right “ballpark.” That is to say, we want it to assign substantial probability to the entire region of plausible values of E(Y|x) while avoiding overconcentration and overdispersion.”;)
optimizing choice of tree in the decision tree ensemble by minimizing the generalization error,
(CHIPMAN [sec(s) 2] “With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4) Y = ∑mj=1 g(x; Tj, Mj) + ε, ε ∼ N(0, σ²)
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) Abs] “We develop a Bayesian “sum-of-trees” model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior.” [sec(s) 3.1] “At each iteration, each tree may increase or decrease the number of terminal nodes by one, or change one or two decision rules. Each μ will change (or cease to exist or be born), and σ will change.” [sec(s) 5] “We begin in Section 5.1 with a predictive cross-validation performance comparison of BART with competing methods on 42 different real data sets.” [sec(s) 5.2.3] “For each value of p, we simulated 100 data sets of n = 100 observations each. As in Section 5.1, we used 5-fold cross-validation to choose tuning parameters. Because f is known here, there was no need to simulate test set data.”; e.g., “cross-validation” read(s) on “generalization error”.)
the improved estimated probability distribution determining weights or parameters for the given machine learning model resulting in a trained model.
(CHIPMAN [sec(s) 2] “With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4) Y = ∑mj=1 g(x; Tj, Mj) + ε, ε ∼ N(0, σ²), where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) Abs] “We develop a Bayesian “sum-of-trees” model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior.” [sec(s) 2.2.3] “The essence of our strategy is then to choose μμ and σμ so that N(mμμ, mσ²μ) assigns substantial probability to the interval (ymin,ymax). This can be conveniently done by choosing μμ and σμ so that mμμ − k√m σμ = ymin and mμμ + k√m σμ = ymax for some preselected value of k. For example, k = 2 would yield a 95% prior probability that E(Y|x) is in the interval (ymin,ymax). The strategy above uses an aspect of the observed data, namely, ymin and ymax, to try to ensure that the implicit prior for E(Y|x) is in the right “ballpark.” That is to say, we want it to assign substantial probability to the entire region of plausible values of E(Y|x) while avoiding overconcentration and overdispersion.” [sec(s) 3.1] “At each iteration, each tree may increase or decrease the number of terminal nodes by one, or change one or two decision rules. Each μ will change (or cease to exist or be born), and σ will change.” [sec(s) 5] “We begin in Section 5.1 with a predictive cross-validation performance comparison of BART with competing methods on 42 different real data sets.” [sec(s) 5.2.3] “For each value of p, we simulated 100 data sets of n = 100 observations each. As in Section 5.1, we used 5-fold cross-validation to choose tuning parameters. Because f is known here, there was no need to simulate test set data.”;)
However, CHIPMAN does not appear to explicitly teach:
A [computer]-implemented method of training a machine learning model, the method comprising:
automatically improving the estimated probability distribution using a generalization error, said improving being by [a processor]:
(Note: Hereinafter, if a limitation has one or more bold underlines, the underlined claim language indicates that it is taught by the current prior art reference, while the non-underlined claim language indicates that it has already been taught by one or more previous prior art references.)
Hernández teaches
A computer-implemented method of training a machine learning model, the method comprising:
(Hernández [sec(s) 4] “We compare BART-BMA to RF, ERT and BART for a number of simulated datasets and also for two real proteomic datasets for the diagnosis of cardiovascular disease and aggressive versus non-aggressive prostate cancer. All analyses reported in this section were computed using a HP Z420 Workstation with 32 GB RAM. To allow for comparison of the two most used software packages for BART, we use both the bartMachine R package (version 1.2.3; Kapelner and Bleich 2014b) and the dbarts R package (version 0.8-7; Chipman et al. 2014). RF was run using the randomForest R package (version 4.6-12; Liaw and Matthew 2015). ERT was also included as a benchmark tree ensemble method using R package ExtraTrees (version 1.0.5). All methods and results were obtained using R version 3.2.0 (https://cran.r-project.org/bin/windows/base/old/ 3.2.0/).” [sec(s) 3] “We define BART-BMA as an ensemble of BART models which uses a sum of Bayesian decision trees as its base learner.” [sec(s) 2.3] “This is because the set of trees T for each iteration of each MCMC chain must be saved to memory to enable prediction of an external dataset if it is not provided at the time of training the model.”;)
automatically improving the estimated probability distribution using a generalization error, said improving being by a processor:
(Hernández [sec(s) 4] “We compare BART-BMA to RF, ERT and BART for a number of simulated datasets and also for two real proteomic datasets for the diagnosis of cardiovascular disease and aggressive versus non-aggressive prostate cancer. All analyses reported in this section were computed using a HP Z420 Workstation with 32 GB RAM. To allow for comparison of the two most used software packages for BART, we use both the bartMachine R package (version 1.2.3; Kapelner and Bleich 2014b) and the dbarts R package (version 0.8-7; Chipman et al. 2014). RF was run using the randomForest R package (version 4.6-12; Liaw and Matthew 2015). ERT was also included as a benchmark tree ensemble method using R package ExtraTrees (version 1.0.5). All methods and results were obtained using R version 3.2.0 (https://cran.r-project.org/bin/windows/base/old/ 3.2.0/).” [sec(s) 3] “We define BART-BMA as an ensemble of BART models which uses a sum of Bayesian decision trees as its base learner.” [sec(s) 2.3] “This is because the set of trees T for each iteration of each MCMC chain must be saved to memory to enable prediction of an external dataset if it is not provided at the time of training the model.” [sec(s) 4.1] “Each method was compared across the seven datasets using fivefold cross-validation and the cross-validated root-mean-squared error (RMSE) was recorded.”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of CHIPMAN with the computer of Hernández.
One of ordinary skill in the art would have been motivated to combine the references in order to make the method computationally feasible for high-dimensional datasets and to improve model selection speed by using a greedy search algorithm to find predictive split points so that only high-quality splits are proposed when growing tree models.
(Hernández [sec(s) 1] “In order to improve model selection speed, BART-BMA uses a greedy search algorithm to find predictive split points, so only high-quality splits are proposed when growing tree models. Thus, BART-BMA is computationally feasible for high-dimensional datasets, does not require specialised hardware or software and brings with it the advantages of a model-based approach.”)
Regarding claim 2
The combination of CHIPMAN, Hernández teaches claim 1.
wherein estimating the probability distribution includes: (See claim 1)
CHIPMAN further teaches
from the empirical data, obtaining a data set including a plurality of samples and storing the data set within a [computer memory] element; and
(CHIPMAN [table(s) 1] [sec(s) 5.1] “Our first illustration is a “bakeoff,” a predictive performance comparison of BART with competing methods on 42 different real data sets. These data sets (see Table 1) are a subset of 52 sets considered by Kim et al. (2007). Ten data sets were excluded either because Random Forests was unable to use over 32 categorical predictors, or because a single train/test split was used in the original paper. All data sets correspond to regression setups with between 3 and 28 numeric predictors and 0 to 6 categorical predictors. Categorical predictors were converted into 0/1 indicator variables corresponding to each level. Sample sizes vary from 96 to 6806 observations. In each of the 42 data sets, the response was minimally preprocessed, applying a log or square root transformation if this made the histogram of observed responses more bell-shaped. In about half the cases, a log transform was used to reduce a right tail. In one case (Fishery) a square root transform was most appropriate. For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE”;)
further configuring the [processor] to:
(i) determine randomly a base ensemble of binary decision trees; and
(CHIPMAN [sec(s) 1] “In this paper we propose a Bayesian approach called BART (Bayesian Additive Regression Trees) which uses a sum of trees to model or approximate f (x) = E(Y|x). The essential idea is to elaborate the sum-of-trees model (2) by imposing a prior that regularizes the fit by keeping the individual tree effects small. In effect, the gj’s become a dimensionally adaptive random basis of “weak learners,” to borrow a phrase from the boosting literature. By weakening the gj effects, BART ends up with a sum of trees, each of which explains a small and different portion of f. Note that BART is not equivalent to posterior averaging of single tree fits of the entire function f.” [sec(s) 2.1] “To elaborate the form of the sum-of-trees model (2), we begin by establishing notation for a single tree model. Let T denote a binary tree consisting of a set of interior node decision rules and a set of terminal nodes, and let M = {μ1,μ2,...,μb} denote a set of parameter values associated with each of the b terminal nodes of T .” [sec(s) 3.1] “Hastie and Tibshirani (2000) considered a similar application of the Gibbs sampler for posterior sampling for additive and generalized additive models with σ fixed, and showed how it was a stochastic generalization of the backfitting algorithm for such models. For this reason, we refer to our algorithm as backfitting MCMC.”;)
wherein automatically improving the estimated probability distribution includes (See claim 1) further configuring the [processor] to:
(ii) determine and thereby propose a changed ensemble of randomized binary decision trees;
(CHIPMAN [sec(s) 1] “In this paper we propose a Bayesian approach called BART (Bayesian Additive Regression Trees) which uses a sum of trees to model or approximate f (x) = E(Y|x). The essential idea is to elaborate the sum-of-trees model (2) by imposing a prior that regularizes the fit by keeping the individual tree effects small. In effect, the gj’s become a dimensionally adaptive random basis of “weak learners,” to borrow a phrase from the boosting literature. By weakening the gj effects, BART ends up with a sum of trees, each of which explains a small and different portion of f. Note that BART is not equivalent to posterior averaging of single tree fits of the entire function f.” [sec(s) 2.1] “To elaborate the form of the sum-of-trees model (2), we begin by establishing notation for a single tree model. Let T denote a binary tree consisting of a set of interior node decision rules and a set of terminal nodes, and let M = {μ1,μ2,...,μb} denote a set of parameter values associated with each of the b terminal nodes of T .” [sec(s) 3.1] “Hastie and Tibshirani (2000) considered a similar application of the Gibbs sampler for posterior sampling for additive and generalized additive models with σ fixed, and showed how it was a stochastic generalization of the backfitting algorithm for such models. For this reason, we refer to our algorithm as backfitting MCMC.” [sec(s) 3.1] “The draw of Tj in (15), although somewhat elaborate, can be obtained using the Metropolis–Hastings (MH) algorithm of CGM98. This algorithm proposes a new tree based on the current tree using one of four moves. The moves and their associated proposal probabilities are as follows: growing a terminal node (0.25), pruning a pair of terminal nodes (0.25), changing a nonterminal rule (0.40), and swapping a rule between parent and child (0.10). Although the grow and prune moves change the number of terminal nodes, by integrating out Mj in (14), we avoid the complexities associated with reversible jumps between continuous spaces of varying dimensions [Green (1995)].”;)
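For clarity of the record, the quoted Metropolis–Hastings proposal distribution over tree moves can be sketched in Python as follows (examiner-provided illustration only; this is not CHIPMAN’s code):

import random

# Move names and proposal probabilities as quoted from CHIPMAN sec. 3.1.
MOVES = ["grow", "prune", "change", "swap"]
PROPOSAL_PROBS = [0.25, 0.25, 0.40, 0.10]

def propose_move(rng=random):
    # Draw one of the four moves used to propose a changed tree.
    return rng.choices(MOVES, weights=PROPOSAL_PROBS, k=1)[0]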
(iii) randomly sort samples of the plurality of samples within the [computer memory] element to define a training set and a holdout set complementary to the training set; and
(CHIPMAN [sec(s) 5.1] “Our first illustration is a “bakeoff,” a predictive performance comparison of BART with competing methods on 42 different real data sets. These data sets (see Table 1) are a subset of 52 sets considered by Kim et al. (2007). Ten data sets were excluded either because Random Forests was unable to use over 32 categorical predictors, or because a single train/test split was used in the original paper. All data sets correspond to regression setups with between 3 and 28 numeric predictors and 0 to 6 categorical predictors. Categorical predictors were converted into 0/1 indicator variables corresponding to each level. Sample sizes vary from 96 to 6806 observations. In each of the 42 data sets, the response was minimally preprocessed, applying a log or square root transformation if this made the histogram of observed responses more bell-shaped. In about half the cases, a log transform was used to reduce a right tail. In one case (Fishery) a square root transform was most appropriate. For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.” [sec(s) 5.3] “For BART-cv, we used 5-fold cross-validation to choose from among k = 0.25, 0.5, 1, 2, 3 and m = 100, 200, 400 or 800. For all the competitors, we also used 5-fold cross-validation to select tuning parameters as in Section 5.1. However, the large number of predictors led to some different ranges of tuning parameters.”;)
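For clarity of the record, the quoted 5/6 training and 1/6 test partitioning corresponds to a routine random holdout split, sketched below in Python (examiner-provided illustration only; names are hypothetical):

import random

def random_holdout_split(samples, train_fraction=5/6, seed=None):
    # Randomly sort the samples, then define a training set and a
    # complementary holdout set in memory.
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]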
(iv) evaluate using the training and holdout sets from step (iii) a base generalization error of the proposed base ensemble of binary decision trees;
(CHIPMAN [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.” [sec(s) 5] “We begin in Section 5.1 with a predictive cross-validation performance comparison of BART with competing methods on 42 different real data sets.” [sec(s) 5.3] “For BART-cv, we used 5-fold cross-validation to choose from among k = 0.25, 0.5, 1, 2, 3 and m = 100, 200, 400 or 800. For all the competitors, we also used 5-fold cross-validation to select tuning parameters as in Section 5.1. However, the large number of predictors led to some different ranges of tuning parameters.”; e.g., “cross-validation” read(s) on “generalization error”.)
(v) evaluate using the training and holdout sets from step (iii) a new generalization error of the proposed changed ensemble of binary decision trees;
(CHIPMAN [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.” [sec(s) 5] “We begin in Section 5.1 with a predictive cross-validation performance comparison of BART with competing methods on 42 different real data sets.” [sec(s) 5.3] “For BART-cv, we used 5-fold cross-validation to choose from among k = 0.25, 0.5, 1, 2, 3 and m = 100, 200, 400 or 800. For all the competitors, we also used 5-fold cross-validation to select tuning parameters as in Section 5.1. However, the large number of predictors led to some different ranges of tuning parameters.”; e.g., “cross-validation” read(s) on “generalization error”.)
(vi) in response to the new generalization error being less than the base generalization error designate, within the [computer memory] element, the proposed changed ensemble of binary decision trees as the new base ensemble of binary decision trees; and
(CHIPMAN [sec(s) 1] “In this paper we propose a Bayesian approach called BART (Bayesian Additive Regression Trees) which uses a sum of trees to model or approximate f (x) = E(Y|x). … To fit the sum-of-trees model, BART uses a tailored version of Bayesian backfitting MCMC [Hastie and Tibshirani (2000)] that iteratively constructs and fits successive residuals. Although similar in spirit to the gradient boosting approach of Friedman (2001), BART differs in both how it weakens the individual trees by instead using a prior, and how it performs the iterative fitting by instead using Bayesian backfitting on a fixed number of trees. Conceptually, BART can be viewed as a Bayesian nonparametric approach that fits a parameter rich model using a strongly influential prior distribution” [sec(s) 2.1] “To elaborate the form of the sum-of-trees model (2), we begin by establishing notation for a single tree model. Let T denote a binary tree consisting of a set of interior node decision rules and a set of terminal nodes, and let M = {μ1,μ2,...,μb} denote a set of parameter values associated with each of the b terminal nodes of T .” [sec(s) 5] “We begin in Section 5.1 with a predictive cross-validation performance comparison of BART with competing methods on 42 different real data sets.” [sec(s) 5.3] “For BART-cv, we used 5-fold cross-validation to choose from among k = 0.25, 0.5, 1, 2, 3 and m = 100, 200, 400 or 800. For all the competitors, we also used 5-fold cross-validation to select tuning parameters as in Section 5.1. However, the large number of predictors led to some different ranges of tuning parameters.” [sec(s) 3.1] “The draw of Tj in (15), although somewhat elaborate, can be obtained using the Metropolis–Hastings (MH) algorithm of CGM98. This algorithm proposes a new tree based on the current tree using one of four moves. The moves and their associated proposal probabilities are as follows: growing a terminal node (0.25), pruning a pair of terminal nodes (0.25), changing a nonterminal rule (0.40), and swapping a rule between parent and child (0.10). Although the grow and prune moves change the number of terminal nodes, by integrating out Mj in (14), we avoid the complexities associated with reversible jumps between continuous spaces of varying dimensions [Green (1995)].”;)
(vii) repeat steps (ii) - (vi) according to a pre-determined constant number of training iterations, thereby optimizing the base ensemble of binary decision trees based on generalization error thereof, the optimized base ensemble of binary decision trees representing the automatically improved estimated probability distribution.
(CHIPMAN [sec(s) 1] “In this paper we propose a Bayesian approach called BART (Bayesian Additive Regression Trees) which uses a sum of trees to model or approximate f (x) = E(Y|x). … To fit the sum-of-trees model, BART uses a tailored version of Bayesian backfitting MCMC [Hastie and Tibshirani (2000)] that iteratively constructs and fits successive residuals. Although similar in spirit to the gradient boosting approach of Friedman (2001), BART differs in both how it weakens the individual trees by instead using a prior, and how it performs the iterative fitting by instead using Bayesian backfitting on a fixed number of trees. Conceptually, BART can be viewed as a Bayesian nonparametric approach that fits a parameter rich model using a strongly influential prior distribution” [sec(s) 2.1] “To elaborate the form of the sum-of-trees model (2), we begin by establishing notation for a single tree model. Let T denote a binary tree consisting of a set of interior node decision rules and a set of terminal nodes, and let M = {μ1,μ2,...,μb} denote a set of parameter values associated with each of the b terminal nodes of T .” [sec(s) 5] “We begin in Section 5.1 with a predictive cross-validation performance comparison of BART with competing methods on 42 different real data sets.” [sec(s) 5.1] “Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE. … the number of burn-in steps and MCMC iterations used were determined by inspection of a single long run. Typically, 200 burn-in steps and 1000 iterations were used.” [sec(s) 5.3] “For BART-cv, we used 5-fold cross-validation to choose from among k = 0.25, 0.5, 1, 2, 3 and m = 100, 200, 400 or 800. For all the competitors, we also used 5-fold cross-validation to select tuning parameters as in Section 5.1. However, the large number of predictors led to some different ranges of tuning parameters.” [sec(s) 3.1] “The draw of Tj in (15), although somewhat elaborate, can be obtained using the Metropolis–Hastings (MH) algorithm of CGM98. This algorithm proposes a new tree based on the current tree using one of four moves. The moves and their associated proposal probabilities are as follows: growing a terminal node (0.25), pruning a pair of terminal nodes (0.25), changing a nonterminal rule (0.40), and swapping a rule between parent and child (0.10). Although the grow and prune moves change the number of terminal nodes, by integrating out Mj in (14), we avoid the complexities associated with reversible jumps between continuous spaces of varying dimensions [Green (1995)].”;)
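For clarity of the record, steps (ii)–(vii) as mapped above amount to a propose-evaluate-accept loop run for a pre-determined number of iterations. A minimal Python sketch (examiner-provided illustration only; propose and evaluate are hypothetical stubs, and random_holdout_split is the hypothetical function sketched above):

def optimize_ensemble(base, propose, evaluate, samples, n_iterations=1000):
    train, holdout = random_holdout_split(samples)             # step (iii)
    base_error = evaluate(base, train, holdout)                # step (iv)
    for _ in range(n_iterations):                              # step (vii)
        candidate = propose(base)                              # step (ii)
        candidate_error = evaluate(candidate, train, holdout)  # step (v)
        if candidate_error < base_error:                       # step (vi)
            base, base_error = candidate, candidate_error      # designate new base
    return base, base_error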
Hernández further teaches
storing the data set within a computer memory element; and
(Hernández [sec(s) 4] “We compare BART-BMA to RF, ERT and BART for a number of simulated datasets and also for two real proteomic datasets for the diagnosis of cardiovascular disease and aggressive versus non-aggressive prostate cancer. All analyses reported in this section were computed using a HP Z420 Workstation with 32 GB RAM. To allow for comparison of the two most used software packages for BART, we use both the bartMachine R package (version 1.2.3; Kapelner and Bleich 2014b) and the dbarts R package (version 0.8-7; Chipman et al. 2014). RF was run using the randomForest R package (version 4.6-12; Liaw and Matthew 2015). ERT was also included as a benchmark tree ensemble method using R package ExtraTrees (version 1.0.5). All methods and results were obtained using R version 3.2.0 (https://cran.r-project.org/bin/windows/base/old/ 3.2.0/).” [sec(s) 3] “We define BART-BMA as an ensemble of BART models which uses a sum of Bayesian decision trees as its base learner.” [sec(s) 2.3] “This is because the set of trees T for each iteration of each MCMC chain must be saved to memory to enable prediction of an external dataset if it is not provided at the time of training the model.” [sec(s) 4.1] “Each method was compared across the seven datasets using fivefold cross-validation and the cross-validated root-mean-squared error (RMSE) was recorded.”;)
further configuring the processor to:
(Hernández [sec(s) 4] “We compare BART-BMA to RF, ERT and BART for a number of simulated datasets and also for two real proteomic datasets for the diagnosis of cardiovascular disease and aggressive versus non-aggressive prostate cancer. All analyses reported in this section were computed using a HP Z420 Workstation with 32 GB RAM. To allow for comparison of the two most used software packages for BART, we use both the bartMachine R package (version 1.2.3; Kapelner and Bleich 2014b) and the dbarts R package (version 0.8-7; Chipman et al. 2014). RF was run using the randomForest R package (version 4.6-12; Liaw and Matthew 2015). ERT was also included as a benchmark tree ensemble method using R package ExtraTrees (version 1.0.5). All methods and results were obtained using R version 3.2.0 (https://cran.r-project.org/bin/windows/base/old/ 3.2.0/).”;)
wherein automatically improving the estimated probability distribution includes further configuring the processor to:
(Hernández [sec(s) 4] “We compare BART-BMA to RF, ERT and BART for a number of simulated datasets and also for two real proteomic datasets for the diagnosis of cardiovascular disease and aggressive versus non-aggressive prostate cancer. All analyses reported in this section were computed using a HP Z420 Workstation with 32 GB RAM. To allow for comparison of the two most used software packages for BART, we use both the bartMachine R package (version 1.2.3; Kapelner and Bleich 2014b) and the dbarts R package (version 0.8-7; Chipman et al. 2014). RF was run using the randomForest R package (version 4.6-12; Liaw and Matthew 2015). ERT was also included as a benchmark tree ensemble method using R package ExtraTrees (version 1.0.5). All methods and results were obtained using R version 3.2.0 (https://cran.r-project.org/bin/windows/base/old/ 3.2.0/).” [sec(s) 3] “We define BART-BMA as an ensemble of BART models which uses a sum of Bayesian decision trees as its base learner.” [sec(s) 2.3] “This is because the set of trees T for each iteration of each MCMC chain must be saved to memory to enable prediction of an external dataset if it is not provided at the time of training the model.” [sec(s) 4.1] “Each method was compared across the seven datasets using fivefold cross-validation and the cross-validated root-mean-squared error (RMSE) was recorded.”;)
(iii) randomly sort samples of the plurality of samples within the computer memory element to define a training set and a holdout set complementary to the training set; and
(Hernández [sec(s) 4] “We compare BART-BMA to RF, ERT and BART for a number of simulated datasets and also for two real proteomic datasets for the diagnosis of cardiovascular disease and aggressive versus non-aggressive prostate cancer. All analyses reported in this section were computed using a HP Z420 Workstation with 32 GB RAM. To allow for comparison of the two most used software packages for BART, we use both the bartMachine R package (version 1.2.3; Kapelner and Bleich 2014b) and the dbarts R package (version 0.8-7; Chipman et al. 2014). RF was run using the randomForest R package (version 4.6-12; Liaw and Matthew 2015). ERT was also included as a benchmark tree ensemble method using R package ExtraTrees (version 1.0.5). All methods and results were obtained using R version 3.2.0 (https://cran.r-project.org/bin/windows/base/old/ 3.2.0/).” [sec(s) 3] “We define BART-BMA as an ensemble of BART models which uses a sum of Bayesian decision trees as its base learner.” [sec(s) 2.3] “This is because the set of trees T for each iteration of each MCMC chain must be saved to memory to enable prediction of an external dataset if it is not provided at the time of training the model.”;)
(vi) designate, within the computer memory element, the proposed changed ensemble of binary decision trees as the new base ensemble of binary decision trees; and
(Hernández [sec(s) 4] “We compare BART-BMA to RF, ERT and BART for a number of simulated datasets and also for two real proteomic datasets for the diagnosis of cardiovascular disease and aggressive versus non-aggressive prostate cancer. All analyses reported in this section were computed using a HP Z420 Workstation with 32 GB RAM. To allow for comparison of the two most used software packages for BART, we use both the bartMachine R package (version 1.2.3; Kapelner and Bleich 2014b) and the dbarts R package (version 0.8-7; Chipman et al. 2014). RF was run using the randomForest R package (version 4.6-12; Liaw and Matthew 2015). ERT was also included as a benchmark tree ensemble method using R package ExtraTrees (version 1.0.5). All methods and results were obtained using R version 3.2.0 (https://cran.r-project.org/bin/windows/base/old/ 3.2.0/).” [sec(s) 3] “We define BART-BMA as an ensemble of BART models which uses a sum of Bayesian decision trees as its base learner.” [sec(s) 2.3] “This is because the set of trees T for each iteration of each MCMC chain must be saved to memory to enable prediction of an external dataset if it is not provided at the time of training the model.”;)
CHIPMAN and Hernández are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 3
The combination of CHIPMAN, Hernández teaches claim 2.
CHIPMAN further teaches
wherein respective randomized binary decision trees of the base ensemble thereof have a number of decision layers that is influenced by a pre-determined maximum number of samples allowed within leaf nodes of the randomized binary decision trees, the method further including configuring [the processor] to:
(CHIPMAN [sec(s) 2.1] “The decision rules are binary splits of the predictor space of the form {x ∈ A} vs {x /∈ A} where A is a subset of the range of x. These are typically based on the single components of x = (x1,...,xp) and are of the form {xi ≤ c} vs {xi > c} for continuous xi.” [sec(s) 2.2.2] “For p(Tj), the form recommended by CGM98 is easy to specify and dovetails nicely with calculations for the backfitting MCMC algorithm described later in Section 3.1. It is specified by three aspects: (i) the probability that a node at depth d (= 0, 1, 2, ...) is nonterminal, given by α(1 + d)−β (7) , α ∈ (0, 1), β ∈ [0,∞), (ii) the distribution on the splitting variable assignments at each interior node, and (iii) the distribution on the splitting rule assignment in each interior node, conditional on the splitting variable. For (ii) and (iii) we use the simple defaults used by CGM98, namely, the uniform prior on available variables for (ii) and the uniform prior on the discrete set of available splitting values for (iii). Although not strictly coherent from the Bayesian point of view, this last choice has the appeal of invariance under monotone transformations of the splitting variables.” [sec(s) 6] “Indeed, in a separate experiment using m = 50 trees, we found that for n = 100, BART trees had up to 4 terminal nodes with an average size of 2.52 terminal nodes, whereas for n = 10,000, BART trees had as many as 10 terminal nodes with an average size of 3.34. In contrast, the short version BART effectively keeps tree sizes small by limiting iterations, so that its execution time scales linearly with n.”; e.g., “the probability that a node at depth d (= 0, 1, 2, ...) is nonterminal”, “for n = 100, BART trees had up to 4 terminal nodes with an average size of 2.52 terminal nodes, whereas for n = 10,000, BART trees had as many as 10 terminal nodes with an average size of 3.34” along with data samples read(s) on “a pre-determined maximum number of samples allowed within leaf nodes of the randomized binary decision trees”.)
(viii) repeat steps (ii) - (vi) wherein the number of decision layers is influenced by a reduced maximum number of samples allowed within the leaf nodes of the binary decision trees, such that the proposed changed ensemble of randomized binary decision trees has a greater number of decision layers than the number of decision layers in the base ensemble of randomized binary decision trees.
(CHIPMAN [sec(s) 2.1] “The decision rules are binary splits of the predictor space of the form {x ∈ A} vs {x /∈ A} where A is a subset of the range of x. These are typically based on the single components of x = (x1,...,xp) and are of the form {xi ≤ c} vs {xi > c} for continuous xi.” [sec(s) 2.2.2] “For p(Tj), the form recommended by CGM98 is easy to specify and dovetails nicely with calculations for the backfitting MCMC algorithm described later in Section 3.1. It is specified by three aspects: (i) the probability that a node at depth d (= 0, 1, 2, ...) is nonterminal, given by α(1 + d)−β (7) , α ∈ (0, 1), β ∈ [0,∞), (ii) the distribution on the splitting variable assignments at each interior node, and (iii) the distribution on the splitting rule assignment in each interior node, conditional on the splitting variable. For (ii) and (iii) we use the simple defaults used by CGM98, namely, the uniform prior on available variables for (ii) and the uniform prior on the discrete set of available splitting values for (iii). Although not strictly coherent from the Bayesian point of view, this last choice has the appeal of invariance under monotone transformations of the splitting variables.” [sec(s) 6] “However, for the longer default version of BART, this dependence becomes quadratic as is evidenced in Figure 11(b). Apparently, this nonlinear dependence is due to the adaptive nature of BART. For larger n, BART iterations tend toward the use of larger trees to exploit finer structure, and these larger trees require more tree-based operations to generate the predictions required for residual and likelihood evaluation. Indeed, in a separate experiment using m = 50 trees, we found that for n = 100, BART trees had up to 4 terminal nodes with an average size of 2.52 terminal nodes, whereas for n = 10,000, BART trees had as many as 10 terminal nodes with an average size of 3.34. In contrast, the short version BART effectively keeps tree sizes small by limiting iterations, so that its execution time scales linearly with n.”;)
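For clarity of the record, the quoted depth-dependent prior on nonterminal nodes, α(1 + d)−β, directly controls the expected number of decision layers: the probability of a further split decays with depth. A minimal Python sketch (examiner-provided illustration only; the default values α = 0.95 and β = 2 used here are the defaults recommended in the CHIPMAN reference):

def p_nonterminal(depth, alpha=0.95, beta=2.0):
    # Prior probability that a node at depth d = 0, 1, 2, ... is
    # nonterminal, i.e., splits into a further decision layer.
    return alpha * (1.0 + depth) ** (-beta)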
Hernández further teaches
the method further including configuring the processor to:
(Hernández [sec(s) 4] “We compare BART-BMA to RF, ERT and BART for a number of simulated datasets and also for two real proteomic datasets for the diagnosis of cardiovascular disease and aggressive versus non-aggressive prostate cancer. All analyses reported in this section were computed using a HP Z420 Workstation with 32 GB RAM. To allow for comparison of the two most used software packages for BART, we use both the bartMachine R package (version 1.2.3; Kapelner and Bleich 2014b) and the dbarts R package (version 0.8-7; Chipman et al. 2014). RF was run using the randomForest R package (version 4.6-12; Liaw and Matthew 2015). ERT was also included as a benchmark tree ensemble method using R package ExtraTrees (version 1.0.5). All methods and results were obtained using R version 3.2.0 (https://cran.r-project.org/bin/windows/base/old/ 3.2.0/).”;)
CHIPMAN and Hernández are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 4
The combination of CHIPMAN, Hernández teaches claim 3.
CHIPMAN further teaches
further including configuring [the processor] to:
(ix) recursively repeat step (viii) until the computed new generalization error is smaller than the designated base generalization error for a pre-determined number of iterations, thereby increasing the number of decision layers in the base ensemble of randomized binary decision trees until an optimized generalization error is reached.
(CHIPMAN [sec(s) 1] “To fit the sum-of-trees model, BART uses a tailored version of Bayesian backfitting MCMC [Hastie and Tibshirani (2000)] that iteratively constructs and fits successive residuals. Although similar in spirit to the gradient boosting approach of Friedman (2001), BART differs in both how it weakens the individual trees by instead using a prior, and how it performs the iterative fitting by instead using Bayesian backfitting on a fixed number of trees.” [sec(s) 2.1] “Let T denote a binary tree consisting of a set of interior node decision rules and a set of terminal nodes, and let M = {μ1,μ2,...,μb} denote a set of parameter values associated with each of the b terminal nodes of T .” [sec(s) 5.1] “Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE. … the number of burn-in steps and MCMC iterations used were determined by inspection of a single long run. Typically, 200 burn-in steps and 1000 iterations were used.” [sec(s) 5.3] “For BART-cv, we used 5-fold cross-validation to choose from among k = 0.25, 0.5, 1, 2, 3 and m = 100, 200, 400 or 800. For all the competitors, we also used 5-fold cross-validation to select tuning parameters as in Section 5.1. However, the large number of predictors led to some different ranges of tuning parameters. … BART-cv tended to more often obtain smaller RMSE than any of its competitors.” [sec(s) 3.1] “The moves and their associated proposal probabilities are as follows: growing a terminal node (0.25), pruning a pair of terminal nodes (0.25), changing a nonterminal rule (0.40), and swapping a rule between parent and child (0.10).” [sec(s) 6] “BART iterations tend toward the use of larger trees to exploit finer structure, and these larger trees require more tree-based operations to generate the predictions required for residual and likelihood evaluation. Indeed, in a separate experiment using m = 50 trees, we found that for n = 100, BART trees had up to 4 terminal nodes with an average size of 2.52 terminal nodes, whereas for n = 10,000, BART trees had as many as 10 terminal nodes with an average size of 3.34.”;)
Hernández further teaches
further including configuring the processor to:
(Hernández [sec(s) 4] “We compare BART-BMA to RF, ERT and BART for a number of simulated datasets and also for two real proteomic datasets for the diagnosis of cardiovascular disease and aggressive versus non-aggressive prostate cancer. All analyses reported in this section were computed using a HP Z420 Workstation with 32 GB RAM. To allow for comparison of the two most used software packages for BART, we use both the bartMachine R package (version 1.2.3; Kapelner and Bleich 2014b) and the dbarts R package (version 0.8-7; Chipman et al. 2014). RF was run using the randomForest R package (version 4.6-12; Liaw and Matthew 2015). ERT was also included as a benchmark tree ensemble method using R package ExtraTrees (version 1.0.5). All methods and results were obtained using R version 3.2.0 (https://cran.r-project.org/bin/windows/base/old/ 3.2.0/).”;)
CHIPMAN is combinable with Hernández for the same rationale as set forth above with respect to claim 1.
Regarding claim 5
The combination of CHIPMAN and Hernández teaches claim 2.
CHIPMAN further teaches
further including:
before respectively designating the proposed changed ensemble of binary decision trees as the base ensemble of binary decision trees, configuring the [processor] to store the base ensemble of binary decision trees as elements of an entry in a historical database within the [computer memory] element; the historical database configured to retain a pre-determined number of entries; and
(CHIPMAN [sec(s) 1] “To fit the sum-of-trees model, BART uses a tailored version of Bayesian backfitting MCMC [Hastie and Tibshirani (2000)] that iteratively constructs and fits successive residuals. Although similar in spirit to the gradient boosting approach of Friedman (2001), BART differs in both how it weakens the individual trees by instead using a prior, and how it performs the iterative fitting by instead using Bayesian backfitting on a fixed number of trees. Conceptually, BART can be viewed as a Bayesian nonparametric approach that fits a parameter rich model using a strongly influential prior distribution. Inferences obtained from BART are based on successive iterations of the backfitting algorithm which are effectively an MCMC sample from the induced posterior over the sum-of-trees model space. A single posterior mean estimate of f(x)=E(Y|x) at any input value x is obtained by a simple average of these successive sum-of-trees model draws evaluated at x.” [sec(s) 8] “Posterior calculation is carried out by a tailored backfitting MCMC algorithm that appears to converge quickly, effectively obtaining a (dependent) sample from the posterior distribution over the space of sum-of-trees models. A variety of inferential quantities of interest can be obtained directly from this sample.”;)
configuring [the processor] to respectively designate the elements of a selected entry as the base ensemble of binary decision trees, thereby returning the probability model to a previously estimated state for further optimization therefrom.
(CHIPMAN [sec(s) 1] “Although similar in spirit to the gradient boosting approach of Friedman (2001), BART differs in both how it weakens the individual trees by instead using a prior, and how it performs the iterative fitting by instead using Bayesian backfitting on a fixed number of trees. Conceptually, BART can be viewed as a Bayesian nonparametric approach that fits a parameter rich model using a strongly influential prior distribution. Inferences obtained from BART are based on successive iterations of the backfitting algorithm which are effectively an MCMC sample from the induced posterior over the sum-of-trees model space. A single posterior mean estimate of f(x)=E(Y|x) at any input value x is obtained by a simple average of these successive sum-of-trees model draws evaluated at x.” [sec(s) 5.1] “Typically, 200 burn-in steps and 1000 iterations were used. For BART prediction at each x, we used the posterior mean estimates given by (18).” [sec(s) 8] “Posterior calculation is carried out by a tailored backfitting MCMC algorithm that appears to converge quickly, effectively obtaining a (dependent) sample from the posterior distribution over the space of sum-of-trees models. A variety of inferential quantities of interest can be obtained directly from this sample.” [sec(s) 3.1] “The draw of Tj in (15), although somewhat elaborate, can be obtained using the Metropolis–Hastings (MH) algorithm of CGM98. This algorithm proposes a new tree based on the current tree using one of four moves.”; e.g., “MCMC” along with “Metropolis–Hastings (MH) algorithm” read(s) on “returning the probability model to a previously estimated state”.)
Hernández further teaches
configuring the processor to store the base ensemble of binary decision trees as elements of an entry in a historical database within the computer memory element;
(Hernández [sec(s) 4] “We compare BART-BMA to RF, ERT and BART for a number of simulated datasets and also for two real proteomic datasets for the diagnosis of cardiovascular disease and aggressive versus non-aggressive prostate cancer. All analyses reported in this section were computed using a HP Z420 Workstation with 32 GB RAM. To allow for comparison of the two most used software packages for BART, we use both the bartMachine R package (version 1.2.3; Kapelner and Bleich 2014b) and the dbarts R package (version 0.8-7; Chipman et al. 2014). RF was run using the randomForest R package (version 4.6-12; Liaw and Matthew 2015). ERT was also included as a benchmark tree ensemble method using R package ExtraTrees (version 1.0.5). All methods and results were obtained using R version 3.2.0 (https://cran.r-project.org/bin/windows/base/old/ 3.2.0/).” [sec(s) 3] “We define BART-BMA as an ensemble of BART models which uses a sum of Bayesian decision trees as its base learner.” [sec(s) 2.3] “This is because the set of trees T for each iteration of each MCMC chain must be saved to memory to enable prediction of an external dataset if it is not provided at the time of training the model.”;)
configuring the processor to respectively designate the elements of a selected entry as the base ensemble of binary decision trees.
(Hernández [sec(s) 4] “We compare BART-BMA to RF, ERT and BART for a number of simulated datasets and also for two real proteomic datasets for the diagnosis of cardiovascular disease and aggressive versus non-aggressive prostate cancer. All analyses reported in this section were computed using a HP Z420 Workstation with 32 GB RAM. To allow for comparison of the two most used software packages for BART, we use both the bartMachine R package (version 1.2.3; Kapelner and Bleich 2014b) and the dbarts R package (version 0.8-7; Chipman et al. 2014). RF was run using the randomForest R package (version 4.6-12; Liaw and Matthew 2015). ERT was also included as a benchmark tree ensemble method using R package ExtraTrees (version 1.0.5). All methods and results were obtained using R version 3.2.0 (https://cran.r-project.org/bin/windows/base/old/ 3.2.0/).” [sec(s) 3] “We define BART-BMA as an ensemble of BART models which uses a sum of Bayesian decision trees as its base learner.” [sec(s) 2.3] “This is because the set of trees T for each iteration of each MCMC chain must be saved to memory to enable prediction of an external dataset if it is not provided at the time of training the model.”;)
CHIPMAN is combinable with Hernández for the same rationale as set forth above with respect to claim 1.
Regarding claim 6
The combination of CHIPMAN and Hernández teaches claim 2.
wherein proposing a base ensemble of randomized binary decision trees includes: (See claim 2)
CHIPMAN further teaches
(a) defining a binary decision tree having a root node, a plurality of branches and a plurality of decision nodes corresponding to the plurality of branches, the decision nodes including a plurality of intermediate decision nodes and a plurality of leaf nodes, the branches initially radiating from the root node and mutually connected by the intermediate decision nodes, pairs of the branches corresponding to pairs of opposing evaluations of respective inequalities instructive of a comparison between any sample from the training set and a random threshold value assigned to the given pair of branches;
(CHIPMAN [sec(s) 1] “In this paper we propose a Bayesian approach called BART (Bayesian Additive Regression Trees) which uses a sum of trees to model or approximate f(x)= E(Y|x).” [sec(s) 2] “With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 2.1] “The decision rules are binary splits of the predictor space of the form {x ∈ A} vs {x /∈ A} where A is a subset of the range of x. These are typically based on the single components of x = (x1,...,xp) and are of the form {xi ≤ c} vs {xi > c} for continuous xi.” [sec(s) 2.2.2] “(ii) the distribution on the splitting variable assignments at each interior node, and (iii) the distribution on the splitting rule assignment in each interior node, conditional on the splitting variable. For (ii) and (iii) we use the simple defaults used by CGM98, namely, the uniform prior on available variables for (ii) and the uniform prior on the discrete set of available splitting values for (iii). Although not strictly coherent from the Bayesian point of view, this last choice has the appeal of invariance under monotone transformations of the splitting variables”; e.g., “splitting rule” read(s) on “inequalities”. In addition, e.g., “splitting values” along with “Bayesian” read(s) on “random threshold value” since the values are selected based on a prior distribution.)
(b) assigning a given training sample from the training set to individual leaf nodes of the plurality of leaf nodes by passing the given training sample from the root node along selected branches determined by the evaluations of the respective inequalities for the given training sample at successive selected branches;
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(c) repeating step (b) for each sample in the training set; and
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node.” [sec(s) 1] “We consider the fundamental problem of making inference about an unknown function f that predicts an output Y using a p dimensional vector of inputs x = (x1,...,xp) when (1)
[Y = f(x) + ε, ε ∼ N(0, σ2)]
. To do this, we consider modeling or at least approximating f(x) = E(Y|x), the mean of Y given x, by a sum of m regression trees f (x) ≈ h(x) ≡ ∑mj=1 gj(x) where each gj denotes a regression tree. Thus, we approximate (1) by a sum-of-trees model (2)
[Y = h(x) + ε, ε ∼ N(0, σ2)]
.” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(d) repeating steps (b) and (c) until a pre-determined number of decision trees has been met, thereby producing a base ensemble of randomized binary decision trees.
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
Regarding claim 7
The combination of CHIPMAN and Hernández teaches claim 6.
CHIPMAN further teaches
wherein evaluating a base generalization error includes:
(CHIPMAN [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.” [sec(s) 5] “We begin in Section 5.1 with a predictive cross-validation performance comparison of BART with competing methods on 42 different real data sets.” [sec(s) 5.3] “For BART-cv, we used 5-fold cross-validation to choose from among k = 0.25, 0.5, 1, 2, 3 and m = 100, 200, 400 or 800. For all the competitors, we also used 5-fold cross-validation to select tuning parameters as in Section 5.1. However, the large number of predictors led to some different ranges of tuning parameters.”; e.g., “cross-validation” read(s) on “generalization error”.)
(a) computing respective point estimates for each leaf node of the plurality of leaf nodes of a given randomized binary decision tree based on the samples from the training set assigned to the leaf node;
(CHIPMAN [sec(s) 2] “With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 2.1] “Let T denote a binary tree consisting of a set of interior node decision rules and a set of terminal nodes, and let M = {μ1,μ2,...,μb} denote a set of parameter values associated with each of the b terminal nodes of T.” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(b) assigning a given test sample from the holdout set to individual leaf nodes of the plurality of leaf nodes by passing the given test sample from the root node along selected branches determined by evaluations of the respective inequalities for the given test sample at successive selected branches;
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(c) repeating steps (a) and (b) for each test sample in the holdout set;
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(d) repeating steps (a) through (c) for each randomized binary decision tree in the base ensemble thereof, and
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(e) computing a sum, over each randomized binary decision tree in the base ensemble thereof, of squared differences between a first value and a second value, the first value being a probability of having correctly, according to output values of samples of the holdout set, assigned the samples of the holdout set to individual leaf nodes based on input values of the holdout set, the second value being unity.
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.” [sec(s) 4] “However, for binary Y (= 0 or 1), it is straightforward to extend BART to the probit model setup for classification (21) p(x) ≡ P[Y = 1|x] = Φ[G(x)], where (22)
[G(x) ≡ ∑mj=1 g(x; Tj,Mj)]
and Φ[·] is the standard normal c.d.f. Note that each classification probability p(x) here is obtained as a function of G(x), our sum of regression trees.”; e.g., “Y = 1” read(s) on “unity”.)
Regarding claim 8
The combination of CHIPMAN and Hernández teaches claim 6.
wherein proposing a changed ensemble of randomized binary decision trees includes: (See claim 2)
CHIPMAN further teaches
(a) defining a change to a randomly selected random threshold value of a given pair of branches;
(CHIPMAN[sec(s) 1] “In this paper we propose a Bayesian approach called BART (Bayesian Additive Regression Trees) which uses a sum of trees to model or approximate f(x)= E(Y|x).” [sec(s) 2.2.2] “(ii) the distribution on the splitting variable assignments at each interior node, and (iii) the distribution on the splitting rule assignment in each interior node, conditional on the splitting variable. For (ii) and (iii) we use the simple defaults used by CGM98, namely, the uniform prior on available variables for (ii) and the uniform prior on the discrete set of available splitting values for (iii). Although not strictly coherent from the Bayesian point of view, this last choice has the appeal of invariance under monotone transformations of the splitting variables.” [sec(s) 3.1] “The draw of Tj in (15), although somewhat elaborate, can be obtained using the Metropolis–Hastings (MH) algorithm of CGM98. This algorithm proposes a new tree based on the current tree using one of four moves. The moves and their associated proposal probabilities are as follows: growing a terminal node (0.25), pruning a pair of terminal nodes (0.25), changing a nonterminal rule (0.40), and swapping a rule between parent and child (0.10). Although the grow and prune moves change the number of terminal nodes, by integrating out Mj in (14), we avoid the complexities associated with reversible jumps between continuous spaces of varying dimensions [Green (1995)].”; e.g., “splitting values” along with “Bayesian” read(s) on “random threshold value” since the values are selected based on a prior distribution. In addition, e.g., “changing a nonterminal rule” along with “Bayesian” and “distribution on the splitting rule assignment in each interior node, conditional on the splitting variable” read(s) on “randomly selected”.)
(b) assigning a given training sample from the new training set to individual leaf nodes of the plurality of leaf nodes by passing the given training sample from the root node along selected branches determined by the evaluations of the respective inequalities for the given training sample at successive selected branches;
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(c) repeating step (b) for each sample in the training set, and
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node.” [sec(s) 1] “We consider the fundamental problem of making inference about an unknown function f that predicts an output Y using a p dimensional vector of inputs x = (x1,...,xp) when (1)
[Y = f(x) + ε, ε ∼ N(0, σ2)]
. To do this, we consider modeling or at least approximating f(x) = E(Y|x), the mean of Y given x, by a sum of m regression trees f (x) ≈ h(x) ≡ ∑mj=1 gj(x) where each gj denotes a regression tree. Thus, we approximate (1) by a sum-of-trees model (2)
[Y = h(x) + ε, ε ∼ N(0, σ2)]
.” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(d) repeating steps (b) and (c) until the pre-determined number of decision trees has been met, thereby producing a changed ensemble of randomized binary decision trees.
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
Regarding claim 9
The combination of CHIPMAN and Hernández teaches claim 8.
CHIPMAN further teaches
wherein evaluating a new generalization error includes:
(CHIPMAN [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.” [sec(s) 5] “We begin in Section 5.1 with a predictive cross-validation performance comparison of BART with competing methods on 42 different real data sets.” [sec(s) 5.3] “For BART-cv, we used 5-fold cross-validation to choose from among k = 0.25, 0.5, 1, 2, 3 and m = 100, 200, 400 or 800. For all the competitors, we also used 5-fold cross-validation to select tuning parameters as in Section 5.1. However, the large number of predictors led to some different ranges of tuning parameters.”; e.g., “cross-validation” read(s) on “generalization error”.)
(a) computing respective point estimates for each leaf node of the plurality of leaf nodes of a given randomized binary decision tree based on the samples from the new training set assigned to the leaf node;
(CHIPMAN [sec(s) 2] “With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 2.1] “Let T denote a binary tree consisting of a set of interior node decision rules and a set of terminal nodes, and let M = {μ1,μ2,...,μb} denote a set of parameter values associated with each of the b terminal nodes of T.” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(b) assigning a given test sample from the new holdout set to individual leaf nodes of the plurality of leaf nodes by passing the given test sample from the root node along selected branches determined by evaluations of the respective inequalities for the given test sample at successive selected branches;
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(c) repeating steps (a) and (b) for each test sample in the new holdout set;
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(d) repeating steps (a) through (c) for each randomized binary decision tree in the proposed changed ensemble thereof, and
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.”;)
(e) computing a sum, over the leaf nodes of each randomized binary decision tree in the proposed changed ensemble thereof, of squared differences between a first value and a second value, the first value being a probability of having correctly, according to output values of samples of the new holdout set, assigned the samples of the new holdout set to individual leaf nodes based on input values of the holdout set, the second value being unity.
(CHIPMAN [sec(s) 2.1] “Each x value is associated with a single terminal node of T by the sequence of decision rules from top to bottom, and is then assigned the μi value associated with this terminal node. For a given T and M, we use g(x; T,M) to denote the function which assigns a μi ∈ M to x. Thus, (3)
[Y = g(x; T,M) + ε, ε ∼ N(0, σ2)]
is a single tree model of the form considered by CGM98. Under (3), the conditional mean of Y given x, E(Y|x) equals the terminal node parameter μi assigned by g(x; T,M). With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 5.1] “For each of the 42 data sets, we created 20 independent train/test splits by randomly selecting 5/6 of the data as a training set and the remaining 1/6 as a test set. Thus, 42 × 20 = 840 test/train splits were created. Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE.” [sec(s) 4] “However, for binary Y (= 0 or 1), it is straightforward to extend BART to the probit model setup for classification (21) p(x) ≡ P[Y = 1|x] = Φ[G(x)], where (22)
[G(x) ≡ ∑mj=1 g(x; Tj,Mj)]
and Φ[·] is the standard normal c.d.f. Note that each classification probability p(x) here is obtained as a function of G(x), our sum of regression trees.”; e.g., “Y = 1” read(s) on “unity”.)
Regarding claim 10
The combination of CHIPMAN and Hernández teaches claim 6.
CHIPMAN further teaches
further including computing a measure of variability based on the samples assigned to the leaf nodes of the base ensemble of binary decision trees.
(CHIPMAN [sec(s) 2] “With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 2.2.3] “To guide the specification of the hyperparameters μμ and σμ, note that E(Y|x) is the sum of mμij’s under the sum-of-trees model, and because the μij’s are apriori i.i.d., the induced prior on E(Y|x) is N(mμμ, mσ2μ). Note also that it is highly probable that E(Y|x) is between ymin and ymax, the observed minimum and maximum of Y in the data. The essence of our strategy is then to choose μμ and σμ so that N(mμμ,mσ2μ) assigns substantial probability to the interval (ymin,ymax). This can be conveniently done by choosing μμ and σμ so that mμμ − k√m σμ = ymin and mμμ + k√m σμ = ymax for some preselected value of k. For example, k = 2 would yield a 95% prior probability that E(Y|x) is in the interval (ymin,ymax). The strategy above uses an aspect of the observed data, namely, ymin and ymax, to try to ensure that the implicit prior for E(Y|x) is in the right “ballpark.” That is to say, we want it to assign substantial probability to the entire region of plausible values of E(Y|x) while avoiding overconcentration and overdispersion.” [sec(s) 2.2.4] “we take σˆ as the residual standard deviation from a least squares linear regression of Y on the original X’s.” [sec(s) 5.1] “Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE”;)
Regarding claim 11
The combination of CHIPMAN and Hernández teaches claim 10.
CHIPMAN further teaches
wherein the measure of variability is at least one of variance, standard deviation, range, and interquartile range.
(CHIPMAN [sec(s) 2] “With this notation, the sum-of-trees model (2) can be more explicitly expressed as (4)
[Y = ∑mj=1 g(x; Tj,Mj) + ε, ε ∼ N(0, σ2)]
, where for each binary regression tree Tj and its associated terminal node parameters Mj, g(x; Tj,Mj) is the function which assigns μij ∈ Mj to x. Under (4), E(Y|x) equals the sum of all the terminal node μij’s assigned to x by the g(x; Tj,Mj)’s. When the number of trees m > 1, each μij here is merely a part of E(Y|x), unlike the single tree model (3).” [sec(s) 2.2.3] “To guide the specification of the hyperparameters μμ and σμ, note that E(Y|x) is the sum of mμij’s under the sum-of-trees model, and because the μij’s are apriori i.i.d., the induced prior on E(Y|x) is N(mμμ, mσ2μ). Note also that it is highly probable that E(Y|x) is between ymin and ymax, the observed minimum and maximum of Y in the data. The essence of our strategy is then to choose μμ and σμ so that N(mμμ,mσ2μ) assigns substantial probability to the interval (ymin,ymax). This can be conveniently done by choosing μμ and σμ so that mμμ − k√m σμ = ymin and mμμ + k√m σμ = ymax for some preselected value of k. For example, k = 2 would yield a 95% prior probability that E(Y|x) is in the interval (ymin,ymax). The strategy above uses an aspect of the observed data, namely, ymin and ymax, to try to ensure that the implicit prior for E(Y|x) is in the right “ballpark.” That is to say, we want it to assign substantial probability to the entire region of plausible values of E(Y|x) while avoiding overconcentration and overdispersion.” [sec(s) 2.2.4] “we take σˆ as the residual standard deviation from a least squares linear regression of Y on the original X’s.” [sec(s) 5.1] “Based on each training set, each method was then used to predict the corresponding test set and evaluated on the basis of its predictive RMSE”;)
Regarding claim 12
The claim is a system claim corresponding to method claim 1 and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 13
The claim is a system claim corresponding to method claim 2 and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 14
The claim is a system claim corresponding to method claim 3 and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 15
The claim is a system claim corresponding to method claim 4 and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 16
The claim is a system claim corresponding to method claim 5 and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 17
The claim is a system claim corresponding to method claim 6 and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 18
The claim is a system claim corresponding to method claim 7 and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 19
The claim is a system claim corresponding to method claim 8 and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 20
The claim is a system claim corresponding to method claim 9 and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Fri 9:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SEHWAN KIM/
Examiner, Art Unit 2129
2/13/2026