Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the application and claims filed 05/02/2023. Claims 1-20 are
pending and have been examined. Claims 1-20 are rejected.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 05/02/2023 and 12/20/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d).
The present application claims foreign priority based on Japanese Patent Application JP2020-193712
filed 11/20/2020. The examiner notes that a certified copy (in Japanese) of the above-noted application
was retrieved on 05/02/2023. Receipt is acknowledged of certified copies of papers required by 37 CFR
1.55.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed. The following title is suggested: “Apparatus and Method for Updating Ensemble Learning-Type Inference Models”
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
“an inferred output data generator configured to generate inferred output data” in claims 1, 19, and 20
“a first output data generator configured to generate first output data”, “a second output data generator configured to generate second output data”, and “a final output data generator configured to generate final output data” in claim 11
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1, 11, 12, 19, and 20 and their respective dependents are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Regarding claims 1, 19, and 20, the claim limitation “an inferred output data generator configured to generate inferred output data” invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Specifically, while the specification discloses an “Inference Processing Section 22” comprising a control section that executes inference processing steps to generate output data, the specification fails to explicitly use the claimed term “inferred output data generator” or to clearly link this claimed generic placeholder to the specific processor and algorithms disclosed in the written description. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Regarding claim 11, the claim limitations “a first output data generator configured to generate first output data”, “a second output data generator configured to generate second output data”, and “a final output data generator configured to generate final output data” invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed functions and to clearly link the structure, material, or acts to the functions. Specifically, while the specification discloses an “Inference Processing Section 22” comprising a control section that executes inference processing steps to generate output data, the specification fails to explicitly use the claimed terms “first output data generator”, “second output data generator”, or “final output data generator” or to clearly link these claimed generic placeholders to the specific processor and algorithms disclosed in the written description. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Regarding claim 12, the claim recites “the first inference model” at the end of the claim. There is insufficient antecedent basis for this limitation in the claim. (Examiner’s note: claim 12 depends from claim 2; however, claim 2 does not introduce “a first inference model”.)
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 1
Step 1: Claim 1 is an apparatus type claim. Therefore, claim 1 is directed to the statutory category of machine.
Step 2A prong 1:
“an inferred output … generate inferred output data … performs inference based on each inference result by” (This limitation falls under mental processes grouping. A person can mentally or with the aid of a pen and paper process input data to come up with an inference without the need for a specific technological structure).
“using an update amount based on the inferred output data and the correct answer data.” (This limitation falls under mental processes grouping. A person can mentally or with a pen and paper come up with an update amount based on two types of data).
Step 2A prong 2:
“An information processing apparatus, comprising:” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
“data acquiring processor circuitry configured to acquire section that acquires input data and correct answer data that corresponds to the input data;” (Adding insignificant extra-solution activity to the judicial exception - see MPEP 2106.05(g)).
“data generator configured to” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f)) – Examiner’s note: This element merely invokes a generic computer component (“generator”) as a tool to perform the recited abstract ideas and does not provide a specific improvement to computer functionality.
“of an ensemble learning-type inference model”, “to the ensemble learning-type inference model that”, “a plurality of inference models” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f)) – Examiner’s note: These elements merely recite a generic off-the-shelf ensemble inference model as a tool to perform the recited abstract ideas.
“an additional learning processor configured to perform additional learning processing with respect to a part of or all of each of the inference models that constitute the ensemble learning-type inference model by” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f)) – Examiner’s note: High-level recitation of a generic processor and inference model applied to an abstract idea without significantly more.
Step 2B
“An information processing apparatus, comprising:” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
“data acquiring processor circuitry configured to acquire section that acquires input data and correct answer data that corresponds to the input data;” (MPEP 2106.05(d)(II) indicates that merely "receiving data" is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed data receiving steps are well-understood, routine, conventional activity is supported under Berkheimer).
“data generator configured to” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f)) – Examiner’s note: This element merely invokes a generic computer component (“generator”) as a tool to perform the recited abstract ideas and does not provide a specific improvement to computer functionality.
“of an ensemble learning-type inference model”, “to the ensemble learning-type inference model that”, “a plurality of inference models” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f)) – Examiner’s note: These elements merely recite a generic off-the-shelf ensemble inference model as a tool to perform the recited abstract ideas.
“an additional learning processor configured to perform additional learning processing with respect to a part of or all of each of the inference models that constitute the ensemble learning-type inference model by” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f)) – Examiner’s note: High-level recitation of a generic processor and inference model applied to an abstract idea without significantly more.
Claim 2
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 1 above, which claim 2 depends on.
Step 2A prong 2 and Step 2B:
“wherein the ensemble learning-type inference model is a boosting learning-type inference model which is constituted by a plurality of inference models formed by sequential learning so that each of the inference models reduces an inference error due to a higher-order inference model group.” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f) – Examiner’s note: high level recitation of applying a boosting inference model without significantly more).
Therefore, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Claim 3
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 1 above, which claim 3 depends on.
Step 2A prong 2 and Step 2B:
“wherein the ensemble learning-type inference model is a bagging learning-type inference model which performs inference based on each inference result of a plurality of inference models, each formed by learning based on a plurality of data groups extracted from a same learning target data group.” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f) – Examiner’s note: high level recitation of applying a bagging inference model without significantly more).
Therefore, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Claim 4
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 1 above, which claim 4 depends on. Claim 4 further recites:
“wherein the update amount is a value based on a difference between the inferred output data and the correct answer data.” (A person mentally or with a pen and paper can find the difference between two types of data).
Step 2A prong 2 and Step 2B: The claim does not recite any additional elements that integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
Claim 5
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 1 above, which claim 5 depends on. Claim 5 further recites:
“wherein the update amount is a value based on a value obtained by multiplying a difference between the inferred output data and the correct answer data by a learning rate.” (A person mentally or with a pen and paper can find the difference between two types of data and then multiply it with another value).
Step 2A prong 2 and Step 2B: The claim does not recite any additional elements that integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
Claim 6
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 1 above, which claim 6 depends on. Claim 6 further recites:
“wherein the update amount is a value calculated by dividing a value obtained by multiplying a difference between the inferred output data and the correct answer data by a learning rate by the number of inference models that constitute the ensemble learning-type inference model.” (A person mentally or with a pen and paper can find the difference between two types of data and then multiply it with another value and then divide it by the total number of models in the group).
Step 2A prong 2 and Step 2B: The claim does not recite any additional elements that integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
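Examiner’s note (for illustration only, not part of the grounds of rejection): the update-amount arithmetic recited in claims 4-6 can be sketched as the following Python fragment. All names and values are hypothetical and are offered only to illustrate why the calculations are characterized above as performable mentally or with pen and paper.

# Illustrative sketch of the update-amount arithmetic of claims 4-6.
# All names and values are hypothetical.

def update_amount_claim4(inferred, correct):
    # Claim 4: a value based on a difference between the inferred
    # output data and the correct answer data.
    return correct - inferred

def update_amount_claim5(inferred, correct, learning_rate):
    # Claim 5: the difference multiplied by a learning rate.
    return learning_rate * (correct - inferred)

def update_amount_claim6(inferred, correct, learning_rate, num_models):
    # Claim 6: the claim 5 value divided by the number of inference
    # models that constitute the ensemble.
    return learning_rate * (correct - inferred) / num_models

print(update_amount_claim6(0.8, 1.0, 0.1, 4))  # 0.1 * 0.2 / 4 = 0.005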
Claim 7
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 1 above, which claim 7 depends on.
Step 2A prong 2 and Step 2B:
“wherein the inference model is a trained decision tree.” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f) – Examiner’s note: high level recitation of applying a generic trained decision tree without significantly more).
Therefore, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Claim 8
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 7 above, which claim 8 depends on. Claim 8 further recites:
“accumulating the update amount with respect to an inferred output” (a person mentally or with a pen and paper keeps a running total by manually adding the update amount to the existing prediction value).
Step 2A prong 2 and Step 2B:
“wherein the additional learning processing is processing of” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f) – Examiner’s note: high-level recitation of applying additional learning processing without significantly more).
“of a decision tree which constitutes each of the inference models.” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f) – Examiner’s note: high level recitation of applying a decision tree without significantly more).
Therefore, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Claim 9
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 1 above, which claim 9 depends on.
Step 2A prong 2 and Step 2B:
“wherein the inference model is a trained neural network.” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f) – Examiner’s note: high level recitation of applying a generic trained neural network without significantly more).
Therefore, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Claim 10
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 9 above, which claim 10 depends on. Claim 10 further recites:
“updating … by back-propagating the update amount.” (This limitation falls under mathematical concepts. Updating by back-propagating is essentially performing a series of mathematical calculations to distribute an error value backward through a formula to adjust its numerical weights.)
Step 2A prong 2 and Step 2B:
“wherein the additional learning processing is processing of” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f)) – Examiner’s note: This element merely invokes generic computing components (“processing”) as tools to perform the recited abstract ideas and does not provide a specific improvement to computer functionality.
“with respect to a neural network which constitutes each of the inference models, a parameter of the neural network” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f) – Examiner’s note: high level recitation of applying a generic neural network without significantly more).
Therefore, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
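Examiner’s note (for illustration only, not part of the grounds of rejection): the back-propagating update characterized above as a mathematical concept can be sketched as follows. The one-weight linear model and all names are hypothetical assumptions, not the claimed invention.

# Minimal back-propagation sketch: an error value is distributed
# backward through the formula y = w * x to adjust the weight w.

def backprop_step(w, x, correct, learning_rate):
    inferred = w * x                    # forward pass
    error = inferred - correct          # basis of the update amount
    grad_w = error * x                  # dE/dw for E = 0.5 * error**2
    return w - learning_rate * grad_w   # adjust the numerical weight

w = 0.5
for _ in range(100):
    w = backprop_step(w, x=2.0, correct=3.0, learning_rate=0.05)
print(round(w, 3))  # approaches 1.5, where w * x equals the correct answer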
Claim 11
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 2 above, which claim 11 depends on. Claim 11 further recites:
“Generate first output data by inputting the input data to a first approximate function generated based on training input data and training correct answer data that corresponds to the training input data” (This falls under the mathematical concepts grouping. Manipulating data using mathematical functions (i.e., an approximate function) to generate new data constitutes mathematical calculations.)
“generate second output data by inputting the input data … based on the training input data and difference data between output data generated by inputting the training input data to the first approximate function and the training correct answer data;” (This falls under the mathematical concepts grouping. Manipulating data using mathematical functions (i.e., an approximate function) to generate new data constitutes mathematical calculations. Additionally, the specification discloses expressions 3-5, which provide the specific formulas used to calculate this error and train the secondary model.)
“Generate final output data based on the first output data and the second output data … an update amount based on difference data between the correct answer data and the first output data and the inferred output data.” (A person mentally or with a pen and paper can come up with a final value based on first and second data values and find an update amount by taking the difference between other types of data.)
Step 2A prong 2 and Step 2B:
“wherein the boosting learning-type inference model further includes a first inference model, the first inference model comprising:” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
“a first output data generator configured to”, “a second output data generator configured to”, “a final output data generator configured to” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
“to a second trained model generated by performing machine learning”, “wherein the additional learning processing is processing of updating the second trained model using” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
Therefore, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
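Examiner’s note (for illustration only, not part of the grounds of rejection): the claim 11 structure — a first approximate function, a second model trained on difference data, a final output combining the two, and an update of the second trained model — can be sketched as follows. The linear forms and all names are hypothetical assumptions.

# Illustrative sketch of the claim 11 data flow (hypothetical names).

def first_approx(x):
    return 1.0 * x              # first approximate function (assumed linear)

w2 = 0.0                        # parameter of the second trained model

def second_model(x):
    return w2 * x               # trained on residuals of the first function

def final_output(x):
    return first_approx(x) + second_model(x)

def additional_learning(x, correct, lr=0.1):
    # Update the second trained model using an update amount based on
    # difference data between the correct answer data and the first output.
    global w2
    residual_target = correct - first_approx(x)
    error = second_model(x) - residual_target
    w2 -= lr * error * x

for _ in range(50):
    additional_learning(x=2.0, correct=2.6)
print(round(final_output(2.0), 2))  # approaches 2.6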
Claim 12
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 2 above, which claim 12 depends on. Claim 12 further recites:
“equal to or lower than a predetermined … as the first” (a person can mentally or with a pen and paper identify something at or below a certain specified rank as the first)
Step 2A prong 2 and Step 2B:
“wherein in the boosting learning-type inference model, only inference models” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
“inference model are configured” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
Therefore, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Claim 13
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 11 above, which claim 13 depends on. Claim 13 further recites:
“wherein the first approximate function” (Mathematical concept. See claim 11.)
Step 2A prong 2 and Step 2B:
“is a first trained model generated by performing machine learning based on the training input data and the training correct answer data.” (The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This additional element does not integrate the judicial exception into a practical application. See MPEP 2106.05(h).)
Therefore, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Claim 14
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 11 above, which claim 14 depends on. Claim 14 further recites:
“wherein the first approximate function is a function obtained by formulating a relationship between the training input data and the training correct answer data.” (A numerical formula, function, or equation is considered as falling within the mathematical concepts grouping. Formulating a relationship between training data relies on constructing a mathematical equation or formula.)
Step 2A prong 2 and Step 2B: The claim does not recite any additional elements that integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
Claim 15
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 1 above, which claim 15 depends on. Claim 15 further recites:
“convert, when the correct answer data is a label, the label into a numerical value.” (A person mentally or with a pen and paper can look at a text label and translate it by assigning a number.)
Step 2A prong 2 and Step 2B:
“further comprising a conversion processor configured to” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
Therefore, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Claim 16
Step 1: A machine, as above.
Step 2A prong 1: See the rejection of Claim 1 above, which claim 16 depends on.
Step 2A prong 2 and Step 2B:
“wherein the additional learning processing is online learning.” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
Therefore, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea, as discussed above in the rejection of claim 1. The claim does not include additional elements, considered individually and in combination, that are sufficient to amount to significantly more than the judicial exception.
Claim 17 is a method claim that recites substantially the same limitations as claim 1; therefore, claim 17 is rejected under the same rationale as claim 1.
Claim 18
Step 1: Claim 18 recites a non-transitory computer-readable medium, therefore, it is directed to the statutory category of manufacture.
Step 2A prong 1: See the rejection for claim 17, which claim 18 depends on.
Step 2A prong 2 & Step 2B:
“having one or more executable instructions stored thereon causing a computer to function as an information processing device which, when executed by processor circuitry, cause the processor circuitry to perform” (Adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f))
Claim 18 further recites: “the information processing method according to claim 17 for information processing device”.
See the rejection for claim 17 which claim 18 depends on.
Claim 19 is a system claim that recites substantially the same limitations as claim 1; therefore, claim 19 is rejected under the same rationale as claim 1.
Claim 20 is an apparatus claim that recites “control apparatus for controlling a target apparatus, the control apparatus” and “from the target apparatus”. Under Step 2A prong 2 and Step 2B, these additional elements amount to adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea - see MPEP 2106.05(f).
The remaining portions of claim 20 recite substantially the same limitations as claim 1, therefore claim 20 is rejected under the same rationale as claim 1.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 4-5, 9-10, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over the non-patent literature Jacobs et al. (“Adaptive Mixtures of Local Experts”, hereinafter “Jacobs”) in view of U.S. patent application publication US 2018/0137427 A1 to Hsieh et al. (hereinafter “Hsieh”).
Claim 1
Jacobs teaches:
An information processing …, (page 79, “We present a new supervised learning procedure for systems composed of many separate networks…”) comprising: … input data and correct answer data that corresponds to the input data; (Page 80, “d^c is the desired output vector in case c.” – EN: this denotes case c which is the training case used to train the networks and d^c is the desired output vector in case c.) an inferred output … inferred output data of an ensemble learning-type inference model by inputting the input data to the ensemble learning-type inference model that performs inference based on each inference result by a plurality of inference models; (Page 80, “the final output of the whole system is a linear combination of the outputs of the local experts, with the gating network determining the proportion of each local output in the linear combination” EN: this denotes an ensemble type model that generates a final system output by blending or combining the individual predictions of multiple local expert networks). …with respect to a part of or all of each of the inference models that constitute the ensemble learning-type inference model by using an update amount based on the inferred output data and the correct answer data.
[Image: error-measure equation excerpt from Jacobs, page 80 (media_image1.png, greyscale)]
(page 80, “This error measure compares the desired output with a blend of the outputs of the local experts, so, to minimize the error, each local expert must make its output cancel the residual error that is left by the combined effects of all the other experts. When the weights in one expert change, the residual error changes, and so the error derivatives for all the other local experts change.” – EN: this denotes calculating a global error by taking the difference between the target answer and the combined ensemble output. It further denotes updating the weights of the individual expert models using error derivatives (the update amount) that are directly based on minimizing this global residual error.)
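For clarity of the record, the error measure referenced in the excerpt above (Jacobs, page 80, equation 1.1) is believed to read, in notation consistent with the quoted passage:

E^c = || d^c − Σ_i p_i^c o_i^c ||^2

where o_i^c is the output vector of expert i on case c and p_i^c is the gating network’s proportional weight for that expert. This reconstruction is offered for readability only; the cited image controls.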
Jacobs does not explicitly disclose:
“apparatus”, “a data acquiring processor circuitry configured to acquire”, “data generator configured to generate”, “and an additional learning processor configured to perform additional learning processing”
However, Hsieh teaches:
“apparatus” (Para 2, “The disclosure relates in general to an ensemble learning prediction apparatus”)
“a data acquiring processor circuitry configured to acquire” (Para 6, “According to one embodiment, an ensemble learning prediction apparatus is provided. The apparatus comprises a loss module receiving a sample data” -- Examiner’s Note (EN): FIG. 2 also illustrates that these modules reside and operate within a processor.) “data generator configured to generate” (FIG. 2, EN: Fig. 2 illustrates the generation of a prediction “output” from the weighting module of the ensemble learning prediction apparatus) “and an additional learning processor configured to perform additional learning processing”
(Para 3, “However, in practical application, as the environment varies with the time, concept drifting phenomenon may occur, and the accuracy of the ensemble learning model created according to historical data will decrease. Under such circumstances, the prediction model must be re-trained or adjusted by use of newly created data to restore the prediction accuracy within a short period of time” Para 21, “The adaptive ensemble weight can be adapted to the current environment to resolve the concept drifting problem.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to combine the teachings of Jacobs with the teachings of Hsieh. Specifically, it would have been obvious to integrate Jacobs’ method of updating the internal parameters of individual base models using an error measure based on the combined ensemble output and the correct answer with Hsieh’s hardware and ensemble learning apparatus configured to perform continuous, additional learning. The motivation for doing so would be to enable the individual base models (the local experts) to continuously adapt in response to changing environments (concept drift). See Para 3 in Hsieh, “However, in practical application, as the environment varies with the time, concept drifting phenomenon may occur, and the accuracy of the ensemble learning model created according to historical data will decrease. Under such circumstances, the prediction model must be re-trained or adjusted by use of newly created data to restore the prediction accuracy within a short period of time…”
Claim 4
Jacobs in view of Hsieh teaches all the limitations of claim 1, Jacobs further teaches:
wherein the update amount is a value based on a difference between the inferred output data and the correct answer data. (Page 80 equation 1.1)
[Image: equation 1.1 from Jacobs, page 80 (media_image2.png, greyscale)]
Claim 5
Jacobs in view of Hsieh teaches all the limitations of claim 1, Jacobs further teaches:
wherein the update amount is a value based on a value obtained by multiplying a difference between the inferred output data and the correct answer data by a learning rate. (Page 80, equation 1.1, and Page 85, “All simulations were performed using a simple gradient descent algorithm with fixed step size t” – EN: this denotes using a fixed step size in a gradient descent algorithm, which is synonymous with a learning rate. When a gradient descent algorithm is applied to the error function in equation 1.1, the mathematical derivative extracts the difference term. The algorithm then multiplies this gradient by the step size to calculate the final weight update.)
Claim 9
Jacobs in view of Hsieh teaches all the limitations of claim 1, Jacobs further teaches:
wherein the inference model is a trained neural network. (Page 81, “Figure 1: A system of expert and gating networks. Each expert is a feedforward network and all experts receive the same input and have the same number of outputs.”)
Claim 10
Jacobs in view of Hsieh teaches all the limitations of claim 9, Jacobs further teaches:
…updating, with respect to a neural network which constitutes each of the inference models, a parameter of the neural network by back-propagating the update amount. (Pages 79, 82, and 85 -- EN: Jacobs describes calculating an error derivative (an update amount) based on the output and updating the network’s parameters (weights) using gradient descent and back propagation. The NPL discusses the use of back-propagation directly “If backpropagation is used to train a single, multilayer network (page 79).” The NPL further details calculating the error derivatives (the update amount) with respect to the output of an expert to adapt the weights, noting that “it is helpful to compare the derivatives of the two error functions with respect to the output of an expert. (page 82)” Finally, the NPL confirms that these gradients are used to perform the weight updates, stating that “All simulations were performed using a simple gradient descent algorithm (page 85)”)
Jacobs does not explicitly disclose:
wherein the additional learning processing is processing of…
However, Hsieh teaches:
wherein the additional learning processing is processing of… (Para 3, “However, in practical application, as the environment varies with the time, concept drifting phenomenon may occur, and the accuracy of the ensemble learning model created according to historical data will decrease. Under such circumstances, the prediction model must be re-trained or adjusted by use of newly created data to restore the prediction accuracy within a short period of time” Para 21, “The adaptive ensemble weight can be adapted to the current environment to resolve the concept drifting problem.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to combine the teachings of Jacobs with the teachings of Hsieh. Specifically, it would have been obvious to integrate Jacobs’ method of updating the parameters of the neural network through backpropagation with Hsieh’s hardware and ensemble learning apparatus configured to perform continuous, additional learning. The motivation for doing so would be to enable the individual base models (the local experts) to continuously adapt in response to changing environments (concept drift). See Para 3 in Hsieh, “However, in practical application, as the environment varies with the time, concept drifting phenomenon may occur, and the accuracy of the ensemble learning model created according to historical data will decrease. Under such circumstances, the prediction model must be re-trained or adjusted by use of newly created data to restore the prediction accuracy within a short period of time…”
Claim 16
Jacobs in view of Hsieh teaches all the limitations of claim 1, Hsieh further teaches:
wherein the additional learning processing is online learning. (Para 17, “To avoid the online learning sample data being over-trained and losing the required diversity between the basic hypotheses of the ensemble learning model, the diversity of hypotheses is considered during the learning process.” Para 21, “{x.sub.n.sup.(t)}, n=1,2, . . . N denotes an online learning sample data in the t-th block”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to combine the ensemble learning network of Jacobs to perform the additional learning as online learning as taught by Hsieh. The motivation for doing so would be to enable the individual base models (the local experts) to continuously adapt in response to changing environments (concept drift). See Para 3 in Hsieh, “However, in practical application, as the environment varies with the time, concept drifting phenomenon may occur, and the accuracy of the ensemble learning model created according to historical data will decrease. Under such circumstances, the prediction model must be re-trained or adjusted by use of newly created data to restore the prediction accuracy within a short period of time…”
Claim 17
Jacobs teaches:
An information processing method, comprising: (Page 79, “We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases.”)
The remaining limitations of claim 17 are substantially the same as claim 1, therefore claim 17 is rejected under the same rationale as claim 1.
Claim 18
Jacobs in view of Hsieh teaches all the limitations of claim 17, Hsieh further teaches:
A non-transitory computer-readable medium (Para 8, “The non-transitory computer-readable storage medium provided in the present disclosure can execute the abovementioned method.”) having one or more executable instructions stored thereon (Para 38, “stored in a computer program product having instructions allocated to a computing device”) causing a computer to function as an information processing device which, (Para 38, “allocated to a computing device”, Para 6, “According to one embodiment, an ensemble learning prediction apparatus is provided.”) when executed by processor circuitry, cause the processor circuitry to perform the information processing method (Para 38, “allocated to a computing device executing the abovementioned method.” Para 8, “The non-transitory computer-readable storage medium provided in the present disclosure can execute the abovementioned method.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to combine the ensemble learning network of Jacobs with the non-transitory computer-readable medium of Hsieh. The motivation for doing so would be to allow the ensemble learning method to be implemented on a functional component. See para 38 of Hsieh, “In some embodiments, the non-transitory computer-readable storage medium can be stored in a computer program product having instructions allocated to a computing device executing the abovementioned method.”
Claim 19
Hsieh teaches:
An information processing system (Para 2, “The disclosure relates in general to an ensemble learning prediction apparatus”)
The remaining limitations of claim 19 are substantially the same as claim 1, therefore claim 19 is rejected under the same rationale as claim 1.
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to combine the ensemble learning network of Jacobs with the apparatus of Hsieh. The motivation for doing so would be to allow the ensemble learning method to be implemented on a functional component. See para 6 in Hsieh, “According to one embodiment, an ensemble learning prediction apparatus is provided. The apparatus comprises a loss module receiving a sample data and calculating a loss according to a first prediction result of the sample data and an actual result…”
Claims 2-3, 7-8, and 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over the non-patent literature Jacobs et al. (“Adaptive Mixtures of Local Experts”, hereinafter “Jacobs”) in view of US 2018/0137427 A1 to Hsieh et al. (hereinafter “Hsieh”), further in view of Hastie et al. (“The Elements of Statistical Learning: Data Mining, Inference, and Prediction”, hereinafter “Hastie”).
Claim 2
Jacobs in view of Hsieh teaches all the limitations of claim 1, Hastie teaches:
wherein the ensemble learning-type inference model is a boosting learning-type inference model which is constituted by a plurality of inference models formed by sequential learning (Page 338, “The purpose of boosting is to sequentially apply the weak classification algorithm to repeatedly modified versions of the data, thereby producing a sequence of weak classifiers Gm(x), m = 1, 2, . . . , M.”) so that each of the inference models reduces an inference error due to a higher-order inference model group. (Page 338-339, “Each successive classifier is thereby forced to concentrate on those training observations that are missed by previous ones in the sequence.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to combine the ensemble learning network of Jacobs and the hardware and additional learning of Hsieh with the boosting model of Hastie. The motivation for doing so would be to improve the overall predictive accuracy of the model by combining multiple weak learners to create an accurate and powerful “committee” of classifiers. See page 337 in Hastie, which states, “The motivation for boosting was a procedure that combines the outputs of many “weak” classifiers to produce a powerful “committee.”
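Examiner’s note (for illustration only, not part of the grounds of rejection): sequential boosting-style learning of the kind quoted from Hastie can be sketched as follows, where each successive weak model is fit to the error left by the higher-order models before it. The decision-stump weak learner and all names are hypothetical assumptions.

# Illustrative boosting sketch: each weak model (a one-split decision
# stump) is fit to the residual error left by the earlier models, and
# the ensemble output is the sum of the sequence.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.5, 1.7, 3.9, 4.1]

def fit_stump(xs, residuals):
    # Choose the threshold whose two leaf means minimize squared error.
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left) if left else 0.0
        rmean = sum(right) / len(right) if right else 0.0
        err = sum((r - (lmean if x <= t else rmean)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

models = []
residuals = list(ys)
for _ in range(3):
    stump = fit_stump(xs, residuals)
    models.append(stump)
    residuals = [r - stump(x) for x, r in zip(xs, residuals)]

def predict(x):
    return sum(m(x) for m in models)

print([round(predict(x), 2) for x in xs])  # close to ys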
Claim 3
Jacobs in view of Hsieh teaches all the limitations of claim 1, Hastie teaches:
wherein the ensemble learning-type inference model is a bagging learning-type inference model (Page 282, “Bootstrap aggregation or bagging averages this prediction over a collection of bootstrap samples”) which performs inference based on each inference result of a plurality of inference models, (Page 282, “For each bootstrap sample Z*b, b = 1, 2, . . . , B, we fit our model, giving prediction f̂*b(x).”) each formed by learning based on a plurality of data groups extracted from a same learning target data group. (Page 282, “Bootstrap aggregation or bagging averages this prediction over a collection of bootstrap samples… The bagging estimate is defined by… (equation 8.51)”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to combine the ensemble learning network of Jacobs and the hardware and additional learning of Hsieh with the bagging model of Hastie. The motivation for doing so would be to improve the model’s accuracy by reducing its variance. See page 282 in Hastie that states, “Here we show how to use the bootstrap to improve the estimate or prediction itself… Bootstrap aggregation or bagging averages this prediction over a collection of bootstrap samples, thereby reducing its variance.” And Page 283, “Bagging can dramatically reduce the variance of unstable procedures like trees, leading to improved prediction.”
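Examiner’s note (for illustration only, not part of the grounds of rejection): the bagging procedure quoted from Hastie can be sketched as follows; each model is fit to a bootstrap sample drawn from the same learning target data group, and inference averages the individual results. The toy slope model and all names are hypothetical assumptions.

# Illustrative bagging sketch: fit each model to a bootstrap sample
# (drawn with replacement) and average the predictions.

import random

data = [(1.0, 1.1), (2.0, 2.2), (3.0, 2.9), (4.0, 4.2)]  # (input, answer)

def fit_slope(sample):
    # Toy model: learn a single slope a so that y ≈ a * x.
    a = sum(y for _, y in sample) / sum(x for x, _ in sample)
    return lambda x: a * x

random.seed(0)
models = [fit_slope([random.choice(data) for _ in data]) for _ in range(10)]

def bagged_predict(x):
    return sum(m(x) for m in models) / len(models)  # average each result

print(round(bagged_predict(2.5), 3))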
Claim 7
Jacobs in view of Hsieh teaches all the limitations of claim 1, Hastie teaches:
wherein the inference model is a trained decision tree. (Page 356, “The boosted tree model is a sum of such trees…”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to modify the ensemble learning network of Jacobs and the hardware and additional learning of Hsieh to use decision trees as the individual inference models of Hastie. The motivation for doing so would be to utilize a well-known off-the-shelf learning method that requires minimal data preprocessing. See Page 352 of Hastie, “As a result, scaling and/or more general transformations are not an issue, and they are immune to the effects of predictor outliers. They perform internal feature selection as an integral part of the procedure. They are thereby resistant, if not completely immune, to the inclusion of many irrelevant predictor variables. These properties of decision trees are largely the reason that they have emerged as the most popular learning method…”
Claim 8
Jacobs in view of Hsieh further in view of Hastie teaches all the limitations of claim 7, Hsieh further teaches:
wherein the additional learning processing is processing of… (Para 3, “However, in practical application, as the environment varies with the time, concept drifting phenomenon may occur, and the accuracy of the ensemble learning model created according to historical data will decrease. Under such circumstances, the prediction model must be re-trained or adjusted by use of newly created data to restore the prediction accuracy within a short period of time” Para 21, “The adaptive ensemble weight can be adapted to the current environment to resolve the concept drifting problem.”)
Jacobs in view of Hsieh does not explicitly disclose:
…accumulating the update amount with respect to an inferred output of a decision tree which constitutes each of the inference models.
However, Hastie teaches:
…accumulating the update amount with respect to an inferred output of a decision tree which constitutes each of the inference models. (Page 361, “Algorithm 10.3 Gradient Tree Boosting Algorithm” – step (d): “Update $f_m(x) = f_{m-1}(x) + \sum_{j=1}^{J_m} \gamma_{jm} I(x \in R_{jm})$” – EN: this denotes the formula showing that the update process is additive. The calculated “update amount” (the regional constants $\gamma_{jm}$) from the newly trained tree is mathematically summed with the running “inferred output” total of the system, $f_{m-1}(x)$.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to modify the ensemble learning network of Jacobs and the hardware and additional learning of Hsieh such that the additional learning processing accumulates the update amount with respect to an inferred output of a decision tree, as taught by Hastie. The motivation for doing so would be to provide a computationally efficient optimization when updating or training the model. Hastie explains that simultaneously optimizing all parameters in a combined model requires “computationally intensive numerical optimization techniques” (Page 342). To solve this, Hastie describes the “forward stagewise additive modeling” (Page 342) approach, which simplifies the problem by sequentially accumulating the outputs of new trees. See page 342 in Hastie, “Forward stagewise modeling approximates the solution to (10.4) by sequentially adding new basis functions to the expansion without adjusting the parameters and coefficients of those that have already been added… Previously added terms are not modified.”
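EN: Solely to illustrate the additive accumulation in step (d) of Algorithm 10.3, the following is a minimal Python sketch of gradient tree boosting for squared-error loss. The shrinkage factor, the function names, and the use of scikit-learn trees are the examiner's assumptions for illustration; for squared-error loss, fitting a regression tree to the residuals and adding its leaf predictions corresponds to adding the regional constants $\gamma_{jm}$.

    # Minimal gradient tree boosting sketch (squared-error loss): each round
    # fits a small tree to the current residuals, and the resulting update
    # amount is accumulated onto the running inferred output f_{m-1}(x).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost(X, y, M=100, lr=0.1, max_depth=2):
        f0 = y.mean()                        # step 1: optimal constant model
        f = np.full(len(y), f0)
        trees = []
        for _ in range(M):
            residuals = y - f                # negative gradient for squared error
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
            f += lr * tree.predict(X)        # step (d): accumulate the update
            trees.append(tree)
        return f0, trees

    def boost_predict(f0, trees, X, lr=0.1):
        return f0 + lr * sum(tree.predict(X) for tree in trees)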
Claim 11
Jacobs in view of Hsieh further in view of Hastie teaches all the limitations of claim 2. Hsieh further teaches:
data generator configured to generate (FIG. 2, EN: Fig. 2 illustrates the generation of a prediction “output” from the weighting module of the ensemble learning prediction apparatus)
data generator configured to generate (FIG. 2, EN: Fig. 2 illustrates the generation of a prediction “output” from the weighting module of the ensemble learning prediction apparatus)
data generator configured to generate (FIG. 2, EN: Fig. 2 illustrates the generation of a prediction “output” from the weighting module of the ensemble learning prediction apparatus)
…additional learning processing is processing of… (Para 3, “However, in practical application, as the environment varies with the time, concept drifting phenomenon may occur, and the accuracy of the ensemble learning model created according to historical data will decrease. Under such circumstances, the prediction model must be re-trained or adjusted by use of newly created data to restore the prediction accuracy within a short period of time” Para 21, “The adaptive ensemble weight can be adapted to the current environment to resolve the concept drifting problem.”)
Jacobs in view of Hsieh does not explicitly disclose:
wherein the boosting learning-type inference model further includes a first inference model, the first inference model comprising: a first output … first output data by inputting the input data to a first approximate function generated based on training input data and training correct answer data that corresponds to the training input data; a second output … second output data by inputting the input data to a second trained model generated by performing machine learning based on the training input data and difference data between output data generated by inputting the training input data to the first approximate function and the training correct answer data; and a final output … final output data based on the first output data and the second output data, wherein the … updating the second trained model using an update amount based on difference data between the correct answer data and the first output data and the inferred output data.
However, Hastie teaches:
wherein the boosting learning-type inference model further includes a first inference model, the first inference model comprising: a first output … first output data by inputting the input data to a first approximate function generated based on training input data and training correct answer data that corresponds to the training input data; (Page 342, “Forward stagewise modeling approximates the solution to (10.4) by sequentially adding new basis functions to the expansion without adjusting the parameters and coefficients of those that have already been added.” – EN: The “already added” portion of the expansion, which is mathematically represented as $f_{m-1}(x)$ in the text, acts as the first output or rough approximation that remains unadjusted during the current step.) a second output … second output data by inputting the input data to a second trained model generated by performing machine learning based on the training input data and difference data between output data generated by inputting the training input data to the first approximate function and the training correct answer data; (Page 343, “Thus, for squared-error loss, the term $\beta_m b(x; \gamma_m)$ that best fits the current residuals is added to the expansion at each step.” And Page 342 - Algorithm 10.2 Forward Stagewise Additive Modeling, specifically step (b). – EN: The term $\beta_m b(x; \gamma_m)$ is the newly trained model (second output) designed to fit the residuals (errors). Algorithm 10.2 shows these two parts are mathematically combined by addition to give the final result $f_m(x)$.) and a final output … final output data based on the first output data and the second output data, wherein the … updating the second trained model using an update amount based on difference data between the correct answer data and the first output data and the inferred output data. (Page 343, “where $r_{im} = y_i - f_{m-1}(x_i)$ is simply the residual of the current model on the ith observation.” Page 342, “At each iteration m, one solves for the optimal basis function $b(x; \gamma_m)$ and corresponding coefficient $\beta_m$ to add to the current expansion $f_{m-1}(x)$.” – EN: this denotes calculating the difference (residual $r_{im}$) between the correct answer ($y_i$) and the first output ($f_{m-1}(x)$), and using that difference to solve for the new second model.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the ensemble learning network of Jacobs and the hardware and additional learning of Hsieh with the forward stagewise additive modeling of Hastie. The motivation for doing so would be to dynamically correct for errors and adapt to changing environments (such as concept drift) by training a new model specifically on the residual errors of a fixed baseline model, thereby adapting to new data without disrupting or having to retrain the already-established baseline accuracy. Hastie details this architecture of freezing prior models and fitting new ones to the residuals on page 342, stating that “forward stagewise modeling approximates the solution to (10.4) by sequentially adding new basis functions to the expansion without adjusting the parameters and coefficients of those that have already been added.” Hastie further explains that at each iteration, the new model is fit to the error of the frozen baseline on page 343, stating that “where $r_{im} = y_i - f_{m-1}(x_i)$ is simply the residual of the current model on the ith observation”.
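EN: Solely to illustrate how the claimed two-part structure reads on forward stagewise modeling, the following is a minimal Python sketch. The class name, the choice of component models, and the scikit-learn usage are the examiner's assumptions for illustration, not the method of any cited reference.

    # Frozen first approximate function plus a second model trained on its
    # residuals; the final output sums the two, and additional learning
    # updates only the second model.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor

    class TwoPartModel:
        def __init__(self):
            self.first = LinearRegression()            # first approximate function (frozen after fit)
            self.second = DecisionTreeRegressor(max_depth=3)

        def fit(self, X, y):
            self.first.fit(X, y)                       # baseline from the training data
            residuals = y - self.first.predict(X)      # difference data: r_i = y_i - f_{m-1}(x_i)
            self.second.fit(X, residuals)              # second trained model fits the residuals
            return self

        def predict(self, X):
            # final output data = first output data + second output data
            return self.first.predict(X) + self.second.predict(X)

        def update(self, X_new, y_new):
            # additional learning: refit only the second model on new residuals,
            # leaving the established baseline untouched
            self.second.fit(X_new, y_new - self.first.predict(X_new))
            return self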
Claim 12
Jacobs in view of Hsieh further in view of Hastie teaches all the limitations of claim 2. Hastie further teaches:
wherein in the boosting learning-type inference model, only inference models equal to or lower than a predetermined inference model are configured as the first inference model. (Page 361 - Algorithm 10.3 Gradient Tree Boosting Algorithm – EN: this denotes algorithm 10.3 which initializes at step 1 with a basic starting point without the two-part structure. The hybrid two-part structure only begins at position m = 1 and applies to all subsequent models lower in the chain.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the ensemble learning network of Jacobs and the hardware and additional learning of Hsieh with the gradient tree boosting initialization architecture of Hastie. The motivation for doing so would be to provide a structured mathematical approach to the teachings of Jacobs and Hsieh by using an initialization step as a frozen baseline model that captures the initial global data distribution, allowing the system to handle concept drift by only calculating and updating the residuals in the subsequent model layers. See Algorithm 10.3 on page 361.
Claim 13
Jacobs in view of Hsieh further in view of Hastie teaches all the limitations of claim 11. Hastie further teaches:
wherein the first approximate function is a first trained model generated by performing machine learning (Page 343, “where $r_{im} = y_i - f_{m-1}(x_i)$ is simply the residual of the current model on the ith observation.” Page 356, “The boosted tree model is a sum of such trees” (equation 10.28) – EN: this denotes calling the accumulated approximation ($f_{m-1}$) the “current model”, which is made up of a sum of previously trained decision trees.) based on the training input data and the training correct answer data. (Page 338, “each of the training observations $(x_i, y_i)$, $i = 1, 2, \ldots, N$”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the ensemble learning network of Jacobs and the hardware and additional learning of Hsieh with the first approximate function being a trained machine learning model, as taught by Hastie. The motivation for doing so would be to ensure the foundational baseline model is accurate and capable of capturing complex patterns in the historical training data, rather than relying on a simplistic formula. Hastie describes building the baseline model out of decision trees (Page 356): “The boosted tree model is a sum of such trees…” Hastie further refers to this accumulated sum of trained trees as the established baseline model from which errors are calculated: “…where $r_{im} = y_i - f_{m-1}(x_i)$ is simply the residual of the current model on the ith observation” (page 343).
Claim 14
Jacobs in view of Hsieh further in view of Hastie teaches all the limitations of claim 11. Hastie further teaches:
wherein the first approximate function is a function obtained by formulating a relationship between the training input data and the training correct answer data. (Page 361 - Algorithm 10.3 Gradient Tree Boosting Algorithm, “1. Initialize $f_0(x) = \arg\min_{\gamma} \sum_{i=1}^{N} L(y_i, \gamma)$” Page 360, “The first line of the algorithm initializes to the optimal constant model, which is just a single terminal node tree.” – EN: this denotes that in the very first step of gradient boosting, the base approximation ($f_0$) is not yet a fully trained sequential machine learning model; it is mathematically derived by minimizing the loss over the data (a mathematical formula).)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the ensemble learning network of Jacobs and the hardware and additional learning of Hsieh with the first approximate function being a mathematical function between input and answer data, as taught by Hastie. The motivation for doing so would be to establish an efficient and mathematically optimal baseline for the system. Hastie describes deriving this initial function not through complex sequential machine learning but through a direct mathematical formula to find the optimal baseline: “The first line of the algorithm initializes to the optimal constant model, which is just a single terminal node tree.” (Hastie, page 360).
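EN: Solely to illustrate the initialization step quoted above, the following is a minimal Python sketch; the function name and the loss choices are the examiner's assumptions for illustration. For squared-error loss the optimal constant is the mean of the training answers; for absolute-error loss it is the median.

    # f0(x) = argmin_gamma sum_i L(y_i, gamma): a formula evaluated over the
    # training data, not a sequentially trained model.
    import numpy as np

    def init_constant(y, loss="squared"):
        if loss == "squared":
            return np.mean(y)     # minimizes sum (y_i - gamma)^2
        if loss == "absolute":
            return np.median(y)   # minimizes sum |y_i - gamma|
        raise ValueError(loss)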
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over non-patent literature Jacobs et al. (“Adaptive Mixtures of Local Experts”, hereinafter “Jacobs”) in view of patent application US 2018/0137427 A1 Hsieh et al., hereinafter “Hsieh”, further in view of Zhou (“Ensemble Methods: Foundations and Algorithms”, hereinafter “Zhou”).
Claim 6
Jacobs in view of Hsieh teaches all the limitations of claim 1. Jacobs further teaches:
wherein the update amount is a value calculated by … multiplying a difference between the inferred output data and the correct answer data by a learning rate by … (Page 80 equation 1.1 and Page 85, “All simulations were performed using a simple gradient descent algorithm with fixed step size t” – EN: this denotes using a fixed step size in a gradient descent algorithm, which is synonymous with a learning rate. When a gradient descent algorithm is applied to the error function in equation 1.1, the mathematical derivative extracts the difference term. The algorithm then multiplies this gradient by the step size to calculate the weight update.)
Jacobs in view of Hsieh does not explicitly disclose:
…dividing a value obtained by … the number of inference models that constitute the ensemble learning-type inference model.
However, Zhou teaches:
…dividing a value obtained by … the number of inference models that constitute the ensemble learning-type inference model. (Zhou Page 68, “Suppose we are given a set of T individual learners $\{h_1, \ldots, h_T\}$ … Specifically, simple averaging gives the combined output H(x) as (equation 4.1)”, i.e., $H(x) = \frac{1}{T}\sum_{i=1}^{T} h_i(x)$. Page 76, “If all the individual classifiers are treated equally, the simple soft voting method generates the combined output by simply averaging all the individual outputs, and the final output for class $c_j$ is given by (equation 4.23)” – EN: this denotes the concept of equally distributing weight among models using “simple averaging” by multiplying by (1/T), which is equivalent to dividing by T, the number of models.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to combine the ensemble learning network of Jacobs and the hardware and additional learning of Hsieh with the simple averaging technique of Zhou. The motivation for doing so would be to reduce the variance of the predictions, thereby preventing the ensemble model from overfitting to the training data. See Page 71 of Zhou, “In particular, with a large ensemble, there are a lot of weights to learn, and this can easily lead to overfitting; simple averaging does not have to learn any weights, and so suffers little from overfitting. In general, it is widely accepted that simple averaging is appropriate for combining learners with similar performances, whereas if the individual learners exhibit nonidentical strength, weighted averaging with unequal weights may achieve a better performance.”
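EN: Solely to illustrate the combined reading of claim 6 (the difference multiplied by a learning rate per Jacobs, divided by the number of models per Zhou's simple averaging), the following is a minimal Python sketch; the function names and the combined-update reading are the examiner's assumptions for illustration.

    import numpy as np

    def simple_average(model_outputs):
        # model_outputs: shape (T, n_samples), one row per learner h_i
        T = len(model_outputs)
        return np.sum(model_outputs, axis=0) / T   # H(x) = (1/T) * sum_i h_i(x), Zhou eq. 4.1

    def update_amount(inferred, correct, learning_rate, T):
        # illustrative reading of the claimed update: (difference x learning rate) / T
        return learning_rate * (inferred - correct) / T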
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over non-patent literature Jacobs et al. (“Adaptive Mixtures of Local Experts”, hereinafter “Jacobs”) in view of patent application US 2018/0137427 A1 Hsieh et al., hereinafter “Hsieh”, further in view of Japanese patent application JP2019057016A Nishiyama et al., hereinafter “Nishiyama”.
Claim 15
Jacobs in view of Hsieh teaches all the limitations of claim 1. Nishiyama teaches:
further comprising a conversion processor configured to convert, when the correct answer data is a label, the label into a numerical value. (Para 37, “Furthermore, the conversion unit 15c converts labels indicating whether the learning data is benign or malignant into numerical labels. For example, the label is expressed as a number, with 0 representing a benign label and 1 representing a malignant label. FIG. 7 illustrates an example of feature vectors converted from feature quantities and numerical labels converted from labels.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to combine the ensemble learning network of Jacobs and the hardware and additional learning of Hsieh with the conversion unit that converts labels into numerical values of Nishiyama. The motivation for doing so would be to allow the machine learning model to calculate a continuous numerical score for categorical data. Nishiyama states that by converting text labels into numbers, the system can calculate a numerical degree or score for the labels. See Para 47 (Nishiyama), “creates a model indicating the degree of benignity or malignancy of the communication log as the classifier” and Para 43 (Nishiyama), “determines that the communication log is benign or malignant if the score indicating the degree of benignity or malignancy of the communication log output by the classifier 14a is higher than a predetermined threshold.”
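EN: Solely to illustrate Nishiyama's conversion of labels into numerical values (Para 37), the following is a minimal Python sketch; the function name and the default mapping are the examiner's assumptions for illustration.

    def convert_labels(labels, mapping=None):
        # Map categorical labels to numbers, e.g. benign -> 0, malignant -> 1,
        # so the model can compute a continuous score over them.
        mapping = mapping if mapping is not None else {"benign": 0, "malignant": 1}
        return [mapping[label] for label in labels]

    # e.g., convert_labels(["benign", "malignant", "benign"]) returns [0, 1, 0]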
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over non-patent literature Jacobs et al. (“Adaptive Mixtures of Local Experts”, hereinafter “Jacobs”) in view of patent application US 2018/0137427 A1 Hsieh et al., hereinafter “Hsieh”, further in view of patent application US 2022/0056953 A1 Fujimoto et al., hereinafter “Fujimoto”.
Claim 20
Fujimoto teaches:
A control apparatus for controlling a target apparatus, the control apparatus… (Para 89, “In the present embodiment, the control unit 40 controls the position of the shaft 115 by using a machine learning technique.” Para 81, “the control unit 40 is constituted by a microcomputer and a memory device or the like that stores software for causing the microcomputer to operate.”) …acquire input data and correct answer data that corresponds to the input data from the target apparatus; (Para 94, “the control-target device 50 is at least one of the shaft 115 , the magnetic bearings 21 and 22 , and the displacement sensors 31 and 32” Para 96, “The state variable acquisition unit 43 observes the state of the magnetic bearing device 10 while the magnetic bearing device 10 is in operation and acquires information regarding the observed state as a state variable… the state variable is the output values of the displacement sensors 31 and 32” Para 99, “The evaluation data is used as training data in supervised learning.” Para 119, “The learning data is a set of pairs of input data and training data corresponding to the input data.” – EN: this denotes acquiring state variables (input data) from the sensors of the control-target device (target apparatus) and pairing them with corresponding training data (correct answer data) to form a learning dataset.)
The remaining limitations of claim 20 are substantially the same as claim 1, therefore claim 20 is rejected under the same rationale as claim 1.
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in
the art to combine the ensemble learning network of Jacobs and the hardware and additional learning of Hsieh with the control apparatus for controlling a target apparatus and acquiring data from said target apparatus of Fujimoto. The motivation for doing so would be to dynamically adapt the inference models to real-world physical changes such as mechanical wear or environmental shifts in order to maintain long-term operational stability. Fujimoto states that without adapting to the target apparatus data, “appropriate voltage command values are not obtained because of a device-to-device quality variation, a temporal change of the system, and the like. Consequently, the stability of the control… may decrease or the shaft may touch a touchdown bearing”, whereas applying this continuous learning allows the system to “maintain the stability of control of the position of the shaft 115 for a long period” (Paras. 187-188).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAYMUR RAHMAN ALI whose telephone number is (571)272-0007. The examiner can normally be reached Mon-Fri, 9:30 am - 6:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NAYMUR RAHMAN ALI/
Examiner, Art Unit 2123
/ALEXEY SHMATOV/
Supervisory Patent Examiner, Art Unit 2123