Prosecution Insights
Last updated: April 19, 2026
Application No. 18/321,982

MACHINE LEARNING FORECASTING BASED ON RESIDUAL PREDICTIONS

Non-Final OA: §101, §103
Filed
May 23, 2023
Examiner
LAI, DYLAN HONG
Art Unit
2144
Tech Center
2100 — Computer Architecture & Software
Assignee
The Toronto-Dominion Bank
OA Round
1 (Non-Final)
Grant Probability
Favorable
Estimated OA Rounds
1-2
Estimated Time to Grant
3y 3m

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -55.0% vs TC average)
Interview Lift: +0.0% (minimal; based on resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline)
Total Applications: 5 across all art units (5 currently pending)

Statute-Specific Performance

§101: 16.7% (-23.3% vs TC avg)
§103: 54.2% (+14.2% vs TC avg)
§102: 12.5% (-27.5% vs TC avg)
§112: 8.3% (-31.7% vs TC avg)
Tech Center averages are estimates. Based on career data from 0 resolved cases.

Office Action

Rejections under §101 and §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The analysis of the claims follows the 2019 Revised Patent Subject Matter Eligibility Guidance ("2019 PEG").

Step 1: Independent claims 1 ("A system comprising…"), 11 ("A computer-implemented method, comprising…"), and 20 ("A non-transitory, computer-readable medium storing computer-readable instructions, that upon execution by at least one hardware processor, cause performance of operations, comprising…") are directed to a system, a method, and a manufacture, respectively. Therefore, these claims, as well as their dependent claims, are directed to one of the four statutory categories (process, machine (i.e., system), manufacture, or composition of matter).

Claim 1

Step 2A, Prong 1: The claim recites, inter alia:

combining the first prediction and the second prediction to generate a combined prediction; This limitation is a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper in combining predictions. See MPEP 2106.04(a)(2)(III);

generating one or more action recommendations based on the combined prediction. This limitation is a mental process using evaluation and judgment with the aid of pen and paper in thinking of a recommendation using a combined prediction. See MPEP 2106.04(a)(2)(III).

Step 2A, Prong 2: The additional elements recited in the claim do not integrate the judicial exception into a practical application. Additional elements:

at least one memory storing instructions; a network interface; and at least one hardware processor interoperably coupled with the network interface and the at least one memory, wherein execution of the instructions by the at least one hardware processor causes performance of operations comprising: Each of these limitations is recited at a high level of generality and recites use of generic computer equipment to perform the abstract idea. Mere recitation that a judicial exception is to be performed using generic computer equipment in its ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f);

obtaining, using a first machine learning model, a first prediction predicting a variable for a first time period; and obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model for a second time period, wherein the first time period comprises the second time period, and wherein the second machine learning model is trained based on at least a part of the first prediction; Each of these limitations represents an insignificant extra-solution activity of data gathering, being pre-solution activity, performed by a generic machine learning model. See MPEP 2106.05(g).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

at least one memory storing instructions; a network interface; and at least one hardware processor interoperably coupled with the network interface and the at least one memory, wherein execution of the instructions by the at least one hardware processor causes performance of operations comprising: Each of these limitations is recited at a high level of generality and recites use of generic computer equipment to perform the abstract idea. Mere recitation that a judicial exception is to be performed using generic computer equipment in its ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f);

obtaining, using a first machine learning model, a first prediction predicting a variable for a first time period; and obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model for a second time period, wherein the first time period comprises the second time period, and wherein the second machine learning model is trained based on at least a part of the first prediction. For each, MPEP 2106.05(d)(II)(iv) indicates that merely storing and retrieving information in memory is a well-understood, routine, and conventional function when it is claimed in a merely generic manner (as it is in the present claim).

Claim 2

Step 2A, Prong 1: No additional abstract idea limitations.

Step 2A, Prong 2: The additional elements recited in the claim do not integrate the judicial exception into a practical application. Additional elements:

obtaining one or more past residuals of the first machine learning model; This limitation represents an insignificant extra-solution activity of data gathering, being pre-solution activity, performed by a generic machine learning model. See MPEP 2106.05(g);

training, using the one or more past residuals, the second machine learning model. This limitation represents an insignificant extra-solution activity of training a generic machine learning model. See Recentive v. Fox.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

obtaining one or more past residuals of the first machine learning model; MPEP 2106.05(d)(II)(iv) indicates that merely storing and retrieving information in memory is a well-understood, routine, and conventional function when it is claimed in a merely generic manner (as it is in the present claim);

training, using the one or more past residuals, the second machine learning model. Recentive Analytics, Inc. v. Fox Corp., Fox Broadcasting Company, LLC, Fox Sports Productions, LLC indicates that training a machine learning model is a well-understood, routine, and conventional function (pages 8-9) when it is claimed in a generic manner (as it is in the present claim).

Claim 3

Step 2A, Prong 1: The claim recites, inter alia:

…the at least a part of the first prediction is associated with a third time period, and the first time period comprises the third time period; This limitation recites a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to determine that a prediction is associated with a certain time period and that the certain time period is at least a part of another time period. See MPEP 2106.04(a)(2)(III);

subtracting the at least a part of the first prediction from the actual observation value to generate a past residual. This limitation recites a mathematical concept to subtract one value from another value. See MPEP 2106.04(a)(2)(I).

Step 2A, Prong 2: The additional elements recited in the claim do not integrate the judicial exception into a practical application. Additional elements:

obtaining the at least a part of the first prediction; and obtaining an actual observation value of the variable for the third time period; Each of these limitations represents an insignificant extra-solution activity of data gathering, being pre-solution activity, performed by a generic machine learning model. See MPEP 2106.05(g).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

obtaining the at least a part of the first prediction; and obtaining an actual observation value of the variable for the third time period. For each, MPEP 2106.05(d)(II)(iv) indicates that merely storing and retrieving information in memory is a well-understood, routine, and conventional function when it is claimed in a merely generic manner (as it is in the present claim).

Claim 4

Step 2A, Prong 1: The claim recites, inter alia: …the first time period is relatively longer than the second time period. This limitation recites a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to determine that a certain time period is longer than another time period.

Step 2A, Prong 2 & Step 2B: There are no additional elements recited in this claim, so the claim does not provide a practical application and is not considered to be significantly more.

Claim 5

Step 2A, Prong 1: The claim recites, inter alia: adding the second prediction to the at least a part of the first prediction. This limitation recites a mathematical concept to add two values together. See MPEP 2106.04(a)(2)(I).

Step 2A, Prong 2 & Step 2B: There are no additional elements recited in this claim, so the claim does not provide a practical application and is not considered to be significantly more.
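As context for claims 1-5: the recited arrangement is a standard residual-boosting pattern, which can be sketched in a few lines. The mean-based "models" and toy observation values below are illustrative assumptions only, not the applicant's disclosed implementation.

```python
# Illustrative sketch of the claimed two-model residual-forecasting scheme.
# Simple mean predictors stand in for the generic machine learning models
# recited in the claims; the observation values are made up.

def fit_mean_model(history):
    """First model: predict the variable as the mean of past observations."""
    mean = sum(history) / len(history)
    return lambda t: mean

def fit_residual_model(residuals):
    """Second model: predict the first model's error as the mean past residual
    (claim 2: trained using one or more past residuals)."""
    bias = sum(residuals) / len(residuals)
    return lambda t: bias

history = [10.0, 12.0, 11.0, 13.0]  # actual observations for past periods
first = fit_mean_model(history)

# Claim 3: a past residual is the actual observation value minus the
# corresponding part of the first prediction.
past_residuals = [y - first(t) for t, y in enumerate(history)]
second = fit_residual_model(past_residuals)

# Claim 5: the combined prediction adds the second (residual) prediction
# to the first prediction.
t_next = len(history)
combined = first(t_next) + second(t_next)
print(combined)  # 11.5 (mean 11.5 plus mean residual 0.0)
```

The second model is fit to the first model's past errors (claim 3's subtraction), and the combined prediction of claim 5 is simply the sum of the two models' outputs.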
Claim 6

Step 2A, Prong 1: The claim recites, inter alia:

determining one or more Shapley values associated with the combined prediction; This limitation recites a mathematical concept to calculate values of a specific type. See MPEP 2106.04(a)(2)(I);

determining that the one or more Shapley values satisfy one or more conditions. This limitation recites a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to check calculations against conditions.

Step 2A, Prong 2 & Step 2B: There are no additional elements recited in this claim, so the claim does not provide a practical application and is not considered to be significantly more.

Claim 7

Step 2A, Prong 1: The claim recites, inter alia: in response to determining that the one or more Shapley values satisfy the one or more conditions, adding one or more actions to the one or more action recommendations. This limitation recites a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to think of a recommendation and add it to a group of other recommendations when a certain calculation meets a certain condition.

Step 2A, Prong 2 & Step 2B: There are no additional elements recited in this claim, so the claim does not provide a practical application and is not considered to be significantly more.

Claim 8

Step 2A, Prong 1: The claim recites, inter alia: …a condition that a sum of Shapley values associated with the first machine learning model are less than a predetermined ratio of a total sum of the Shapley values associated with the first machine learning model and Shapley values associated with the second machine learning model. This limitation recites a mathematical concept to calculate the sum of values and a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to check the calculation against a ratio.

Step 2A, Prong 2: The additional elements recited in the claim do not integrate the judicial exception into a practical application. Additional elements:

…the one or more actions comprise retraining the first machine learning model. This limitation is recited at a high level of generality and recites use of a general class of computer algorithms to perform the abstract idea. Mere recitation that a judicial exception is to be performed using generic computer equipment running a general class of computer algorithms in an ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

…the one or more actions comprise retraining the first machine learning model. This limitation is recited at a high level of generality and recites use of a general class of computer algorithms to perform the abstract idea. Mere recitation that a judicial exception is to be performed using generic computer equipment running a general class of computer algorithms in an ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).

Claim 9

Step 2A, Prong 1: The claim recites, inter alia: in response to determining that the sum of Shapley values associated with the first machine learning model are less than the predetermined ratio of the total sum of the Shapley values associated with the first machine learning model and the Shapley values associated with the second machine learning model… This limitation recites a mathematical concept to calculate the sum of values and a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to check the calculation against a ratio.
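The Shapley-value condition recited in claims 8 and 9 can likewise be sketched. The Shapley values and the 0.5 ratio below are invented placeholders for illustration, not values or attribution outputs from the application.

```python
# Hypothetical sketch of the retraining trigger of claims 8-9: if the sum of
# Shapley values attributed to the first model falls below a predetermined
# ratio of the total (first-model plus second-model Shapley values), retraining
# of the first model is triggered.

def should_retrain_first_model(shap_first, shap_second, ratio=0.5):
    """Claim 8's condition: sum(first) < ratio * (sum(first) + sum(second))."""
    total = sum(shap_first) + sum(shap_second)
    return sum(shap_first) < ratio * total

# Here the second (residual) model explains most of the combined prediction,
# suggesting the first model has degraded, so retraining is triggered.
print(should_retrain_first_model([0.2, 0.1], [0.6, 0.4]))  # True
```

The intuition is that when the residual model dominates the attribution, the base forecaster is no longer carrying the prediction and should be refit.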
Step 2A, Prong 2: The additional elements recited in the claim do not integrate the judicial exception into a practical application. Additional elements:

…automatically triggering retraining of the first machine learning model. This limitation is recited at a high level of generality and recites use of a general class of computer algorithms to perform the abstract idea based on a condition. Mere recitation that a judicial exception is to be performed using generic computer equipment running a general class of computer algorithms in an ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

…automatically triggering retraining of the first machine learning model. This limitation is recited at a high level of generality and recites use of a general class of computer algorithms to perform the abstract idea based on a condition. Mere recitation that a judicial exception is to be performed using generic computer equipment running a general class of computer algorithms in an ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).

Claim 10

Step 2A, Prong 1: The claim recites, inter alia: …the first machine learning model and the second machine learning model have at least one different feature.

Step 2A, Prong 2 & Step 2B: There are no additional elements recited in this claim, so the claim does not provide a practical application and is not considered to be significantly more.

Claim 11

Step 2A, Prong 1: The claim recites, inter alia:

combining the first prediction and the second prediction to generate a combined prediction; This limitation is a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper in combining predictions. See MPEP 2106.04(a)(2)(III);

generating one or more action recommendations based on the combined prediction. This limitation is a mental process using evaluation and judgment with the aid of pen and paper in thinking of a recommendation using a combined prediction. See MPEP 2106.04(a)(2)(III).

Step 2A, Prong 2: The additional elements recited in the claim do not integrate the judicial exception into a practical application. Additional elements:

obtaining, using a first machine learning model, a first prediction predicting a variable for a first time period; and obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model for a second time period, wherein the first time period comprises the second time period, and wherein the second machine learning model is trained based on at least a part of the first prediction; Each of these limitations represents an insignificant extra-solution activity of data gathering, being pre-solution activity, performed by a generic machine learning model. See MPEP 2106.05(g).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

obtaining, using a first machine learning model, a first prediction predicting a variable for a first time period; and obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model for a second time period, wherein the first time period comprises the second time period, and wherein the second machine learning model is trained based on at least a part of the first prediction. For each, MPEP 2106.05(d)(II)(iv) indicates that merely storing and retrieving information in memory is a well-understood, routine, and conventional function when it is claimed in a merely generic manner (as it is in the present claim).

Claim 12

Step 2A, Prong 1: No additional abstract idea limitations.

Step 2A, Prong 2: The additional elements recited in the claim do not integrate the judicial exception into a practical application. Additional elements:

obtaining one or more past residuals of the first machine learning model; This limitation represents an insignificant extra-solution activity of data gathering, being pre-solution activity, performed by a generic machine learning model. See MPEP 2106.05(g);

training, using the one or more past residuals, the second machine learning model. This limitation represents an insignificant extra-solution activity of training a generic machine learning model. See Recentive v. Fox.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

obtaining one or more past residuals of the first machine learning model; MPEP 2106.05(d)(II)(iv) indicates that merely storing and retrieving information in memory is a well-understood, routine, and conventional function when it is claimed in a merely generic manner (as it is in the present claim);

training, using the one or more past residuals, the second machine learning model. Recentive Analytics, Inc. v. Fox Corp., Fox Broadcasting Company, LLC, Fox Sports Productions, LLC indicates that training a machine learning model is a well-understood, routine, and conventional function (pages 8-9) when it is claimed in a generic manner (as it is in the present claim).

Claim 13

Step 2A, Prong 1: The claim recites, inter alia:

…the at least a part of the first prediction is associated with a third time period, and the first time period comprises the third time period; This limitation recites a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to determine that a prediction is associated with a certain time period and that the certain time period is at least a part of another time period. See MPEP 2106.04(a)(2)(III);

subtracting the at least a part of the first prediction from the actual observation value to generate a past residual. This limitation recites a mathematical concept to subtract one value from another value. See MPEP 2106.04(a)(2)(I).

Step 2A, Prong 2: The additional elements recited in the claim do not integrate the judicial exception into a practical application. Additional elements:

obtaining the at least a part of the first prediction; and obtaining an actual observation value of the variable for the third time period; Each of these limitations represents an insignificant extra-solution activity of data gathering, being pre-solution activity, performed by a generic machine learning model. See MPEP 2106.05(g).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

obtaining the at least a part of the first prediction; and obtaining an actual observation value of the variable for the third time period. For each, MPEP 2106.05(d)(II)(iv) indicates that merely storing and retrieving information in memory is a well-understood, routine, and conventional function when it is claimed in a merely generic manner (as it is in the present claim).

Claim 14

Step 2A, Prong 1: The claim recites, inter alia: …the first time period is relatively longer than the second time period. This limitation recites a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to determine that a certain time period is longer than another time period.

Step 2A, Prong 2 & Step 2B: There are no additional elements recited in this claim, so the claim does not provide a practical application and is not considered to be significantly more.

Claim 15

Step 2A, Prong 1: The claim recites, inter alia: adding the second prediction to the at least a part of the first prediction. This limitation recites a mathematical concept to add two values together. See MPEP 2106.04(a)(2)(I).

Step 2A, Prong 2 & Step 2B: There are no additional elements recited in this claim, so the claim does not provide a practical application and is not considered to be significantly more.
Claim 16

Step 2A, Prong 1: The claim recites, inter alia:

determining one or more Shapley values associated with the combined prediction; This limitation recites a mathematical concept to calculate values of a specific type. See MPEP 2106.04(a)(2)(I);

determining that the one or more Shapley values satisfy one or more conditions. This limitation recites a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to check calculations against conditions.

Step 2A, Prong 2 & Step 2B: There are no additional elements recited in this claim, so the claim does not provide a practical application and is not considered to be significantly more.

Claim 17

Step 2A, Prong 1: The claim recites, inter alia: in response to determining that the one or more Shapley values satisfy the one or more conditions, adding one or more actions to the one or more action recommendations. This limitation recites a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to think of a recommendation and add it to a group of other recommendations when a certain calculation meets a certain condition.

Step 2A, Prong 2 & Step 2B: There are no additional elements recited in this claim, so the claim does not provide a practical application and is not considered to be significantly more.

Claim 18

Step 2A, Prong 1: The claim recites, inter alia: …a condition that a sum of Shapley values associated with the first machine learning model are less than a predetermined ratio of a total sum of the Shapley values associated with the first machine learning model and Shapley values associated with the second machine learning model. This limitation recites a mathematical concept to calculate the sum of values and a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to check the calculation against a ratio.

Step 2A, Prong 2: The additional elements recited in the claim do not integrate the judicial exception into a practical application. Additional elements:

…the one or more actions comprise retraining the first machine learning model. This limitation is recited at a high level of generality and recites use of a general class of computer algorithms to perform the abstract idea. Mere recitation that a judicial exception is to be performed using generic computer equipment running a general class of computer algorithms in an ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

…the one or more actions comprise retraining the first machine learning model. This limitation is recited at a high level of generality and recites use of a general class of computer algorithms to perform the abstract idea. Mere recitation that a judicial exception is to be performed using generic computer equipment running a general class of computer algorithms in an ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).

Claim 19

Step 2A, Prong 1: The claim recites, inter alia: in response to determining that the sum of Shapley values associated with the first machine learning model are less than the predetermined ratio of the total sum of the Shapley values associated with the first machine learning model and the Shapley values associated with the second machine learning model… This limitation recites a mathematical concept to calculate the sum of values and a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper to check the calculation against a ratio.

Step 2A, Prong 2: The additional elements recited in the claim do not integrate the judicial exception into a practical application. Additional elements:

…automatically triggering retraining of the first machine learning model. This limitation is recited at a high level of generality and recites use of a general class of computer algorithms to perform the abstract idea based on a condition. Mere recitation that a judicial exception is to be performed using generic computer equipment running a general class of computer algorithms in an ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements:

…automatically triggering retraining of the first machine learning model. This limitation is recited at a high level of generality and recites use of a general class of computer algorithms to perform the abstract idea based on a condition. Mere recitation that a judicial exception is to be performed using generic computer equipment running a general class of computer algorithms in an ordinary capacity cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f).

Claim 20

Step 2A, Prong 1: The claim recites, inter alia:

combining the first prediction and the second prediction to generate a combined prediction; This limitation is a mental process using observation, evaluation, judgment, and opinion with the aid of pen and paper in combining predictions. See MPEP 2106.04(a)(2)(III);

generating one or more action recommendations based on the combined prediction. This limitation is a mental process using evaluation and judgment with the aid of pen and paper in thinking of a recommendation using a combined prediction. See MPEP 2106.04(a)(2)(III).
Step 2A, Prong 2: The additional elements recited in the claim do not integrate the judicial exception into a practical application. Additional elements: A non-transitory, computer-readable medium storing computer-readable instructions, that upon execution by at least one hardware processor, cause performance of operations This limitation is recited at a high level of generality and recites use of generic computer equipment to perform the abstract idea. Mere recitation that a judicial exception is to be performed using generic computer equipment in their ordinary capacity, cannot meaningfully integrate the judicial exception into a practical application. See MPEP 2106.05(f); obtaining, using a first machine learning model, a first prediction predicting a variable for a first time period; This limitation represents an insignificant extra-solution activity of data gathering, being pre-solution activity, performed by a generic machine learning model. See MPEP 2106.05(g); obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model for a second time period, wherein the first time period comprises the second time period, and wherein the second machine learning model is trained based on at least a part of the first prediction; This limitation represents an insignificant extra-solution activity of data gathering, being pre-solution activity, performed by a generic machine learning model. See MPEP 2106.05(g); Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Additional elements: A non-transitory, computer-readable medium storing computer-readable instructions, that upon execution by at least one hardware processor, cause performance of operations This limitation is recited at a high level of generality and recites use of generic computer equipment to perform the abstract idea. 
Mere recitation that a judicial exception is to be performed using generic computer equipment in its ordinary capacity cannot amount to significantly more than the judicial exception. See MPEP 2106.05(f); obtaining, using a first machine learning model, a first prediction predicting a variable for a first time period; MPEP 2106.05(d)(II)(iv) indicates that merely storing and retrieving information in memory is a well-understood, routine, and conventional function when it is claimed in a merely generic manner (as it is in the present claim); obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model for a second time period, wherein the first time period comprises the second time period, and wherein the second machine learning model is trained based on at least a part of the first prediction; MPEP 2106.05(d)(II)(iv) indicates that merely storing and retrieving information in memory is a well-understood, routine, and conventional function when it is claimed in a merely generic manner (as it is in the present claim);

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-5, 10-15, and 20 are rejected under 35 U.S.C.
103 as being unpatentable over MULTI-STEP TIME SERIES FORECASTING WITH RESIDUAL LEARNING (US 20190188611 A1) by Wu et al., hereafter Wu, in view of SYSTEM, METHOD AND COMPUTER PROGRAM FOR FORECASTING A TREND OF A NUMERICAL VALUE OVER A TIME INTERVAL (US 20230061911 A1) by Elshocht et al., hereafter Elshocht, and in further view of OPTIMIZING GENERATION OF A FORECAST (US 20210089944 A1) by Zhao et al., hereafter Zhao.

Regarding claim 1, Wu teaches: at least one memory storing instructions; a network interface; and at least one hardware processor interoperably coupled with the network interface and the at least one memory, wherein execution of the instructions by the at least one hardware processor causes performance of operations comprising: ((Wu) Paragraph [0087] "Apparatus 1200 includes processor 1210 operatively coupled to communication device 1220, data storage device 1230, one or more input devices 1240, one or more output devices 1250, and memory 1260." A communication device is a network interface.) obtaining, using a first machine learning model, a first prediction predicting a variable for a first time period; ((Wu) Paragraph [0043] "Once the first regression model (e.g., forecasting regression model) is built for the current future time point, a stabilizing mechanism is used to improve accuracy. More specifically, the first regression model from 412 is applied to the training data at 414 to obtain predicted values of the current future time point. Residual values are then calculated at 416 by subtracting the predicted values from the actual/target values." A first regression model is a first machine learning model, and to obtain predicted values of a current future time point is obtaining a first prediction predicting a variable for a first time period.)
combining the first prediction and the second prediction to generate a combined prediction; ((Wu) Paragraph [0026] "Joiner 230 is a mechanism that combines the forecasted results (e.g., outputs) from local prediction module 220" (Wu) Fig. 2, local prediction module 220 contains multiple forecasting models. Combining forecasted results is combining a first prediction and a second prediction to generate a combined prediction.)

Wu additionally suggests obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model, and wherein the second machine learning model is trained based on at least a part of the first prediction. ((Wu) Paragraph [0044] "A second regression model (e.g., residual regression model) is then built at 418, using the original input variables from 402, 404 and the predicted time series values from 414 as input variables and the actual residual value from 416 as a new target variable." A second regression model is a second machine learning model. Using the first regression model's predicted time series values as input variables, and its actual residual value as the target variable, trains the second model on at least a part of the first prediction, as those values are results of and based on the first prediction. Because the second regression model uses the actual residual value as its target variable, it also obtains a prediction of a residual of the first regression model (the first machine learning model).)
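The two-stage residual scheme described in Wu (build a forecasting regression model, compute residuals against the actual/target values, build a residual regression model on the original inputs plus the first model's predictions, then add the predicted residual back) can be sketched as follows. This is a minimal illustration only: the plain least-squares models, the toy data, and the extra nonlinear feature given to the second model are assumptions for demonstration, not details taken from Wu.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    # Least-squares fit with an intercept column.
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def predict_linear(coef, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return Xb @ coef

# Toy training data: input variables X and actual/target values y.
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + 0.3 * np.sin(X[:, 0]) \
    + rng.normal(scale=0.1, size=200)

# 1. Build the first (forecasting) regression model and apply it to
#    the training data to obtain predicted values.
model_a = fit_linear(X, y)
pred_a = predict_linear(model_a, X)

# 2. Residual values: subtract the predicted values from the
#    actual/target values.
residuals = y - pred_a

# 3. Build the second (residual) regression model, using the original
#    input variables and the first model's predictions as inputs and
#    the residuals as the new target variable. The extra sin feature
#    is an assumed example of the two models differing in at least
#    one feature (cf. claim 10).
X_b = np.column_stack([X, pred_a, np.sin(X[:, :1])])
model_b = fit_linear(X_b, residuals)

# 4. Final prediction: add the predicted residual to the first
#    model's prediction.
final_pred = pred_a + predict_linear(model_b, X_b)

print("MAE first model:", np.mean(np.abs(y - pred_a)))
print("MAE combined:   ", np.mean(np.abs(y - final_pred)))
```

The last step corresponds to the "adding the second prediction to the at least a part of first prediction" limitation mapped to Wu's Paragraph [0048] below for claim 5.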
Wu does not explicitly disclose, but Elshocht teaches: obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model for a second time period, wherein the first time period comprises the second time period, and wherein the second machine learning model is trained based on at least a part of the first prediction; ((Elshocht) Paragraph [0036] "The processing circuitry is configured to determine an estimate of the numerical value for the time interval by training a first machine-learning model based on historical data on the numerical value. The processing circuitry is configured to divide the time interval into a first and a second sub-interval. The processing circuitry is configured to determine an estimate of the numerical value for the first sub-interval by training a second machine-learning model based on the historical data on the numerical value." The time interval of Elshocht is equivalent to the first time period of the instant application. The first sub-interval of Elshocht, used to train a second machine-learning model, is equivalent to the second time period. The first sub-interval is a result of dividing the time interval, so the time interval (first time period) comprises the first sub-interval (second time period). The second sub-interval is a third time period that the first time period comprises and with which a part of the first prediction is associated.)

Wu and Elshocht are in the same analogous art of machine learning models that predict based on information that is gathered into time periods. In addition, Elshocht teaches that smaller sub-intervals enable a quick evaluation of the quality of the forecast and avoid the accumulation of forecasting errors over a longer period of time, while larger intervals are often more precise, so it is important to use both types of time periods.
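Elshocht's interval-subdivision idea (a forecast over the full time interval anchors refined forecasts over its smaller sub-intervals) can be sketched roughly as below. The equal-width split and the 50/50 blending rule are illustrative assumptions, not details taken from Elshocht.

```python
import numpy as np

def subdivide(interval, n):
    """Split an interval (start, end) into n equal sub-intervals."""
    start, end = interval
    edges = np.linspace(start, end, n + 1)
    return list(zip(edges[:-1], edges[1:]))

def refine_forecast(coarse_total, sub_intervals, sub_estimator):
    """Refine a full-interval forecast over its sub-intervals.

    sub_estimator(sub) supplies a short-interval estimate; each one is
    blended with an even share of the coarse forecast, so the longer
    (typically more precise) forecast anchors the shorter ones.
    """
    even_share = coarse_total / len(sub_intervals)
    return [0.5 * sub_estimator(sub) + 0.5 * even_share
            for sub in sub_intervals]

# Example: a first time period (0, 10) comprising five shorter time
# periods, with a hypothetical per-sub-interval estimator.
subs = subdivide((0.0, 10.0), 5)
refined = refine_forecast(10.0, subs, lambda sub: 2.0 + 0.1 * sub[0])
```

The shorter time periods let forecast quality be checked quickly, while the full-interval forecast limits accumulated error, matching the motivation quoted from Elshocht's Paragraph [0005].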
((Elshocht) Paragraph [0005], "...repeating the forecast for smaller and smaller time intervals until a desired time-granularity is reached, providing a concept for forecasting a trend of a numerical value which enables a quick evaluation of a quality of the forecast, while avoiding the accumulation of forecasting errors over a longer period of time...On the other hand, the forecasts for the smaller and smaller time intervals may use the forecasting result for the longer time intervals. This may save time and may yield more precise results, as forecasts for longer time intervals are often more precise than individual forecasts over shorter time intervals")

Thus, it would be obvious to a person of ordinary skill in the art before the effective filing date of the application to have included dividing the first time point (first time period) of Wu into a second sub-interval forming a second time period that the first comprises, as Elshocht teaches, in order to avoid the accumulation of forecasting errors that occur over a longer period of time.

Wu, in view of Elshocht, still does not teach: generating one or more action recommendations based on the combined prediction. However, Zhao teaches: generating one or more action recommendations based on a prediction. ((Zhao) Paragraph [0043] "As shown in FIG. 1E and by reference number 126, the forecast analysis platform may perform one or more actions (e.g., based on generating the additional forecast)." The forecast analysis platform performing one or more actions is interpreted as generating one or more action recommendations. The additional forecast is interpreted as the equivalent of the combined prediction.)

Zhao, Wu, and Elshocht are in the same analogous art of machine learning models that predict based on information gathered into time periods. In addition, Zhao teaches that generating action recommendations reduces the need for monitoring.
((Zhao) Paragraph [0013], "Moreover, the forecast analysis platform may perform an action, such as automatically scheduling transactions, which may reduce a need for a manager to monitor the transaction account using the one or more devices.")

Thus, it would be obvious to a person of ordinary skill in the art before the effective filing date of the application to have added the action recommendation based on the combined prediction from Wu, in view of Elshocht, with the rest of the invention disclosed by Wu, in view of Elshocht, in order to reduce the need for monitoring. This combination would produce the predictable result of the system disclosed in claim 1 of the instant application.

Regarding claim 2, Wu, in view of Elshocht and Zhao, teaches all of the material disclosed in claim 1 and Wu additionally teaches: obtaining one or more past residuals of the first machine learning model; and ((Wu) Paragraph [0043] "Once the first regression model (e.g., forecasting regression model) is built for the current future time point, a stabilizing mechanism is used to improve accuracy. More specifically, the first regression model from 412 is applied to the training data at 414 to obtain predicted values of the current future time point. Residual values are then calculated at 416 by subtracting the predicted values from the actual/target values." Calculating residual values is obtaining one or more past residuals.) training, using the one or more past residuals, the second machine learning model. ((Wu) Paragraph [0044] "A second regression model (e.g., residual regression model) is then built at 418, using the original input variables from 402, 404 and the predicted time series values from 414 as input variables and the actual residual value from 416 as a new target variable." Building the second regression model with the actual residual value as a target variable is training the second machine learning model using one or more past residuals.)
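Claims 2-3 compute past residuals by subtracting predictions from actual observation values, and the §101 analysis above discusses a limitation that automatically triggers retraining of the first model based on a condition. A simple drift check in that spirit can be sketched as below; the window size and threshold are illustrative assumptions only, not values from the claims or references.

```python
import numpy as np

def past_residuals(actuals, predictions):
    """Past residuals: actual observation values minus predictions."""
    return np.asarray(actuals) - np.asarray(predictions)

def should_retrain(actuals, predictions, window=30, threshold=0.5):
    """Trigger retraining of the first model when the mean absolute
    past residual over a recent window exceeds an assumed threshold."""
    recent = past_residuals(actuals[-window:], predictions[-window:])
    return bool(np.mean(np.abs(recent)) > threshold)

# Example: a model whose recent predictions are off by 1.0 on average
# trips the (assumed) threshold of 0.5.
flag = should_retrain([1.0] * 10, [0.0] * 10, window=5)
```

The same past residuals would also serve as the training targets for the second (residual) model, as mapped to Wu's Paragraph [0044] above.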
Regarding claim 3, Wu, in view of Elshocht and Zhao, teaches all of the material disclosed in claim 2 and Wu and Elshocht additionally teach: obtaining the at least a part of the first prediction, wherein the at least a part of the first prediction is associated with a third time period, and the first time period comprises the third time period; ((Elshocht) Paragraph [0036] "The processing circuitry is configured to determine an estimate of the numerical value for the time interval by training a first machine-learning model based on historical data on the numerical value. The processing circuitry is configured to divide the time interval into a first and a second sub-interval. The processing circuitry is configured to determine an estimate of the numerical value for the first sub-interval by training a second machine-learning model based on the historical data on the numerical value." The time interval of Elshocht is equivalent to the first time period of the instant application. The second sub-interval is a third time period that the first time period comprises and with which a part of the first prediction is associated.) obtaining an actual observation value of the variable for the third time period; and ((Wu) Paragraph [0040] "After the time series information is gathered, actual values of future time points are extracted as target variables in training data at 408." Actual values of future time points are actual observation values of the variable for the third time period and are also actual/target values.) subtracting the at least a part of the first prediction from the actual observation value to generate a past residual. ((Wu) Paragraph [0043] "Once the first regression model (e.g., forecasting regression model) is built for the current future time point, a stabilizing mechanism is used to improve accuracy. More specifically, the first regression model from 412 is applied to the training data at 414 to obtain predicted values of the current future time point.
Residual values are then calculated at 416 by subtracting the predicted values from the actual/target values." Obtaining predicted values of the current future time point is obtaining the at least a part of the first prediction, and the predicted values are the at least a part of the first prediction. Calculating a residual value is generating a past residual.) The rationale to combine Wu and Elshocht is the same as provided in the parent claim.

Regarding claim 4, Wu, in view of Elshocht and Zhao, teaches all of the material disclosed in claim 3 and Elshocht additionally teaches: the first time period is relatively longer than the second time period. ((Elshocht) Paragraph [0036] "The processing circuitry is configured to determine an estimate of the numerical value for the time interval by training a first machine-learning model based on historical data on the numerical value. The processing circuitry is configured to divide the time interval into a first and a second sub-interval. The processing circuitry is configured to determine an estimate of the numerical value for the first sub-interval by training a second machine-learning model based on the historical data on the numerical value." The time interval of Elshocht is equivalent to the first time period of the instant application. The first sub-interval is a second time period. The first sub-interval is a result of dividing the time interval, so the time interval (first time period) is relatively longer than the first sub-interval (the second time period).) The rationale to combine Wu and Elshocht is the same as provided in the parent claim.

Regarding claim 5, Wu, in view of Elshocht and Zhao, teaches all of the material disclosed in claim 1 and Wu additionally teaches: adding the second prediction to the at least a part of first prediction. ((Wu) Paragraph [0047] "For a current future time point, regression model A is first applied at 504 to predict the time series values of the current future time point.
The original input variables used in the forecasting regression model and the predicted values are combined at 506." Paragraph [0048] "Next, at 508, based on the predicted values, residual regression model B is applied, where the residual value (e.g., predicted error) is predicted and obtained at 510. The final predicted value (e.g., actual final prediction) is calculated at 512 by adding the predicted residual value to the predicted time series value." The predicted residual value is the second prediction. The predicted time series value is the at least a part of first prediction.)

Regarding claim 10, Wu, in view of Elshocht and Zhao, teaches all of the material disclosed in claim 1 and Wu additionally teaches: the first machine learning model and the second machine learning model have at least one different feature. ((Wu) Fig. 4, a second model uses predicted values of a first model as an input feature, which is a feature that could not have existed in the first model.)

Regarding claim 11, Wu teaches: obtaining, using a first machine learning model, a first prediction predicting a variable for a first time period; ((Wu) Paragraph [0043] "Once the first regression model (e.g., forecasting regression model) is built for the current future time point, a stabilizing mechanism is used to improve accuracy. More specifically, the first regression model from 412 is applied to the training data at 414 to obtain predicted values of the current future time point. Residual values are then calculated at 416 by subtracting the predicted values from the actual/target values." A first regression model is a first machine learning model, and to obtain predicted values of a current future time point is obtaining a first prediction predicting a variable for a first time period.)
combining the first prediction and the second prediction to generate a combined prediction; ((Wu) Paragraph [0026] "Joiner 230 is a mechanism that combines the forecasted results (e.g., outputs) from local prediction module 220" (Wu) Fig. 2, local prediction module 220 contains multiple forecasting models. Combining forecasted results is combining a first prediction and a second prediction to generate a combined prediction.)

Wu additionally suggests obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model, and wherein the second machine learning model is trained based on at least a part of the first prediction. ((Wu) Paragraph [0044] "A second regression model (e.g., residual regression model) is then built at 418, using the original input variables from 402, 404 and the predicted time series values from 414 as input variables and the actual residual value from 416 as a new target variable." A second regression model is a second machine learning model. Using the first regression model's predicted time series values as input variables, and its actual residual value as the target variable, trains the second model on at least a part of the first prediction, as those values are results of and based on the first prediction. Because the second regression model uses the actual residual value as its target variable, it also obtains a prediction of a residual of the first regression model (the first machine learning model).)
Wu does not explicitly disclose, but Elshocht teaches: obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model for a second time period, wherein the first time period comprises the second time period, and wherein the second machine learning model is trained based on at least a part of the first prediction; ((Elshocht) Paragraph [0036] "The processing circuitry is configured to determine an estimate of the numerical value for the time interval by training a first machine-learning model based on historical data on the numerical value. The processing circuitry is configured to divide the time interval into a first and a second sub-interval. The processing circuitry is configured to determine an estimate of the numerical value for the first sub-interval by training a second machine-learning model based on the historical data on the numerical value." The time interval of Elshocht is equivalent to the first time period of the instant application. The first sub-interval of Elshocht, used to train a second machine-learning model, is equivalent to the second time period. The first sub-interval is a result of dividing the time interval, so the time interval (first time period) comprises the first sub-interval (second time period). The second sub-interval is a third time period that the first time period comprises and with which a part of the first prediction is associated.)

Wu and Elshocht are in the same analogous art of machine learning models that predict based on information that is gathered into time periods. In addition, Elshocht teaches that smaller sub-intervals enable a quick evaluation of the quality of the forecast and avoid the accumulation of forecasting errors over a longer period of time, while larger intervals are often more precise, so it is important to use both types of time periods.
((Elshocht) Paragraph [0005], "...repeating the forecast for smaller and smaller time intervals until a desired time-granularity is reached, providing a concept for forecasting a trend of a numerical value which enables a quick evaluation of a quality of the forecast, while avoiding the accumulation of forecasting errors over a longer period of time...On the other hand, the forecasts for the smaller and smaller time intervals may use the forecasting result for the longer time intervals. This may save time and may yield more precise results, as forecasts for longer time intervals are often more precise than individual forecasts over shorter time intervals")

Thus, it would be obvious to a person of ordinary skill in the art before the effective filing date of the application to have included dividing the first time point (first time period) of Wu into a second sub-interval forming a second time period that the first comprises, as Elshocht teaches, for use in the second machine learning model in order to avoid the accumulation of forecasting errors that occur over a longer period of time.

Wu, in view of Elshocht, still does not teach: generating one or more action recommendations based on the combined prediction. However, Zhao teaches: generating one or more action recommendations based on a prediction. ((Zhao) Paragraph [0043] "As shown in FIG. 1E and by reference number 126, the forecast analysis platform may perform one or more actions (e.g., based on generating the additional forecast)." The forecast analysis platform performing one or more actions is interpreted as generating one or more action recommendations. The additional forecast is interpreted as the equivalent of the combined prediction.)

Zhao, Wu, and Elshocht are in the same analogous art of machine learning models that predict based on information gathered into time periods. In addition, Zhao teaches that generating action recommendations reduces the need for monitoring.
((Zhao) Paragraph [0013], "Moreover, the forecast analysis platform may perform an action, such as automatically scheduling transactions, which may reduce a need for a manager to monitor the transaction account using the one or more devices.")

Thus, it would be obvious to a person of ordinary skill in the art before the effective filing date of the application to have added the action recommendation based on the combined prediction from Wu, in view of Elshocht, with the rest of the invention disclosed by Wu, in view of Elshocht, in order to reduce the need for monitoring. This combination would produce the predictable result of the system disclosed in claim 11 of the instant application.

Regarding claim 12, Wu, in view of Elshocht and Zhao, teaches all of the material disclosed in claim 11 and Wu additionally teaches: obtaining one or more past residuals of the first machine learning model; and ((Wu) Paragraph [0043] "Once the first regression model (e.g., forecasting regression model) is built for the current future time point, a stabilizing mechanism is used to improve accuracy. More specifically, the first regression model from 412 is applied to the training data at 414 to obtain predicted values of the current future time point. Residual values are then calculated at 416 by subtracting the predicted values from the actual/target values." Calculating residual values is obtaining one or more past residuals.) training, using the one or more past residuals, the second machine learning model. ((Wu) Paragraph [0044] "A second regression model (e.g., residual regression model) is then built at 418, using the original input variables from 402, 404 and the predicted time series values from 414 as input variables and the actual residual value from 416 as a new target variable." Building the second regression model with the actual residual value as a target variable is training the second machine learning model using one or more past residuals.)
Regarding claim 13, Wu, in view of Elshocht and Zhao, teaches all of the material disclosed in claim 12 and Wu and Elshocht additionally teach: obtaining the at least a part of the first prediction, wherein the at least a part of the first prediction is associated with a third time period, and the first time period comprises the third time period; ((Elshocht) Paragraph [0036] "The processing circuitry is configured to determine an estimate of the numerical value for the time interval by training a first machine-learning model based on historical data on the numerical value. The processing circuitry is configured to divide the time interval into a first and a second sub-interval. The processing circuitry is configured to determine an estimate of the numerical value for the first sub-interval by training a second machine-learning model based on the historical data on the numerical value." The time interval of Elshocht is equivalent to the first time period of the instant application. The second sub-interval is a third time period that the first time period comprises and with which a part of the first prediction is associated.) obtaining an actual observation value of the variable for the third time period; and ((Wu) Paragraph [0040] "After the time series information is gathered, actual values of future time points are extracted as target variables in training data at 408." Actual values of future time points are actual observation values of the variable for the third time period and are also actual/target values.) subtracting the at least a part of the first prediction from the actual observation value to generate a past residual. ((Wu) Paragraph [0043] "Once the first regression model (e.g., forecasting regression model) is built for the current future time point, a stabilizing mechanism is used to improve accuracy. More specifically, the first regression model from 412 is applied to the training data at 414 to obtain predicted values of the current future time point.
Residual values are then calculated at 416 by subtracting the predicted values from the actual/target values." Obtaining predicted values of the current future time point is obtaining the at least a part of the first prediction, and the predicted values are the at least a part of the first prediction. Calculating a residual value is generating a past residual.) The rationale to combine Wu and Elshocht is the same as provided in the parent claim.

Regarding claim 14, Wu, in view of Elshocht and Zhao, teaches all of the material disclosed in claim 13 and Elshocht additionally teaches: the first time period is relatively longer than the second time period. ((Elshocht) Paragraph [0036] "The processing circuitry is configured to determine an estimate of the numerical value for the time interval by training a first machine-learning model based on historical data on the numerical value. The processing circuitry is configured to divide the time interval into a first and a second sub-interval. The processing circuitry is configured to determine an estimate of the numerical value for the first sub-interval by training a second machine-learning model based on the historical data on the numerical value." The time interval of Elshocht is equivalent to the first time period of the instant application. The first sub-interval is a second time period. The first sub-interval is a result of dividing the time interval, so the time interval (first time period) is relatively longer than the first sub-interval (the second time period).) The rationale to combine Wu and Elshocht is the same as provided in the parent claim.

Regarding claim 15, Wu, in view of Elshocht and Zhao, teaches all of the material disclosed in claim 14 and Wu additionally teaches: adding the second prediction to the at least a part of first prediction. ((Wu) Paragraph [0047] "For a current future time point, regression model A is first applied at 504 to predict the time series values of the current future time point.
The original input variables used in the forecasting regression model and the predicted values are combined at 506." Paragraph [0048] "Next, at 508, based on the predicted values, residual regression model B is applied, where the residual value (e.g., predicted error) is predicted and obtained at 510. The final predicted value (e.g., actual final prediction) is calculated at 512 by adding the predicted residual value to the predicted time series value." The predicted residual value is the second prediction. The predicted time series value is the at least a part of first prediction.)

Regarding claim 20, Wu teaches: A non-transitory, computer-readable medium storing computer-readable instructions, that upon execution by at least one hardware processor, cause performance of operations, comprising: ((Wu) Paragraph [0087] "Apparatus 1200 includes processor 1210 operatively coupled to communication device 1220, data storage device 1230, one or more input devices 1240, one or more output devices 1250, and memory 1260." Memory is a non-transitory, computer-readable medium.) obtaining, using a first machine learning model, a first prediction predicting a variable for a first time period; ((Wu) Paragraph [0043] "Once the first regression model (e.g., forecasting regression model) is built for the current future time point, a stabilizing mechanism is used to improve accuracy. More specifically, the first regression model from 412 is applied to the training data at 414 to obtain predicted values of the current future time point. Residual values are then calculated at 416 by subtracting the predicted values from the actual/target values." A first regression model is a first machine learning model, and to obtain predicted values of a current future time point is obtaining a first prediction predicting a variable for a first time period.)
combining the first prediction and the second prediction to generate a combined prediction; ((Wu) Paragraph [0026] "Joiner 230 is a mechanism that combines the forecasted results (e.g., outputs) from local prediction module 220" (Wu) Fig. 2, local prediction module 220 contains multiple forecasting models. Combining forecasted results is combining a first prediction and a second prediction to generate a combined prediction.)

Wu additionally suggests obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model, and wherein the second machine learning model is trained based on at least a part of the first prediction. ((Wu) Paragraph [0044] "A second regression model (e.g., residual regression model) is then built at 418, using the original input variables from 402, 404 and the predicted time series values from 414 as input variables and the actual residual value from 416 as a new target variable." A second regression model is a second machine learning model. Using the first regression model's predicted time series values as input variables, and its actual residual value as the target variable, trains the second model on at least a part of the first prediction, as those values are results of and based on the first prediction. Because the second regression model uses the actual residual value as its target variable, it also obtains a prediction of a residual of the first regression model (the first machine learning model).)
Wu does not explicitly disclose, but with Elshocht does teach: obtaining, using a second machine learning model, a second prediction predicting a residual of the first machine learning model for a second time period, wherein the first time period comprises the second time period, and wherein the second machine learning model is trained based on at least a part of the first prediction; ((Elshocht) Paragraph [0036] "The processing circuitry is configured to determine an estimate of the numerical value for the time interval by training a first machine-learning model based on historical data on the numerical value. The processing circuitry is configured to divide the time interval into a first and a second sub-interval. The processing circuitry is configured to determine an estimate of the numerical value for the first sub-interval by training a second machine-learning model based on the historical data on the numerical value." The time interval of Elshocht is equivalent to the first time period of the instant application. The first sub-interval of Elshocht used to train a second machine-learning model is equivalent to the second time period. The first sub-interval is a result of dividing the time interval, so the time interval (first time period) comprises the first sub-interval (second time period). The second sub-interval is a third time period that the first time period comprises and with which a part of the first prediction is associated.) Wu and Elshocht are in the same analogous art of machine learning models that predict based on information that is gathered into time periods. In addition, Elshocht teaches that smaller sub-intervals enable a quick evaluation of the quality of the forecast and avoid accumulation of forecasting errors over a longer period of time, while larger intervals are often more precise, so it is important to use both types of time periods.
((Elshocht) Paragraph [0005], "...repeating the forecast for smaller and smaller time intervals until a desired time-granularity is reached, providing a concept for forecasting a trend of a numerical value which enables a quick evaluation of a quality of the forecast, while avoiding the accumulation of forecasting errors over a longer period of time...On the other hand, the forecasts for the smaller and smaller time intervals may use the forecasting result for the longer time intervals. This may save time and may yield more precise results, as forecasts for longer time intervals are often more precise than individual forecasts over shorter time intervals") Thus, it would be obvious to a person of ordinary skill in the art before the effective filing date of the application to have included dividing the first time point (first time period) of Wu into a second sub-interval forming a second time period that the first time period comprises, as Elshocht teaches, in order to avoid the accumulation of forecasting errors that occurs over a longer period of time. Wu, in view of Elshocht, still does not teach: generating one or more action recommendations based on the combined prediction. However, Zhao teaches: Generating one or more action recommendations based on a prediction. ((Zhao) Paragraph [0043] "As shown in FIG. 1E and by reference number 126, the forecast analysis platform may perform one or more actions (e.g., based on generating the additional forecast)." The forecast analysis platform performing one or more actions is interpreted as generating one or more action recommendations. An additional forecast is interpreted as equivalent to the combined prediction.) Zhao, Wu, and Elshocht are in the same analogous art of machine learning models that predict based on information gathered into time periods. In addition, Zhao teaches that generating action recommendations reduces the need for monitoring.
((Zhao) Paragraph [0013], "Moreover, the forecast analysis platform may perform an action, such as automatically scheduling transactions, which may reduce a need for a manager to monitor the transaction account using the one or more devices.") Thus, it would be obvious to a person of ordinary skill in the art before the effective filing date of the application to have added the action recommendation based on the combined prediction from Wu, in view of Elshocht, with the rest of the invention disclosed by Wu, in view of Elshocht, in order to reduce the need to monitor. This combination would produce the predictable result of the invention disclosed in claim 20 of the instant application. Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Elshocht and Zhao, and in further view of MODEL-AGNOSTIC APPROACH TO INTERPRETING SEQUENCE PREDICTIONS (US 20220114494 A1) by Sousa et al., hereafter Sousa. Regarding claim 6, Wu, in view of Elshocht and Zhao, teaches all of the material disclosed in claim 1. Wu, in view of Elshocht and Zhao, does not expressly disclose, but with Sousa does teach: determining one or more Shapley values associated with the combined prediction; and ((Sousa) Paragraph [0036], "At 310, a relevance metric is determined for the selected input based at least in part on the plurality of perturbed prediction outputs of the machine learning model. In various embodiments, the relevance metric is a Shapley value." In this embodiment, the relevance metric is a Shapley value. Being based at least in part on the plurality of perturbed prediction outputs of the machine learning model is equivalent to being associated with the combined prediction.) determining that the one or more Shapley values satisfy one or more conditions. ((Sousa) Paragraph [0041], "At 408, it is determined whether the relevance metric falls below a specified threshold.
The specified threshold may take the form of a specific importance value that is empirically decided. The specified threshold may also take the form of a ratio of an importance value associated with the second sub-sequence to an importance value associated with the overall sequence of predictions (e.g., a ratio of Shapley values)." Determining whether the relevance metric falls below a specified threshold is determining that the one or more Shapley values satisfy a condition.) Sousa is analogous art to Wu, Elshocht, and Zhao, as all are in the same category of machine learning model predictions. In addition, Sousa teaches that using Shapley values in model interpretability ensures consistency, nullifies attribution from missing inputs, and ensures that the sum of each individual input attribution value is not different from the actual model reward value. ((Sousa) Paragraph [0022] "An advantage of bringing the Shapley values framework into model interpretability is inheriting Shapley properties for model explanations, these being: local accuracy ensuring that the sum of all individual input attribution values is equal to the model's score; missingness dictating that missing inputs should have no impact on the model's score, and therefore their attribution must be null; and consistency ensuring that if an input's contribution to the model increases, then its attributed importance should not decrease.") Thus, it would be obvious to a person of ordinary skill in the art before the effective filing date of the application to have included Shapley values as a relevance metric and to use the relevance metric to check if the invention meets a condition, as in Sousa, into the invention of Wu, in view of Elshocht and Zhao, in order to take advantage of Shapley value properties of ensuring consistency, nullifying attribution from missing inputs, and ensuring that the sum of each individual input attribution value is not different from the actual model reward value.
This would produce the predictable result that is the invention claimed in claim 6 of this application. Regarding claim 16, Wu, in view of Elshocht and Zhao, teaches all of the material disclosed in claim 11. Wu, in view of Elshocht and Zhao, does not expressly disclose, but with Sousa does teach: determining one or more Shapley values associated with the combined prediction; and ((Sousa) Paragraph [0036], "At 310, a relevance metric is determined for the selected input based at least in part on the plurality of perturbed prediction outputs of the machine learning model. In various embodiments, the relevance metric is a Shapley value." In this embodiment, the relevance metric is a Shapley value. Being based at least in part on the plurality of perturbed prediction outputs of the machine learning model is equivalent to being associated with the combined prediction.) determining that the one or more Shapley values satisfy one or more conditions. ((Sousa) Paragraph [0041], "At 408, it is determined whether the relevance metric falls below a specified threshold. The specified threshold may take the form of a specific importance value that is empirically decided. The specified threshold may also take the form of a ratio of an importance value associated with the second sub-sequence to an importance value associated with the overall sequence of predictions (e.g., a ratio of Shapley values)." Determining whether the relevance metric falls below a specified threshold is determining that the one or more Shapley values satisfy a condition.) Sousa is analogous art to Wu, Elshocht, and Zhao, as all are in the same category of machine learning model predictions. In addition, Sousa teaches that using Shapley values in model interpretability ensures consistency, nullifies attribution from missing inputs, and ensures that the sum of each individual input attribution value is not different from the actual model reward value.
((Sousa) Paragraph [0022] "An advantage of bringing the Shapley values framework into model interpretability is inheriting Shapley properties for model explanations, these being: local accuracy ensuring that the sum of all individual input attribution values is equal to the model's score; missingness dictating that missing inputs should have no impact on the model's score, and therefore their attribution must be null; and consistency ensuring that if an input's contribution to the model increases, then its attributed importance should not decrease.") Thus, it would be obvious to a person of ordinary skill in the art before the effective filing date of the application to have included Shapley values as a relevance metric and to use the relevance metric to check if the invention meets a condition, as in Sousa, into the invention of Wu, in view of Elshocht and Zhao, in order to take advantage of Shapley value properties of ensuring consistency, nullifying attribution from missing inputs, and ensuring that the sum of each individual input attribution value is not different from the actual model reward value. This would produce the predictable result that is the invention claimed in claim 16 of this application. Claims 7-9 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wu, in view of Elshocht, Zhao, and Sousa, and in further view of ARTIFICIAL INTELLIGENCE / MACHINE LEARNING MODEL DRIFT DETECTION AND CORRECTION FOR ROBOTIC PROCESS AUTOMATION (US 20220024032 A1) by Singh et al., hereafter Singh. Regarding claim 7, Wu, in view of Sousa, Elshocht, and Zhao, teaches all of the material claimed in claim 6. Wu, in view of Sousa, Elshocht, and Zhao, suggests, but does not explicitly disclose, but together with Singh does teach: in response to determining that the one or more Shapley values satisfy the one or more conditions, adding one or more actions to the one or more action recommendations.
((Sousa) Paragraph [0042], "If it is determined at 408 that the relevance metric falls below the specified threshold, at 410, the first sub-sequence and the second sub-sequence are demarcated and the events in the second sub-sequence are lumped together." The relevance metric is Shapley values. The first sub-sequence and the second sub-sequence being demarcated could be the action added to the action recommendations. (Singh) Paragraph [0016] "When a change threshold is met or exceeded, an alert or a retraining trigger may be generated." An alert or a retraining trigger is adding one or more actions to the action recommendations. The alert or retraining trigger is generated in response to a threshold being met, where a threshold being met is satisfying one or more conditions.) The rationale to combine Wu with Sousa is the same as provided in the parent claim. Singh is analogous art to Wu, Sousa, Elshocht, and Zhao, as all are in the same category of machine learning model predictions. In addition, Singh teaches that predictions made by machine learning may change over time, so it would be beneficial to give an alert or retrain the model when that happens. ((Singh) Paragraph [0002], "...However, predictions made by AI/ML models may change, or drift, over time...Accordingly, improved techniques for detecting and/or correcting AI/ML model drift may be beneficial.") Thus, it would be obvious to one of ordinary skill in the art before the effective filing date of the application to have implemented an alert or a retraining trigger when a condition is met, as taught by Singh, into the invention of Wu, in view of Sousa, Elshocht, and Zhao, in order to implement a detection and correction mechanism for when predictions made by the machine learning models change. This would produce the predictable result that is claim 7 of the instant application.
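The Shapley-value mechanism recited in claims 6-7 (compute attribution values associated with a prediction, then test them against a condition) can be illustrated with a self-contained sketch. Nothing here is from Sousa or the application; the toy additive model, the coalition value function, and the relevance cutoff are all the editor's assumptions:

```python
from itertools import combinations
from math import factorial

def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating feature coalitions.

    value_fn(S) returns the model's prediction when only the features in
    S (a frozenset of indices) are 'present'. Exponential in n_features,
    so suitable only for tiny illustrative examples.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                S = frozenset(S)
                # Classic Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n_features - len(S) - 1) / factorial(n_features)
                phi[i] += w * (value_fn(S | {i}) - value_fn(S))
    return phi

# Hypothetical additive model: prediction = sum of present feature contributions.
contrib = {0: 2.0, 1: -0.5, 2: 1.5}
value = lambda S: sum(contrib[j] for j in S)

phi = exact_shapley(value, 3)

# "Local accuracy" (the Sousa property quoted above): the Shapley values
# sum to the full-coalition score minus the empty-coalition score.
total = value(frozenset({0, 1, 2})) - value(frozenset())

# A threshold condition of the kind the claim recites: flag any feature
# whose attribution falls below an (assumed) relevance cutoff.
threshold = 1.0
low_relevance = [i for i, p in enumerate(phi) if abs(p) < threshold]
print(phi, total, low_relevance)
```

For a purely additive model each feature's Shapley value equals its standalone contribution, which makes the local-accuracy property easy to verify by inspection.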
Regarding claim 8, Wu, in view of Sousa, Singh, Elshocht, and Zhao, teaches all of the material claimed in claim 7, and Sousa and Singh additionally teach: a condition that a sum of Shapley values associated with the first machine learning model are less than a predetermined ratio of a total sum of the Shapley values associated with the first machine learning model and Shapley values associated with the second machine learning model, ((Sousa) Paragraph [0027] "The sum of all input component importance values (Shapley values) corresponds to and explains the difference between the model's score…))." Paragraph [0041], "At 408, it is determined whether the relevance metric falls below a specified threshold. The specified threshold may take the form of a specific importance value that is empirically decided. The specified threshold may also take the form of a ratio of an importance value associated with the second sub-sequence to an importance value associated with the overall sequence of predictions (e.g., a ratio of Shapley values)." A sum of all input Shapley values corresponding to and explaining the difference in the model's score is a sum of Shapley values associated with the first machine learning model. The relevance metric is a sum of Shapley values associated with the first machine learning model. The specified threshold may take the form of a ratio, such as a ratio of a total sum of the Shapley values associated with the first machine learning model and Shapley values associated with the second machine learning model.) and wherein the one or more actions comprise retraining the first machine learning model. ((Singh) Paragraph [0016] "When a change threshold is met or exceeded, an alert or a retraining trigger may be generated." The retraining trigger can apply to the first machine learning model.) The rationale to combine Wu with Sousa and Singh is the same as provided in the parent claim.
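The specific condition of claim 8, the first model's Shapley sum falling below a predetermined ratio of the combined total across both models, which then triggers retraining of the first model, reduces to a small predicate. The function name, the use of absolute values, and the default ratio below are the editor's assumptions, not limitations drawn from the record:

```python
def should_retrain_first_model(shap_first, shap_second, ratio=0.5):
    """Claim-8-style check: has the first model's share of the total
    Shapley attribution fallen below a predetermined ratio?

    shap_first / shap_second are Shapley values attributed to the first
    and second machine learning models' contributions to the combined
    prediction; ratio is the predetermined cutoff (an assumption here).
    Absolute values are used so positive and negative attributions do
    not cancel -- another editorial choice, not claim language.
    """
    first_sum = sum(abs(v) for v in shap_first)
    total_sum = first_sum + sum(abs(v) for v in shap_second)
    if total_sum == 0:
        return False  # no attribution at all: nothing to conclude
    return first_sum < ratio * total_sum

# Base model still dominates the combined prediction: no retraining.
print(should_retrain_first_model([3.0, 1.0], [0.5, 0.5]))   # False
# Residual model now carries most of the attribution: retrain.
print(should_retrain_first_model([0.2, 0.1], [1.0, 2.0]))   # True
```

In the combined rejection's framing, a True result here would stand in for the "retraining trigger" that Singh generates when a change threshold is met.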
Regarding claim 9, Wu, in view of Sousa, Singh, Elshocht, and Zhao, teaches all of the material claimed in claim 8, and Sousa and Singh additionally teach: in response to determining that the sum of Shapley values associated with the first machine learning model are less than the predetermined ratio of the total sum of the Shapley values associated with the first machine learning model and the Shapley values associated with the second machine learning model, ((Sousa) Paragraph [0027] "The sum of all input component importance values (Shapley values) corresponds to and explains the difference between the model's score…))." Paragraph [0041], "At 408, it is determined whether the relevance metric falls below a specified threshold. The specified threshold may take the form of a specific importance value that is empirically decided. The specified threshold may also take the form of a ratio of an importance value associated with the second sub-sequence to an importance value associated with the overall sequence of predictions (e.g., a ratio of Shapley values)." A sum of all input Shapley values corresponding to and explaining the difference in the model's score is a sum of Shapley values associated with the first machine learning model. The relevance metric is a sum of Shapley values associated with the first machine learning model. The specified threshold may take the form of a ratio, such as a ratio of a total sum of the Shapley values associated with the first machine learning model and Shapley values associated with the second machine learning model.) automatically triggering retraining of the first machine learning model. ((Singh) Paragraph [0016] "When a change threshold is met or exceeded, an alert or a retraining trigger may be generated." The retraining trigger can apply to the first machine learning model. The retraining trigger is generated in response to the threshold being met.)
The rationale to combine Wu with Sousa and Singh is the same as provided in the parent claim. Regarding claim 17, Wu, in view of Sousa, Elshocht, and Zhao, teaches all of the material claimed in claim 16. Wu, in view of Sousa, Elshocht, and Zhao, suggests, but does not explicitly disclose, but together with Singh does teach: in response to determining that the one or more Shapley values satisfy the one or more conditions, adding one or more actions to the one or more action recommendations. ((Sousa) Paragraph [0042], "If it is determined at 408 that the relevance metric falls below the specified threshold, at 410, the first sub-sequence and the second sub-sequence are demarcated and the events in the second sub-sequence are lumped together." The relevance metric is Shapley values. The first sub-sequence and the second sub-sequence being demarcated could be the action added to the action recommendations. (Singh) Paragraph [0016] "When a change threshold is met or exceeded, an alert or a retraining trigger may be generated." An alert or a retraining trigger is adding one or more actions to the action recommendations. The alert or retraining trigger is generated in response to a threshold being met, where a threshold being met is satisfying one or more conditions.) The rationale to combine Wu with Sousa is the same as provided in the parent claim. Singh is analogous art to Wu, Sousa, Elshocht, and Zhao, as all are in the same category of machine learning model predictions. In addition, Singh teaches that predictions made by machine learning may change over time, so it would be beneficial to give an alert or retrain the model when that happens.
((Singh) Paragraph [0002], "...However, predictions made by AI/ML models may change, or drift, over time...Accordingly, improved techniques for detecting and/or correcting AI/ML model drift may be beneficial.") Thus, it would be obvious to one of ordinary skill in the art before the effective filing date of the application to have implemented an alert or a retraining trigger when a condition is met, as taught by Singh, into the invention of Wu, in view of Sousa, Elshocht, and Zhao, in order to implement a detection and correction mechanism for when predictions made by the machine learning models change. This would produce the predictable result that is claim 17 of the instant application. Regarding claim 18, Wu, in view of Sousa, Singh, Elshocht, and Zhao, teaches all of the material claimed in claim 17, and additionally teaches: a condition that a sum of Shapley values associated with the first machine learning model are less than a predetermined ratio of a total sum of the Shapley values associated with the first machine learning model and Shapley values associated with the second machine learning model, ((Sousa) Paragraph [0027] "The sum of all input component importance values (Shapley values) corresponds to and explains the difference between the model's score…))." Paragraph [0041], "At 408, it is determined whether the relevance metric falls below a specified threshold. The specified threshold may take the form of a specific importance value that is empirically decided. The specified threshold may also take the form of a ratio of an importance value associated with the second sub-sequence to an importance value associated with the overall sequence of predictions (e.g., a ratio of Shapley values)." A sum of all input Shapley values corresponding to and explaining the difference in the model's score is a sum of Shapley values associated with the first machine learning model.
The relevance metric is a sum of Shapley values associated with the first machine learning model. The specified threshold may take the form of a ratio, such as a ratio of a total sum of the Shapley values associated with the first machine learning model and Shapley values associated with the second machine learning model.) and wherein the one or more actions comprise retraining the first machine learning model. ((Singh) Paragraph [0016] "When a change threshold is met or exceeded, an alert or a retraining trigger may be generated." The retraining trigger can apply to the first machine learning model.) The rationale to combine Wu with Sousa and Singh is the same as provided in the parent claim. Regarding claim 19, Wu, in view of Sousa, Singh, Elshocht, and Zhao, teaches all of the material claimed in claim 18, and additionally teaches: in response to determining that the sum of Shapley values associated with the first machine learning model are less than the predetermined ratio of the total sum of the Shapley values associated with the first machine learning model and the Shapley values associated with the second machine learning model, ((Sousa) Paragraph [0027] "The sum of all input component importance values (Shapley values) corresponds to and explains the difference between the model's score…))." Paragraph [0041], "At 408, it is determined whether the relevance metric falls below a specified threshold. The specified threshold may take the form of a specific importance value that is empirically decided. The specified threshold may also take the form of a ratio of an importance value associated with the second sub-sequence to an importance value associated with the overall sequence of predictions (e.g., a ratio of Shapley values)." A sum of all input Shapley values corresponding to and explaining the difference in the model's score is a sum of Shapley values associated with the first machine learning model.
The relevance metric is a sum of Shapley values associated with the first machine learning model. The specified threshold may take the form of a ratio, such as a ratio of a total sum of the Shapley values associated with the first machine learning model and Shapley values associated with the second machine learning model.) automatically triggering retraining of the first machine learning model. ((Singh) Paragraph [0016] "When a change threshold is met or exceeded, an alert or a retraining trigger may be generated." The retraining trigger can apply to the first machine learning model. The retraining trigger is generated in response to the threshold being met.) The rationale to combine Wu with Sousa and Singh is the same as provided in the parent claim.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Patents and/or related publications are cited in the Notice of References Cited (Form PTO-892) attached to this action to further show the state of the art with respect to machine learning, residual prediction learning, Shapley values, retraining machine learning models, and time series. Any inquiry concerning this communication or earlier communications from the examiner should be directed to DYLAN H LAI whose telephone number is (571) 272-8628. The examiner can normally be reached Monday - Friday 7:30am-5:00pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Tamara Kyle, can be reached at 571-252-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

D. H. L.
Examiner, Art Unit 2144

/TAMARA T KYLE/
Supervisory Patent Examiner, Art Unit 2144
Prosecution Timeline

May 23, 2023
Application Filed
Mar 17, 2026
Non-Final Rejection — §101, §103 (current)

Prosecution Projections

1-2 Expected OA Rounds · 3y 3m Median Time to Grant · Low PTA Risk
Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.