Detailed Action
This Non-Final Rejection is responsive to the claim amendment filed on 9/04/2025. Claims 1-20 are pending. Claims 1, 11, and 17 are independent claims.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
The objection to claim 15 has been withdrawn as necessitated by the amendment.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
The rejection of claim 4 has been withdrawn as necessitated by the amendment.
Claims 11-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 11 recites “an attention mechanism that focuses on past timesteps of relevance to a future predictive forecast at a corresponding timestep”. It is unclear how the attention mechanism can focus on a future timestep.
Claim 17 recites similar limitations as those found in claim 11, and is likewise rejected.
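For illustration only, the conventional operation the claim language appears to intend can be sketched as follows. This is a minimal sketch with hypothetical names, not drawn from Applicant's disclosure: the mechanism weights (focuses on) past timesteps, with the weights conditioned on the future timestep being forecast.

import numpy as np

def attention_forecast(past_states, future_query):
    # past_states: (T, d) encodings of T past timesteps.
    # future_query: (d,) representation of the future timestep being forecast.
    scores = past_states @ future_query        # relevance of each past timestep
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax "importance levels" over past timesteps
    return weights @ past_states               # attention-weighted context for the forecast

# Example: 5 past timesteps, 4-dimensional encodings.
rng = np.random.default_rng(0)
context = attention_forecast(rng.normal(size=(5, 4)), rng.normal(size=4))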
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1:
Subject Matter Eligibility Analysis Step 1:
Claim 1 recites a “system” and is thus a machine, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 1 recites abstract ideas:
obtaining feature data for features of an entity, wherein the feature data comprises time-based data for at least one of the features over a plurality of timesteps within a time period;
(A mental process of obtaining data through observation of features over time and processing the data)
determining a first predictive forecast for the entity at a first time after the time period using the feature data (A mental process of predicting a future event).
wherein the first predictive forecast is further determined using a plurality of predictions determined by the deep neural network model over the time period from the feature data and the one or more importance levels for the attention mechanism and one or more of the plurality of timesteps associated with the first time; (A mental process of predicting based on predictions, importance and timesteps. This also could be a mathematical determination if the prediction is based on numeric values of importance).
and determining a second predictive forecast for the entity at a second time after the first time using the feature data, the first predictive forecast (A mental process of predicting a future event).
wherein the second predictive forecast is further determined using the one or more importance levels for the attention mechanism and one or more of the plurality of timesteps associated with the second time. (A mental process of predicting based on importance and timesteps. This also could be a mathematical determination if the prediction is based on numeric values of importance).
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 1 further recites additional elements:
A service provider system comprising: a non-transitory memory; and one or more hardware processors coupled to the non-transitory memory and configured to read instructions from the non-transitory memory to cause the service provider system to
perform operations comprising: This limitation merely recites a device to perform the abstract ideas of the claims, e.g., “apply it to a device” (See MPEP 2106.05(f)).
accessing an intelligent forecasting framework comprising a deep neural network model configured to enable a time series forecasting associated with the entity, wherein the deep neural network model is trained by the intelligent forecasting framework using training data associated with the features and an attention mechanism that identifies one or more importance levels for one or more timesteps of the plurality of timesteps based on the training data, This limitation merely recites a device/machine learning model to perform the abstract ideas of the claims, e.g., “apply it to a device” (See MPEP 2106.05(f)).
and wherein the deep neural network model comprises a combined model based on a plurality of deep neural network models trained on subsets of the training data generated using data bagging with the training data; This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, there are additional elements that amount to no more than mere instructions to apply the exception using a device. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. As discussed above with respect to integration of the abstract ideas into a practical application, the remaining additional element only specifies a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 2:
Subject Matter Eligibility Analysis Step 1:
A machine as recited by Claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 2 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 2 further recites additional element:
The service provider system of claim 1, wherein the deep neural network model uses a long short-term memory (LSTM) recurrent neural network architecture. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
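For illustration only, a minimal sketch of a time series forecaster using an LSTM recurrent architecture of the kind this limitation recites. The class and parameter names are assumptions for illustration, not Applicant's implementation:

import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    # Minimal LSTM-based forecaster: encodes a window of timesteps and
    # predicts one output variable from the final hidden state.
    def __init__(self, n_features, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                  # x: (batch, timesteps, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # forecast from the last timestep's state

model = LSTMForecaster(n_features=6)
forecast = model(torch.randn(8, 12, 6))    # 8 entities, 12 timesteps, 6 features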
Regarding Claim 3:
Subject Matter Eligibility Analysis Step 1:
A machine as recited by Claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 3 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 3 further recites additional element:
The service provider system of claim 1, wherein the attention mechanism comprises an architecture that provides one or more weights to the one or more timesteps of the features when providing the time series forecasting of a future timestep. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 4:
Subject Matter Eligibility Analysis Step 1:
A machine as recited by Claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 4 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 4 further recites additional element:
The service provider system of claim 3, where the attention mechanism is one of a plurality of attention mechanisms for different layers of the deep neural network model. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 5:
Subject Matter Eligibility Analysis Step 1:
A machine as recited by Claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 5 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 5 further recites additional element:
The service provider system of claim 1, wherein the time-based data comprises data points for at least a portion of the feature data for the features collected over the time period. This
limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 6:
Subject Matter Eligibility Analysis Step 1:
A machine as recited by Claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 6 recites abstract ideas:
generating, using the data bagging, at least one additional training data set from the training data for the features of the deep neural network model (a mental process of creating a data set by aggregating data from a set)
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 6 further recites additional element:
The service provider system of claim 1, wherein prior to obtaining the feature data, the operations further comprise:… This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
wherein the at least one additional training data set comprises a subset of data records in the training data randomly selected using the data bagging. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
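For illustration only, data bagging (bootstrap aggregating) as conventionally understood can be sketched as follows, where each additional training data set is a randomly selected subset of the records. The helper name and shapes are hypothetical, not taken from the claim or the cited art:

import numpy as np

def bag_training_data(records, n_sets, seed=0):
    # Generate additional training data sets, each a randomly selected
    # (with replacement, hence partially overlapping) subset of the records.
    rng = np.random.default_rng(seed)
    n = len(records)
    return [records[rng.integers(0, n, size=n)] for _ in range(n_sets)]

# Example: three additional data sets drawn from 50 two-field records.
subsets = bag_training_data(np.arange(100.0).reshape(50, 2), n_sets=3)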
Regarding Claim 7:
Subject Matter Eligibility Analysis Step 1:
A machine as recited by Claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 7 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 7 further recites additional element:
The service provider system of claim 1, wherein prior to obtaining the feature
data, the operations further comprise: training the deep neural network model using the training data associated with the features and an LSTM recurrent neural network
architecture. This limitation merely recites a device/machine learning model to perform the abstract ideas of the claims, e.g., “apply it to a device” (See MPEP 2106.05(f)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to no more than mere instructions to apply the exception using a generic computer component or machine learning model. Mere instructions to apply an exception using a generic computer component or machine learning model cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 8:
Subject Matter Eligibility Analysis Step 1:
A machine as recited by Claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 8 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 8 further recites additional element:
The service provider system of claim 7, wherein the training the deep neural
network model utilizes the attention mechanism and the data bagging associated with the
training data. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 9:
Subject Matter Eligibility Analysis Step 1:
A machine as recited by Claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 9 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 9 further recites additional element:
The service provider system of claim 7, wherein the training the deep neural
network model utilizes at least one external feature of the features that is separate from a
variable being forecasted by the time series forecasting for at least the first predictive
forecast and the second predictive forecast. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 10:
Subject Matter Eligibility Analysis Step 1:
A machine as recited by Claim 1.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 10 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 10 further recites additional elements:
The service provider system of claim 1, wherein the features comprise an input feature for the time series forecasting of a corresponding output feature at a future time, wherein the input feature comprises one of a total payment volume, a future revenue, a purchase amount, or a transaction parameter, This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
wherein the features further comprise at least one input external feature for use with the time series forecasting of the corresponding output feature, This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
and wherein the at least one input external feature comprises at least one of customer data, fraud data, transaction data, a macro-economical feature, a trend in an e-commerce industry, a pandemic effect feature, or a total payment volume migration feature. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 11:
Subject Matter Eligibility Analysis Step 1:
Claim 11 recites a “method” and is thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 11 recites abstract ideas:
determining,…, a plurality of past traits for the entity over a time period using feature data over the time period for features (A mental process of determining traits of features)
determining,…, a first predictive forecast of the trait at a first future time after the time period based on the feature data, the plurality of past traits, and a temporal factor; (A mental process of predicting a future event).
determining,…, a second predictive forecast of the trait for the entity at a second future time after the first future time based on the feature data, the first predictive forecast, the plurality of past traits, and the temporal factor (A mental process of predicting a future event).
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 11 further recites additional elements:
using a deep neural network model configured to enable time series forecasting of a trait of an entity at one or more future times; This limitation merely recites a device/machine learning model to perform the abstract ideas of the claims, e.g., “apply it to a device” (See MPEP 2106.05(f)).
processed by the deep neural network; This limitation merely recites a device/machine learning model to perform the abstract ideas of the claims, e.g., “apply it to a device” (See MPEP 2106.05(f)).
using the deep neural network model… This limitation merely recites a device/machine learning model to perform the abstract ideas of the claims, e.g., “apply it to a device” (See MPEP 2106.05(f)).
wherein the deep neural network model is trained using an attention mechanism and data bagging for training data associated with the features; This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As
discussed above with respect to integration of the abstract idea into a practical application, there are additional elements that amount to no more than mere instructions to apply the exception using a generic computer component or machine learning model. Mere instructions to apply an exception using a generic computer component or machine learning model cannot provide an inventive concept. As discussed above with respect to integration of the abstract ideas into a practical application, the remaining additional element only specifies a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 12:
Subject Matter Eligibility Analysis Step 1:
A method as recited by Claim 11.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 12 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 12 further recites additional element:
The method of claim 11, wherein the deep neural network model is trained using
a long short-term memory (LSTM) recurrent neural network architecture with the
attention mechanism and the data bagging. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 13:
Subject Matter Eligibility Analysis Step 1:
A method as recited by Claim 11.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 13 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 13 further recites additional element:
The method of claim 11, wherein the trait comprises a forecasted variable at the
one or more future times. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the
abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 14:
Subject Matter Eligibility Analysis Step 1:
A method as recited by Claim 11.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 14 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 14 further recites additional element:
The method of claim 13, wherein the forecasted variable comprises one of a total
payment volume, a future revenue, a purchase amount, or a transaction parameter. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 15:
Subject Matter Eligibility Analysis Step 1:
A method as recited by Claim 11.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 15 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 15 further recites additional element:
The method of claim 11, wherein the first predictive forecast comprises a vector
provided as an input feature for the deep neural network model during the determining
the second predictive forecast. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
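For illustration only, the recited use of the first predictive forecast as an input when determining the second predictive forecast corresponds to conventional recursive multi-step forecasting, sketched below with a hypothetical model callable (an assumption, not Applicant's code):

def two_step_forecast(model, window):
    # The first forecast is appended to the input window, and the window
    # slides forward, so the second forecast consumes the first.
    first = model(window)
    second = model(window[1:] + [first])
    return first, second

# Toy example with a stand-in "model" that averages its inputs:
first, second = two_step_forecast(lambda xs: sum(xs) / len(xs), [1.0, 2.0, 3.0])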
Regarding Claim 16:
Subject Matter Eligibility Analysis Step 1:
A method as recited by Claim 11.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 16 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 16 further recites additional element:
The method of claim 11, wherein the determining the plurality of past traits of the
entity over the time period comprises determining a plurality of vectors at different past
times over the time period, This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
and wherein each of the plurality of vectors are used as an input when determining a next one of the plurality of past traits by the deep neural network model. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 17:
Subject Matter Eligibility Analysis Step 1:
Claim 17 recites a “non-transitory machine-readable medium” and is thus a manufacture, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 17 recites abstract ideas:
determining additional feature data for additional features used for training the LSTM neural network model for the future predicted trait; (A mental process for selecting data)
performing data bagging of data records for the model features in the training data (mental process of grouping/aggregating data into subsets)
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 17 further recites additional elements:
A non-transitory machine-readable medium having stored thereon machine readable instructions executable to cause a machine to perform operations comprising:… This limitation merely recites a device to perform the abstract ideas of the claims, e.g., “apply it to a device” (See MPEP 2106.05(f)).
receiving training data for model features of a long short-term memory (LSTM) neural network model, This limitation merely recites insignificant extra-solution activity of data gathering (See MPEP 2106.05(g)).
wherein the training data is associated with a time period and comprises temporal data for the model features over the time period, This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
and wherein the LSTM neural network model is configured to enable a predictive forecasting of a future predicted trait; This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
and training the LSTM neural network model using the training data, the data bagging, the additional feature data, and an attention mechanism for identifying one or more features for a focus during training of the LSTM neural network model. This limitation merely recites a device/machine learning model to perform the abstract ideas of the claims, e.g., “apply it to a device” (See MPEP 2106.05(f)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, there are additional elements that amount to no more than mere instructions to apply the exception using a device. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The additional element of “receiving training data for model features of a long short-term memory (LSTM) neural network model” is an activity of “mere data gathering” (See MPEP 2106.05(g)). Limiting the abstract idea to a well-understood, routine, and conventional activity cannot provide an inventive concept. As discussed above with respect to integration of the abstract ideas into a practical application, the remaining additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 18:
Subject Matter Eligibility Analysis Step 1:
A manufacture as recited by Claim 17.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 18 recites abstract ideas:
The non-transitory machine-readable medium of claim 17, wherein the performing the data bagging comprises generating a plurality of data sets of the data records for the training the LSTM neural network model. (A mental process of creating/selecting datasets to use)
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 18 does not further recite any additional elements.
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. The claim is not patent eligible.
Regarding Claim 19:
Subject Matter Eligibility Analysis Step 1:
A manufacture as recited by Claim 17.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 19 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 19 further recites additional element:
The non-transitory machine-readable medium of claim 17, wherein the attention mechanism applies one or more weights to the training data for the training the LSTM neural network model for the predictive forecasting of the future predicted trait. This
limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Regarding Claim 20:
Subject Matter Eligibility Analysis Step 1:
A manufacture as recited by Claim 17.
Subject Matter Eligibility Analysis Step 2A Prong 1:
Claim 20 does not recite any abstract ideas.
Subject Matter Eligibility Analysis Step 2A Prong 2:
Claim 20 further recites additional element:
The non-transitory machine-readable medium of claim 17, wherein the training
data further comprises customer data over the time period associated with customers of a
service provider, and wherein the future predicted trait comprises a total payment
volume. This limitation merely specifies a particular technological environment in which the abstract idea is to take place, i.e., a field of use (See MPEP 2106.05(h)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements considered individually and in combination that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract ideas into a practical application, the additional elements only specify a technological environment in which to perform the method. Limiting the abstract idea to a particular field of use or technological environment cannot provide an inventive concept. The claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3 and 5-10 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., US PG PUB 2021/0224912, in view of Dang et al., US PG PUB 2021/0350225, and further in view of Agrawal et al., US PG PUB 2019/0095756.
Regarding Claim 1:
Wang et al. teaches:
A service provider system (Wang et al., Par. 28 “In some embodiments, the machine learning model processes such historical data and identifies various features associated with forecasting order data for the financial service provider to execute at a future date.”).
comprising: a non-transitory memory; and one or more hardware processors coupled to the non-transitory memory and configured to read instructions from the non-transitory memory to cause the service provider system to perform operations comprising: (Wang et al., Par. 67, “Embodiments may also be at least partly implemented as instructions contained in or on a non-transitory computer-readable medium, which may be read and executed by one or more processors to enable performance of the operations described herein.”).
obtaining feature data for features of an entity, wherein the feature data comprises time-based data for at least one of the features over a plurality of timesteps within a time period; (Wang et al., Fig.5 and Par. 54, “In the illustrated embodiment shown in FIG. 5, the logic flow 500 processes a feature set corresponding to a financial service provider at block 502. The feature set includes time series data for each feature of a plurality of features, and the time series data includes market index information, order data, and reconstitution information over a time period. As described herein, market index information refers (in part) to market index features, such as weighted averages of stock prices over a time period;”, signifies receiving a feature data and processing it, where the data is time-based within a time period and has steps as it contains information over the whole time period).
accessing an intelligent forecasting framework comprising a deep neural network model configured to enable a time series forecasting associated with the entity, (Wang et al., Par. 17 “the deep learning model is a Convolutional Neural Network (CNN) that includes parameters specifying filters to be applied on individual features”, and Par. 18 “In some instances, the financial service provider implements region-level forecasting by configuring the deep learning model to make predictions as to a number of orders to execute within a specific geographic region. For example, a global bank may want to predict the number of orders at county level. The deep learning model may leverage a temporal correlation between regional markets to accomplish region-level forecasting. The CNN structure allows for temporal alignment of market index features from different regions”, signifies a deep neural network, in this case a convolutional neural network, used to perform time series forecasting).
wherein the deep neural network model is trained by the intelligent forecasting framework using training data associated with the features (Wang et al., Fig. 4 and Par. 51, “The training step 414, at a revert normalization sub-step, produces the output 416 by converting at least some of the parameter values into another numbering system or scale. The CNN 413 is directed to a predict 418 component of the training step where a prediction is made regarding an optimal number of orders to execute at a next day or another future date.”, signifies training the model, which requires training data, in this case the parameter values).
and an attention mechanism that identifies one or more importance levels for one or more timesteps of the plurality of timesteps based on the training data, (Wang et al., Fig. 4 and Par. 51, “The train 420 component uses the MSE to update the parameters of the CNN 413, thereby improving upon the CNN 413's accuracy.”, signifies a mechanism that identifies importance as accuracy).
determining a first predictive forecast for the entity at a first time after the time period using the feature data and the deep neural network model, (Wang et al., Fig. 4, Par. 50, “FIG. 4 depicts the CNN Encoder1 408-1 and the CNN Encoder2 408-2 as filtering feature values in a first CNN layer array and a second CNN layer array, respectively.” and Par. 51, “The
CNN 413 is directed to a predict 418 component of the training step where a prediction is made regarding an optimal number of orders to execute at a next day or another future date. When the next day or other future date arrives, the financial service provider executes the predicted number of orders. Another component of the training step 414, a train 420 component, uses the predicted number of orders to train the CNN 413 and improve upon an accuracy of future predictions. The train 420 component determine a target indicating an optimal number of orders that should have been executed.”, signifies a forecast prediction from a deep neural network model of a convolutional neural network using data values from the feature shown in the figure) wherein the first predictive forecast is further determined using a plurality of predictions determined by the deep neural network model over the time period from the feature data and the one or more importance levels for the attention mechanism and one or more of the plurality of timesteps associated with the first time; (Wang et al., Fig. 4 and Par. 51, “The training step 414, at a revert normalization sub-step, produces the output 416 by converting at least some of the parameter values into another numbering system or scale. The CNN 413 is directed to a predict 418 component of the training step where a prediction is made regarding an optimal number of orders to execute at a next day or another future date. When the next day or other future date arrives, the financial service provider executes the predicted number of orders. Another component of the training step 414, a train 420 component, uses the predicted number of orders to train the CNN 413 and improve upon an accuracy of future predictions… The train 420 component uses the MSE to update the parameters of the CNN 413, thereby improving upon the CNN 413's accuracy.”, signifies using predictions to improve accuracy to create an optimized forecast using the importance of accuracy and timesteps of days in this case).
and determining a second predictive forecast for the entity at a second time after the first time using the feature data, the first predictive forecast, and the deep neural network model, (Wang et al., Figs. 4-6, Par. 61 “The logic flow 600 may adjust a parameter of the deep learning model at block 606. As described herein, the deep learning model may include parameters operative to run a forecasting method for the financial service provider. Some parameters select which filter(s) to apply on each set of feature values in a feature set (e.g., the feature set 260 of FIG. 2) while other parameters select which weights/biases to use when combining filtered feature values into a prediction regarding the number of orders for the sub-market. The logic flow 600 may adjust any of the above parameters to improve upon future predictions.”, signifies another prediction being made using the features and previous predictions to update the model and make new predictions)
wherein the second predictive forecast is further determined using the one or more importance levels for the attention mechanism and one or more of the plurality of timesteps associated with the second time. (Wang et al., Fig. 4 and Par. 51, “The training step 414, at a revert normalization sub-step, produces the output 416 by converting at least some of the parameter values into another numbering system or scale. The CNN 413 is directed to a predict 418 component of the training step where a prediction is made regarding an optimal number of orders to execute at a next day or another future date. When the next day or other future date arrives, the financial service provider executes the predicted number of orders. Another component of the training step 414, a train 420 component, uses the predicted number of orders to train the CNN 413 and improve upon an accuracy of future predictions… The train 420 component uses the MSE to update the parameters of the CNN 413, thereby improving upon
the CNN 413's accuracy.”, signifies using predictions to improve accuracy to create an optimized forecast using the importance of accuracy and timesteps of days in this case).
Wang et al. does not teach the following limitation; however, Dang et al. does.
and wherein the deep neural network model comprises a combined model based on a plurality of deep neural network models (Dang et al., Par. 27, “Additionally, unlike conventional deep learning models that employ point estimated approaches, various embodiments described herein can learn an ensemble of infinite number of deep learning models whose weights can be characterized by one or more probability distributions.”, signifies having a combined deep learning model through an ensemble).
Wang et al. and Dang et al. are both analogous art to the present invention because the references are reasonably pertinent in using time series data and attention mechanisms in machine learning. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the forecasting deep neural network of Wang et al. with the combined deep neural network of Dang et al. The motivation to do so is to enhance confidence and accuracy of an output (Dang et al., Par. 64, “Along with the feature vectors h.sub.t, the attention vectors can be inputs to compute the conditional distribution Pr(x.sub.t.sup.(d)|a.sub.t, w.sub.A, h.sub.t), which can be equivalent to an ensemble of K neural networks and their output values from a distribution. Further, the attention decoder component 402 can compute confidence values on both the attention weights as well as the output predictive value. For example, the larger value of K, the more confidence assigned to the approximate quantities.”).
Wang et al. and Dang et al. do not teach the following limitation, but Agrawal et al. does.
trained on subsets of the training data generated using data bagging with the training data; (Agrawal et al., Fig. 6 and Par. 168, “Distortions such as variance can be decreased by refactoring an ensemble training dataset, such as tuples 630, by decomposing the dataset into partially overlapping subsets of data with a technique known as bootstrap aggregating (a.k.a. bagging).”, signifies training through subsets made from bagging)
Wang et al., Dang et al. and Agrawal et al. are all analogous art to the present invention because the references are reasonably pertinent in training and using machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the forecasting deep neural network method of Wang et al., the combined deep neural network of Dang et al. and with the bagging method of Agrawal et al. The motivation to do so is to enhance learning and optimization of data (Agrawal et al., Par. 162, “Computer 600 improves ensemble meta-learning with optimizations such as boosting and bagging.”).
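For illustration only, the combination articulated above (Dang et al.'s ensemble of models with Agrawal et al.'s bagging) can be sketched as follows; the helper names and the toy "models" are hypothetical assumptions, not the references' actual code:

import numpy as np

def train_bagged_ensemble(train_fn, records, n_models, seed=0):
    # Train one member model per bootstrap-sampled subset of the records.
    rng = np.random.default_rng(seed)
    n = len(records)
    return [train_fn(records[rng.integers(0, n, size=n)]) for _ in range(n_models)]

def combined_predict(models, x):
    # Combined model: average the member models' forecasts.
    return np.mean([m(x) for m in models], axis=0)

# Toy example: each "model" memorizes its subset's mean and predicts it.
models = train_bagged_ensemble(lambda recs: (lambda x: recs.mean()), np.arange(10.0), 3)
prediction = combined_predict(models, x=None)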
Regarding Claim 2, Wang et al. teaches:
The service provider system of claim 1, wherein the deep neural network model uses a long short-term memory (LSTM) recurrent neural network architecture. (Wang et al., Par. 46, “A bidirectional Long short term memory (LSTM) is a type of machine learning model implementing an artificial recurrent neural network (RNN) architecture. The bidirectional LSTM is well-suited for classifying or making predictions based upon the time series data of the calendar features. In some embodiments, based upon the time series data, the bidirectional LSTM predicts a future reconstitution schedule. In some embodiments, the bidirectional LSTM classifies the time series data based upon the calendar features' impact on the weighted averages in the market indices 404-1.”).
Regarding Claim 3, Wang et al. does not teach the following limitation; however, Dang et al. does:
The service provider system of claim 1, wherein the attention mechanism comprises an architecture that provides one or more weights to the one or more timesteps of the features when providing the time series forecasting of a future timestep. (Dang et al., Fig. 4, EQ.2, Par. 48, “the attention decoder component 402 can adapt one or more attention mechanisms to the context of the multivariate time series to learn a set of variables {a.sub.t.sup.(i)}.sub.i=1.sup.D associated with {h.sub.t.sup.(i)}.sub.i=1.sup.D for each output d-th time series”, and Par. 50, “For example, where {a.sub.t.sup.(i)}.sub.i=1.sup.D are non-negative values subject to Σ.sub.ia.sub.t.sup.(i)=1. As such, where {a.sub.t.sup.(i)}.sub.i=1.sup.D can be interpreted as attention weights correspondingly to {h.sub.t.sup.(i)}.sub.i=1.sup.D. A higher value of a.sub.t.sup.(i) can imply more relevant information found in h.sub.t.sup.(i). Thus, the d-th time series can be dependent on i-th time series at time point t.”, signifies attaching weights on each point in time in the time series).
Wang et al. and Dang et al. are both analogous art to the present invention because the references are reasonably pertinent in using time series data and attention mechanisms in machine learning. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the forecasting deep neural network of Wang et al. with the weight mapping of timesteps of Dang et al. The motivation to do so is to be able to analyze each time and find dependencies (Dang et al., Par. 51, “the distribution can describe the uncertainty in attention and can be analyzed by the attention decoder component 402 to estimate confidence values in discovering time series dependencies.”).
Regarding Claim 5, Wang et al. teaches:
The service provider system of claim 1, wherein the time-based data comprises data points for at least a portion of the feature data for the features collected over the time period. (Wang et al., Par. 46, “The calendar features 404-2 generally refer to a reconstitution schedule for the market indices 404-1. Therefore, the calendar features 404-2 includes time series data encompassing points-in-time of a time period associated with the reconstitution schedule.”, signifies time data with data points from points in time for features).
Regarding Claim 6, Wang et al. and Dang et al. do not teach the following limitations; however, Agrawal et al. does.
The service provider system of claim 1, wherein prior to obtaining the feature data, the operations further comprise: generating, using the data bagging, at least one additional training data set from the training data for the features (Agrawal et al., Fig. 6 and Par. 168, “Distortions such as variance can be decreased by refactoring an ensemble training dataset, such as tuples 630, by decomposing the dataset into partially overlapping subsets of data with a technique known as bootstrap aggregating (a.k.a. bagging).”, signifies training through subsets made from bagging, which uses training data with features similar to those meant to be learned).
of the deep neural network model, wherein the at least one additional training data set comprises a subset of data records in the training data randomly selected using the data bagging. (Agrawal et al., Par. 169, “Feature bagging entails training each of meta models 651-656 with a distinct partially overlapping subset of training dataset features such as meta-features 640 and hyperparameters 680”, signifies randomly selected, partially overlapping subsets of the training data generated through data bagging and used for training).
Regarding Claim 7, Wang et al. teaches:
The service provider system of claim 1, wherein prior to obtaining the feature data, the operations further comprise: training the deep neural network model using the training data associated with the features and an LSTM recurrent neural network architecture. (Wang et al., Fig. 4, Par. 46, “bidirectional Long short term memory (LSTM) is a type of machine learning model implementing an artificial recurrent neural network (RNN) architecture. The bidirectional LSTM is well-suited for classifying or making predictions based upon the time series data of the calendar features.”, and Par. 51, “The training step 414, at a revert normalization sub-step, produces the output 416 by converting at least some of the parameter values into another numbering system or scale. The CNN 413 is directed to a predict 418 component of the training step where a prediction is made regarding an optimal number of orders to execute at a next day or another future date.”, signifies training shown in the figure using the LSTM recurrent neural network architecture as well as having training data in parameters that are associated with the features to train to make an accurate prediction for the specific feature/forecast).
Regarding Claim 8, Wang et al. teaches:
The service provider system of claim 7, wherein the training the deep neural network model utilizes the attention mechanism (Wang et al., Fig. 4 and Par. 51, “The train 420
component uses the MSE to update the parameters of the CNN 413, thereby improving upon the CNN 413's accuracy.”, signifies a mechanism that identifies importance as accuracy).
and the data bagging associated with the training data. (Agrawal et al., Fig. 6 and Par. 168, “Distortions such as variance can be decreased by refactoring an ensemble training dataset, such as tuples 630, by decomposing the dataset into partially overlapping subsets of data with a technique known as bootstrap aggregating (a.k.a. bagging).”, signifies training through subsets made from bagging).
Regarding Claim 9, Wang et al. teaches:
The service provider system of claim 7, wherein the training the deep neural network model utilizes at least one external feature of the features that is separate from a variable being forecasted by the time series forecasting for at least the first predictive forecast and the second predictive forecast. (Wang et al., Fig. 4, Par. 17, “The CNN further includes parameters specifying weights (e.g., coefficients) in a function that combines the filtered values of each feature of the plurality of features and determines the number of orders for the financial service provider to execute by the future date.”, and Par. 43, “As shown in FIG. 4, the forecasting method 400 commences with a processing step 402 for transforming raw data (e.g., historical data) into features including those corresponding to market indices 404-1, calendar features 404-2, and historical allocation 404-3. After identifying these features in the raw data, the forecasting method 400 in the processing step 402 further processes the features to prepare feature data for filtering.”, signifies external features from raw data being market indices, historic data, etc. used in training and for forecasting the variable of number of orders).
Regarding Claim 10, Wang et al. teaches:
The service provider system of claim 1, wherein the features comprise an input feature for the time series forecasting of a corresponding output feature at a future time, wherein the input feature comprises one of a total payment volume, a future revenue, a purchase amount, or a transaction parameter, (Wang et al., Fig. 4, Par. 47, “The historical allocation 404-3 includes historical order data by the financial service provider based upon at least one of the market indices 404-1. As described herein, the historical order data includes at least a number of orders executed at a particular point-in-time in the above-mentioned time period.”, signifies input of historical data as shown in the figure, which includes a number of orders, i.e., a transaction parameter)
wherein the features further comprise at least one input external feature for use with the time series forecasting of the corresponding output feature, and wherein the at least one input external features comprises at least one of customer data, fraud data, transaction data, a macro-economical feature, a trend in an e-commerce industry, a pandemic effect feature, or a total payment volume migration feature. (Wang et al., Fig. 4, Par. 17, “The CNN further includes parameters specifying weights (e.g., coefficients) in a function that combines the filtered values of each feature of the plurality of features and determines the number of orders for the financial service provider to execute by the future date.”, and Par. 43, “As shown in FIG. 4, the forecasting method 400 commences with a processing step 402 for transforming raw data (e.g., historical data) into features including those corresponding to market indices 404-1, calendar features 404-2, and historical allocation 404-3. After identifying these features in the raw data, the forecasting method 400 in the processing step 402 further processes the features to prepare feature data for filtering.”, signifies feature from raw data being market indices for
forecasting the variable of number of orders, which can be a trend in the e-commerce industry and can be considered external as it is separate from the forecasted variable).
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., US PG PUB 2021/0224912, in view of Dang et al., US PG PUB 2021/0350225, and further in view of Agrawal et al., US PG PUB 2019/0095756, as applied to claims 1-3 and 5-10 above, and further in view of Hester et al., US Patent 11,921,824.
Regarding Claim 4, Wang et al. teaches mechanisms for adjusting weights of different nodes or layers, “The logic flow 600 may adjust a parameter of the deep learning model at block 606. As described herein, the deep learning model may include parameters operative to run a forecasting method for the financial service provider. Some parameters select which filter(s) to apply on each set of feature values in a feature set (e.g., the feature set 260 of FIG. 2) while other parameters select which weights/biases to use when combining filtered feature values into a prediction regarding the number of orders for the sub-market. The logic flow 600 may adjust any of the above parameters to improve upon future predictions.”, signifies another prediction being made using the features, temporal data, and previous predictions to update the model and make new predictions (Figs. 4-6, Par. 51, 61).
Wang et al., Dang et al., and Agrawal et al. do not teach the following limitation; however, Hester et al. does.
The service provider system of claim 3, where the attention mechanism is one of a plurality of attention mechanisms for different layers of the deep neural network model. (Hester et al., Col. 4, Lines 38-52, “Each decoder layer comprises three components: a self-attention mechanism (e.g., scaled dot product attention), an attention mechanism over the encodings, and a feed-forward neural network. The decoder functions in a similar fashion to the encoder, but an additional attention mechanism is inserted which instead draws relevant information from the encodings generated by the encoders. In a self-attention layer, the keys, values and queries come from the same place—in the case of the encoder, the output of the previous layer in the encoder. Each position in the encoder can attend to all positions in the previous layer of the encoder. In “encoder decoder attention” layers (sometimes referred to as “cross-attention”), the queries come from the previous decoder layer, and the keys and values come from the output of the encoder.”, signifies multiple attention mechanisms for different layers of an encoder-decoder, which is a deep neural network because it has multiple hidden layers).
Wang et al., Dang et al., Agrawal et al., and Hester et al. are all analogous art to the present invention because the references are reasonably pertinent in training and using machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the forecasting deep neural network method of Wang et al. and the combined deep neural network and weight mapping of Dang et al. with the bagging method of Agrawal et al. and with the multiple attention mechanisms for different layers of Hester et al. The motivation to do so is to draw information from specific parts or layers of the model or neural network (Hester et al., Col. 4, Lines 38-47, “Each decoder layer comprises three components: a self-attention mechanism (e.g., scaled dot product attention), an attention mechanism over the encodings, and a feed-forward neural network. The decoder functions in a similar fashion to the encoder, but an additional attention mechanism is inserted which instead draws relevant information from the encodings generated by the encoders. In a self-attention layer, the keys, values and queries come from the same place—in the case of the encoder, the output of the previous layer in the encoder.”).
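As an aid in reading the Hester et al. passage, the following is a hedged sketch of the two attention uses it distinguishes: self-attention, where queries, keys, and values come from the same place, and encoder-decoder (cross) attention, where queries come from the decoder while keys and values come from the encoder output. The NumPy implementation, shapes, and omission of learned projection matrices are simplifications assumed for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # relevance of each key to each query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
    return weights @ V                                 # attention-weighted values

enc = np.random.randn(6, 8)   # 6 encoder positions, model width 8
dec = np.random.randn(3, 8)   # 3 decoder positions

self_attn  = scaled_dot_product_attention(dec, dec, dec)  # Q, K, V from the same place
cross_attn = scaled_dot_product_attention(dec, enc, enc)  # queries from the decoder,
                                                          # keys/values from the encoder
```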
Claims 11-14 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., US PG PUB 2021/0224912 in view of Agrawal et al., US PG PUB 2019/0095756.
The following rejections are made in light of the 35 U.S.C. 112(b) rejections above.
Regarding Claim 11, Wang et al. teaches:
A method comprising: determining, using a deep neural network model configured to enable time series forecasting of a trait of an entity at one or more future times, (Wang et al., Figs. 1, 4, Par. 27, “the application 120 is operative to provide users with forecasting services in a particular field, such as financial forecasting. The application 120 may invoke a machine
learning component 122-2 to build, train, and/or operate a machine learning model (e.g., a deep learning model) configured to transform various historical data associated with a financial service provider into a prediction about future orders (e.g., trading orders)” and Par. 33, “The feature set 260 may include a set of (feature) values for each feature of a plurality of features. In some embodiments, each feature's set of values in the feature set 260 includes time series data.”, signifies forecasting of a trait, namely future orders, at one or more future times using time series data within historical data and a deep neural network, namely the convolutional neural network shown in Fig. 4).
a plurality of past traits for the entity over a time period using feature data over the time period for features processed by the deep neural network model, (Wang et al., Fig. 4, Par. 33, “The feature set 260 may include a set of (feature) values for each feature of a plurality of features. In some embodiments, each feature's set of values in the feature set 260 includes time series data. The plurality of features may include one or more features corresponding to market index information and reconstitution information over a time period.”, and Par. 43, “forecasting method 400 commences with a processing step 402 for transforming raw data (e.g., historical data) into features including those corresponding to market indices 404-1, calendar features 404-2, and historical allocation 404-3. After identifying these features in the raw data, the forecasting method 400 in the processing step 402 further processes the features to prepare feature data for filtering”, signifies feature data processed in the convolutional neural network structure using historical data to make future predictions).
wherein the deep neural network model is trained using an attention mechanism that focuses on past timesteps of relevance to a future predictive forecast at a corresponding timestep (Wang et al., Fig. 4 and Par. 51, “The train 420 component uses the MSE to update the parameters of the CNN 413, thereby improving upon the CNN 413's accuracy.”, signifies a mechanism that identifies importance as accuracy to train the model).
determining, using the deep neural network model and the attention mechanism, a first predictive forecast of the trait at a first future time after the time period based on the feature data, the plurality of past traits, and a temporal factor; (Wang et al., Fig. 4, Par. 50, “FIG. 4 depicts the CNN Encoder1 408-1 and the CNN Encoder2 408-2 as filtering feature values in a first CNN layer array and a second CNN layer array, respectively.” and Par. 51, “The CNN 413 is directed to a predict 418 component of the training step where a prediction is made regarding an optimal number of orders to execute at a next day or another future date. When the next day or other future date arrives, the financial service provider executes the predicted number of orders. Another component of the training step 414, a train 420 component, uses the predicted number of orders to train the CNN 413 and improve upon an accuracy of future predictions. The train 420 component determine a target indicating an optimal number of orders that should have been executed.”, signifies a forecast prediction from a deep neural network model, namely a convolutional neural network, using data values from the features, including historic temporal features and past market traits shown in the figure).
and determining, using the deep neural network model and the attention mechanism, a second predictive forecast of the trait for the entity at a second future time after the first future time based on the
feature data, the first predictive forecast, the plurality of past traits, and the temporal factor. (Wang et al., Figs. 4-6, Par. 61, “The logic flow 600 may adjust a parameter of the deep learning model at block 606. As described herein, the deep learning model may include parameters operative to run a forecasting method for the financial service provider. Some parameters select which filter(s) to apply on each set of feature values in a feature set (e.g., the feature set 260 of FIG. 2) while other parameters select which weights/biases to use when combining filtered feature values into a prediction regarding the number of orders for the sub-market. The logic flow 600 may adjust any of the above parameters to improve upon future predictions.”, signifies another prediction being made using the features, temporal data, and previous predictions to update the model and make new predictions).
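To illustrate the two-forecast pattern the examiner maps to Wang et al. (a second prediction conditioned on the first), the following is a minimal sketch in which a first forecast is appended to the feature history before the second forecast is made. The `LastValueModel` stand-in and its `predict` interface are hypothetical, not Wang et al.'s method.

```python
import numpy as np

class LastValueModel:
    """Stand-in for a trained deep neural network (hypothetical): it
    'predicts' the next row as the mean of the last three timesteps."""
    def predict(self, history):
        return history[-3:].mean(axis=0)

def two_step_forecast(model, feature_history):
    """feature_history: array of shape (timesteps, features)."""
    first = model.predict(feature_history)            # forecast at the first future time
    extended = np.vstack([feature_history,
                          np.atleast_2d(first)])      # first forecast becomes an input
    second = model.predict(extended)                  # forecast at the second future time,
    return first, second                              # conditioned on the first

history = np.random.rand(30, 4)                       # 30 past timesteps, 4 features
f1, f2 = two_step_forecast(LastValueModel(), history)
```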
Wang et al. does not teach the following limitation; however, Agrawal et al. does.
and data bagging for training data associated with the features; (Agrawal et al., Fig. 6 and Par. 168, “Distortions such as variance can be decreased by refactoring an ensemble training dataset, such as tuples 630, by decomposing the dataset into partially overlapping subsets of data with a technique known as bootstrap aggregating (a.k.a. bagging).”, signifies training through subsets made from bagging).
Wang et al. and Agrawal et al. are both analogous art to the present invention because the references are reasonably pertinent in training and using machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the forecasting deep neural network method of Wang et al. with the bagging method of Agrawal et al. The motivation to do so is to enhance learning and optimization of data (Agrawal et al., Par. 162, “Computer 600 improves ensemble meta-learning with optimizations such as boosting and bagging.”).
Regarding Claim 12, Wang et al. teaches:
The method of claim 11, wherein the deep neural network model is trained using a long short-term memory (LSTM) recurrent neural network architecture (Wang et al., Par. 46, “A bidirectional Long short term memory (LSTM) is a type of machine learning model implementing an artificial recurrent neural network (RNN) architecture. The bidirectional LSTM is well-suited for classifying or making predictions based upon the time series data of the
calendar features. In some embodiments, based upon the time series data, the bidirectional LSTM predicts a future reconstitution schedule. In some embodiments, the bidirectional LSTM classifies the time series data based upon the calendar features' impact on the weighted averages in the market indices 404-1.”).
with the attention mechanism (Wang et al., Fig. 4 and Par. 51, “The train 420 component uses the MSE to update the parameters of the CNN 413, thereby improving upon the CNN 413's accuracy.”, signifies a mechanism that identifies importance as accuracy).
Wang et al. does not teach the following limitation; however, Agrawal et al. does.
and the data bagging. (Agrawal et al., Fig. 6 and Par. 168, “Distortions such as variance can be decreased by refactoring an ensemble training dataset, such as tuples 630, by decomposing the dataset into partially overlapping subsets of data with a technique known as bootstrap aggregating (a.k.a. bagging).”, signifies training through subsets made from bagging).
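For reference, the following is a minimal sketch, written with PyTorch, of the bidirectional LSTM architecture quoted from Wang et al. Par. 46. The layer sizes, the linear output head, and the use of the final timestep's output are assumptions for illustration; the reference does not specify these details.

```python
import torch
import torch.nn as nn

class BiLSTMForecaster(nn.Module):
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # 2x: forward + backward states

    def forward(self, x):                     # x: (batch, timesteps, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])       # forecast from the last timestep

model = BiLSTMForecaster(n_features=4)
y_hat = model(torch.randn(8, 30, 4))          # e.g., a batch with 30 past timesteps
```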
Regarding Claim 13, Wang et al. teaches:
The method of claim 11, wherein the trait comprises a forecasted variable at the one or more future times. (Wang et al., Par. 50, “In general, the Decoder 412 builds a CNN 413 by determining and then, applying parameters to compute an optimal number of orders to execute on a future date.”, signifies a forecasted variable of the number of orders on a future date).
Regarding Claim 14, Wang et al. teaches:
The method of claim 13, wherein the forecasted variable comprises one of a total payment volume, a future revenue, a purchase amount, or a transaction parameter. (Wang et al. Par. 18, “In some instances, the financial service provider implements region-level forecasting by
configuring the deep learning model to make predictions as to a number of orders to execute within a specific geographic region. For example, a global bank may want to predict the number of orders at county level. The deep learning model may leverage a temporal correlation between regional markets to accomplish region-level forecasting. The CNN structure allows for temporal alignment of market index features from different regions.”, signifies a number of orders being forecasted, which is a transaction parameter).
Regarding Claim 17, Wang et al. teaches:
A non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising: (Wang et al., Par. 105, “The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention…A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se”).
receiving training data for model features of a long short-term memory (LSTM) neural network model, (Wang et al., Fig. 4 and Par. 46, “A bidirectional Long short-term memory (LSTM) is a type of machine learning model implementing an artificial recurrent neural network (RNN) architecture. The bidirectional LSTM is well-suited for classifying or making predictions based upon the time series data of the calendar features. In some embodiments, based upon the time series data, the bidirectional LSTM predicts a future reconstitution schedule. In some embodiments, the bidirectional LSTM classifies the time series data based upon the calendar features' impact on the weighted averages in the market indices 404-1.”, signifies an LSTM model and receiving the data later used for training, as shown in the figure).
wherein the training data is associated with a time period and comprises temporal data for the model features over the time period, (Wang et al., Fig. 4 and Par. 47, “The historical allocation 404-3 includes historical order data by the financial service provider based upon at least one of the market indices 404-1. As described herein, the historical order data includes at least a number of orders executed at a particular point-in-time in the above-mentioned time period.”, signifies temporal data used in the forecasting method of the figure as input to be used in training).
and wherein the LSTM neural network model is configured to enable a predictive forecasting of a future predicted trait; (Wang et al., Fig. 4 and Par. 46, “A bidirectional Long short-term memory (LSTM) is a type of machine learning model implementing an artificial recurrent neural network (RNN) architecture. The bidirectional LSTM is well-suited for classifying or making predictions based upon the time series data of the calendar features.”, signifies enabling forecasting).
determining additional feature data for additional features used for training the LSTM neural network model for the future predicted trait; (Wang et al., Par. 46, “In some embodiments, based upon the time series data, the bidirectional LSTM predicts a future reconstitution schedule. In some embodiments, the bidirectional LSTM classifies the time series data based upon the calendar features' impact on the weighted averages in the market indices”, signifies additional feature data being predicted, such as a schedule and weights, beyond just the overall intended number of orders).
and training the LSTM neural network model using the training data, (Wang et al., Fig. 4 and Par. 46, “A bidirectional Long short-term memory (LSTM) is a type of machine learning model implementing an artificial recurrent neural network (RNN) architecture. The bidirectional
LSTM is well-suited for classifying or making predictions based upon the time series data of the calendar features. In some embodiments, based upon the time series data, the bidirectional LSTM predicts a future reconstitution schedule. In some embodiments, the bidirectional LSTM classifies the time series data based upon the calendar features' impact on the weighted averages in the market indices 404-1.”, signifies an LSTM model and receiving the data later used for training, as shown in the figure).
the additional feature data, (Wang et al., Fig. 4 and Par. 46, “In some embodiments, based upon the time series data, the bidirectional LSTM predicts a future reconstitution schedule. In some embodiments, the bidirectional LSTM classifies the time series data based upon the calendar features' impact on the weighted averages in the market indices”, signifies additional feature data being predicted, such as a schedule and weights, beyond just the overall intended number of orders, with that data used as input for the forecasting method, which includes a training step).
and an attention mechanism for identifying one or more features for a focus during training of the LSTM neural network model. (Wang et al., Fig. 4 and Par. 51, “The train 420 component uses the MSE to update the parameters of the CNN 413, thereby improving upon the CNN 413's accuracy.”, signifies a mechanism that identifies importance as accuracy).
Wang et al. does not teach the following limitations; however, Agrawal et al. does.
performing data bagging of data records for the model features in the training data; (Agrawal et al., Fig. 6 and Par. 168, “Distortions such as variance can be decreased by refactoring an ensemble training dataset, such as tuples 630, by decomposing the dataset into partially overlapping subsets of data with a technique known as bootstrap aggregating (a.k.a. bagging).”, signifies training through subsets made from bagging) and the data bagging (Agrawal et al., Fig. 6 and Par. 168, as quoted above, signifies the data bagging of the training data).
Regarding Claim 18, Wang et al. teaches:
for the training the LSTM neural network model. (Wang et al., Fig. 4 and Par. 46, “In some embodiments, based upon the time series data, the bidirectional LSTM predicts a future reconstitution schedule. In some embodiments, the bidirectional LSTM classifies the time series data based upon the calendar features' impact on the weighted averages in the market indices”, signifies additional feature data being predicted, such as a schedule and weights, beyond just the overall intended number of orders, with that data used as input for the forecasting method, which includes a training step).
Wang et al. does not teach the following limitations; however, Agrawal et al. does.
The non-transitory machine-readable medium of claim 17, wherein the performing the data bagging comprises generating a plurality of data sets of the data records (Agrawal et al., Fig. 6 and Par. 168, “Distortions such as variance can be decreased by refactoring an ensemble training dataset, such as tuples 630, by decomposing the dataset into partially overlapping subsets of data with a technique known as bootstrap aggregating (a.k.a. bagging).”, signifies training through subsets made from bagging) for the training the LSTM neural network model.
Regarding Claim 19, Wang et al. teaches:
The non-transitory machine-readable medium of claim 17, wherein the attention mechanism applies one or more weights to the training data for the training the LSTM neural network model for the predictive forecasting of the future predicted trait. (Wang et al., Fig. 4 and Par. 51, “The train 420 component uses the MSE to update the parameters of the CNN 413, thereby improving upon the CNN 413's accuracy.”, signifies a mechanism that identifies importance as accuracy and applies it to parameters that were input from the CNN by the LSTM, as shown in the figure).
Claims 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., US PG PUB 2021/0224912 in view of Agrawal et al., US PG PUB 2019/0095756 as applied to claims 11-14 above, and further in view of Dang et al., US PG PUB 2021/0350225.
Regarding Claim 15, Wang et al. and Agrawal et al. do not teach the following limitation; however, Dang et al. does.
The method of claim 11, wherein the first predictive forecast comprises a vector provided as an input feature for the deep neural network model during the determining the second predictive forecast. (Dang et al., Par. 51, “Given the feature vectors h_t = {h_t^(i)}_{i=1..D} learned from the observed data”, and Par. 64, “Along with the feature vectors h_t, the attention vectors can be inputs to compute the conditional distribution Pr(x_t^(d) | a_t, w_A, h_t), which can be equivalent to an ensemble of K neural networks and their output values from a distribution.”, signifies a vector learned after training and early predictions and used as an input for the next prediction).
Wang et al., Agrawal et al., and Dang et al. are all analogous art to the present invention because the references are reasonably pertinent in training and using machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the forecasting deep neural network method of Wang et al. with the bagging method of Agrawal et al. and with the vectors of Dang et al. The motivation to do so is to use the vectors as inputs and outputs that describe features with respect to the point in time (Dang et al., Par. 44, “employed by the time series analysis component 110. For instance, the intermediate feature vectors can be denoted by {h_t^(i)}_{i=1..D} with respect to D input time series at time point t.”).
Regarding Claim 16, Wang et al. and Agrawal et al. do not teach the following limitations; however, Dang et al. does.
The method of claim 11, wherein the determining the plurality of past traits of the entity over the time period comprises determining a plurality of vectors at different past times over the time period, (Dang et al., Par. 44, “instance, the intermediate feature vectors can be denoted by {h_t^(i)}_{i=1..D} with respect to D input time series at time point”, and Par. 48, “the attention decoder component 402 can adapt one or more attention mechanisms to the context of the multivariate time series to learn a set of variables {a_t^(i)}_{i=1..D} associated with {h_t^(i)}_{i=1..D} for each output d-th time series.”, signifies determining vectors at different times in the series) and wherein each of the plurality of vectors are used as an input when determining a next one of the plurality of past traits by the deep neural network model. (Dang et al., Par. 48, “the attention decoder component 402 can adapt one or more attention mechanisms to the context of the multivariate time series to learn a set of variables {a_t^(i)}_{i=1..D} associated with {h_t^(i)}_{i=1..D} for each output d-th time series.”, and Par. 50, “A higher value of a_t^(i) can imply more relevant information found in h_t^(i). Thus, the d-th time series can be dependent on the i-th time series at time point t. Toward learning {a_t^(i)}_{i=1..D}, the attention decoder component 402 can employ the additive form a_t^(i) = v_a^T tanh(W_a h_t^(i) + b_a).”, signifies the time series as dependent on past times).
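The additive attention form quoted from Dang et al. Par. 50, a_t^(i) = v_a^T tanh(W_a h_t^(i) + b_a), can be sketched as follows; a softmax normalization is added here so the weights over the D input series sum to one, and all parameter shapes are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, hdim, adim = 5, 8, 16                  # D input series, hidden and attention widths
H = rng.standard_normal((D, hdim))        # feature vectors h_t^(i)
W_a = rng.standard_normal((adim, hdim))
b_a = rng.standard_normal(adim)
v_a = rng.standard_normal(adim)

scores = np.tanh(H @ W_a.T + b_a) @ v_a   # a_t^(i) before normalization
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                      # higher alpha_i: series i is more relevant

context = alpha @ H                       # attention-weighted input to the next step
```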
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., US PG PUB 2021/0224912 in view of Agrawal et al., US PG PUB 2019/0095756 as applied to claims 17-19 above, and further in view of Walters et al. US PG PUB 2021/0117996.
Regarding Claim 20, Wang et al. and Agrawal et al. do not teach the following limitation; however, Walters et al. does.
The non-transitory machine-readable medium of claim 17, wherein the training data further comprises customer data over the time period associated with customers of a service provider, and wherein the future predicted trait comprises a total payment volume. (Walters et al. Par. 27, “the bill prediction system 106 may perform clustering on service billing data to group customers into groups of customers and train RNNs with bill information for a period of time for at least one of the group of customers. Further, the bill prediction system 106 may predict, with an RNN, amortized bill amounts for customers of the group of customers for a future period of time. The bill prediction system 106 may also notify customers of the future bill amounts. For example, the bill prediction system 106 may cause communication of an indication of an amortized bill amount to a computing device 102 associated with a customer.”, signifies customer bill information over time as training data and payment amount being predicted).
Wang et al., Agrawal et al., and Walters et al. are all analogous art to the present invention because the references are reasonably pertinent in training and using machine learning models. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the forecasting deep neural network method of Wang et al. with the bagging method of Agrawal et al. and with the customer data and predicted payment amount of Walters et al. The motivation to do so is to be able to predict how much money is going to be spent in the future based on the customer's current patterns.
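The pipeline quoted from Walters et al. Par. 27 (cluster customers from billing data, then train a recurrent model per cluster to predict future bill amounts) can be sketched as follows; the use of scikit-learn's KMeans and the synthetic billing matrix are assumptions standing in for the unspecified clustering and data.

```python
import numpy as np
from sklearn.cluster import KMeans

billing = np.random.rand(100, 12)                 # 100 customers, 12 months of bills
groups = KMeans(n_clusters=4, n_init=10).fit_predict(billing)

per_group_history = {g: billing[groups == g] for g in range(4)}
# Each group's history would then train its own RNN forecaster (see the
# BiLSTM sketch above) to predict that group's future amortized bill amounts.
```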
Response to Arguments
Applicant's arguments filed 9/4/2025 have been fully considered but they are not persuasive. The Applicant indicates that claims 1, 6, 11, and 17-18 cannot be performed in the mind (pages 8-9). The examiner disagrees, because as indicated above, the limitations of obtaining features of an entity and determining the first and second predictive forecasts are limitations which can be performed mentally. The mind can obtain data and make determinations, evaluations, or judgements like the ones recited in the claims; hence, the abstract idea rejection remains.
Regarding claim 1, the Applicant states that “nowhere does Wang teach "an attention mechanism." Further, the cited section in Wang describes the training of the CNN, where "[o]ver time, the train 420 component computes a mean squared error (MSE) of the CNN by measuring an average of the squares of the differences between each day's target and prediction."” (page 9). The examiner disagrees, because Wang teaches a component or mechanism to determine and improve the prediction accuracy of the CNN over a period of several days (Par. 51).
Additionally, Applicant indicates that “Second, claim 1 recites: "wherein the first predictive forecast is further determined using a plurality of predictions determined by the deep neural network model over the time period from the feature data and the one or more importance levels for the attention mechanism and one or more of the plurality of timesteps associated with the first time;" Applicant notes that the Office failed to map this claim limitation to the cited art…” (page 9). The examiner disagrees, because on page 28 the examiner addressed this limitation by pointing to “using predictions to improve accuracy to create an optimized forecast using the importance of accuracy and timesteps of days in this case”.
Moreover, Applicant argues that “The Office alleges that Wang teaches this limitation. See Office Action, p. 29, citing to Wang, 51. … This disclosure in Wang, however, fails to teach the claimed attention mechanism or how the "one or more importance levels for the attention mechanism and one or more of the plurality of timesteps associated with the second time," are used to predict an optimal number of orders. The Office also fails to explain which of the components in Wang map to the claimed "plurality of timesteps," and the "second time," rendering the patentability analysis deficient.” (page 10). The examiner disagrees, because Wang teaches adjusting the CNN parameters to make additional predictions, using previous filtered parameters, in order to improve the accuracy of the CNN's daily predictions (Figs. 4-6, Par. 51, 61).
Regarding claims 11-14 and 17-19, the Applicant argues that “Wang fails to teach any type of an attention mechanism, including the attention mechanism that "focuses on past timesteps of relevance to a future predictive forecast at a corresponding timestep," as recited in the amended claim 11.” (page 10). The examiner disagrees with this argument, because as indicated above in light of the 35 U.S.C. 112(b) indefiniteness rejection, Wang teaches using feature data processed in the convolutional neural network structure using historical data to make future predictions.
Regarding claims 15-16, the Applicant argues that the claims “depend from claim 11, and are patentable over the cited art for the same reasons as claim 11.” (page 11). As indicated above, Wang teaches the claims as it relates to claim 11.
Regarding claim 20, the Applicant argues that the claim “depends from claim 17, and is patentable over the cited art for the same reasons as claim 17” (page 11). The examiner disagrees at least based on the rationale stated above in regards to claim 17.
Conclusion
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this Office action is set to expire THREE MONTHS from the mailing date of this action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CESAR PAULA whose telephone number is (571)272-4128. The examiner can normally be reached Monday - Friday, 6:30 am - 4:30 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, David Wiley, can be reached at (571)272-3923. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CESAR B PAULA/Supervisory Patent Examiner, Art Unit 2145