Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/28/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Allowable Subject Matter
Claims 9 and 18 are allowable over the prior art of record, but remain rejected under 35 U.S.C. § 101 as set forth below.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because they are directed to an abstract idea without significantly more.
Step 1 analysis:
Independent Claim 1 recites, in part, a computer-implemented method, therefore falling into the statutory category of process. Independent Claim 10 recites, in part, an apparatus, therefore falling into the statutory category of machine. Independent Claim 19 recites, in part, a computer program product, therefore falling into the statutory category of manufacture.
Regarding Claim 1:
Step 2A: Prong 1 analysis:
Claim 1 recites in part:
“determining an attention head score based on the one or more temporal feature time points for each temporal feature set within a series of time windows”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses determining a score based on temporal data.
“generating the predictive temporal feature impact report based on one or more determined attention head scores”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses generating a report based on determined scores.
Accordingly, at Step 2A: Prong 1, the claim is directed to an abstract idea.
Step 2A: Prong 2 analysis:
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of:
“receiving an entity input data object”. This additional element is recited at a high level of generality and amounts to extra-solution activity, i.e., pre-solution activity of gathering data for use in the claimed process.
“by communications hardware”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (communications hardware) (See MPEP 2106.05(f)).
“wherein: i) the entity input data object describes one or more temporal feature sets, ii) each temporal feature set includes one or more temporal feature time points, and iii) the one or more temporal feature time points are ordered temporally within the entity input data object”. This limitation merely indicates a field of use or technological environment in which the judicial exception is performed (temporal data) and thus fails to add an inventive concept to the claims. See MPEP 2106.05(h).
“by an attention head engine and using the FEATS model”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (attention model) (See MPEP 2106.05(f)).
“by a downstream model engine”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (machine learning model) (See MPEP 2106.05(f)).
Accordingly, at Step 2A: Prong 2, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B analysis:
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
The additional element of “receiving an entity input data object” is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, “electronic record keeping,” and “storing and retrieving information in memory”).
As discussed above, the additional elements of “by communications hardware”, “by an attention head engine and using the FEATS model”, and “by a downstream model engine” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components (See MPEP 2106.05(f)).
The additional element of “wherein: i) the entity input data object describes one or more temporal feature sets, ii) each temporal feature set includes one or more temporal feature time points, and iii) the one or more temporal feature time points are ordered temporally within the entity input data object” is directed to a particular field of use (temporal data) (See MPEP 2106.05(h)) and therefore does not provide significantly more than the abstract idea.
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.
Regarding Claim 2:
Step 2A: Prong 1 analysis:
Claim 2 recites in part:
“determining a per-temporal feature time impact score for each time window associated with the feature attention head”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses determining scores for features within a given time window.
“determining a temporal feature time impact vector based on one or more determined per-temporal feature time impact scores”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses determining vectors based on determined scores.
“determining the attention head score for the feature attention head based on the temporal feature time impact vector”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses determining scores based on determined vectors.
Accordingly, at Step 2A: Prong 1, the claim is directed to an abstract idea.
Step 2A: Prong 2 analysis:
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of:
“by the attention head engine and using the FEATS model”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (attention model) (See MPEP 2106.05(f)).
Accordingly, at Step 2A: Prong 2, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B analysis:
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional element of “by the attention head engine and using the FEATS model” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (See MPEP 2106.05(f)).
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.
Regarding Claim 3:
Step 2A: Prong 2 analysis:
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of:
“training a set of trainable parameters of the feature attention head”. This additional element is recited at a high level of generality such that the claim recites only the idea of a solution or outcome (train a model), i.e., the claim fails to recite details of how the solution is accomplished.
“by the attention head engine and using the FEATS model”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (attention model) (See MPEP 2106.05(f)).
Accordingly, at Step 2A: Prong 2, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B analysis:
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional element of “training a set of trainable parameters of the feature attention head” is recited at a high level of generality such that the claim recites only the idea of a solution or outcome, i.e., the claim fails to recite details of how the solution is accomplished (See MPEP 2106.05(f)).
As discussed above, the additional element of “by the attention head engine and using the FEATS model” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (See MPEP 2106.05(f)).
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.
Regarding Claim 4:
Step 2A: Prong 1 analysis:
Claim 4 recites in part:
“determining an overall model response based on the one or more determined attention head scores”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses determining a model output based on determined scores.
“wherein the predictive temporal feature impact report is based on the overall model response”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses generating a report that accounts for the model output.
Accordingly, at Step 2A: Prong 1, the claim is directed to an abstract idea.
Step 2A: Prong 2 analysis:
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of:
“by the downstream model engine and using the FEATS model”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (machine learning model) (See MPEP 2106.05(f)).
Accordingly, at Step 2A: Prong 2, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B analysis:
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional element of “by the downstream model engine and using the FEATS model” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (See MPEP 2106.05(f)).
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.
Regarding Claim 5:
Step 2A: Prong 1 analysis:
Claim 5 recites in part:
“generating one or more static feature vectors based on the one or more temporally static features”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses creating vectors from data.
“determining an overall model response based on the one or more determined attention head scores and the one or more static feature vectors”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses determining a model output based on the model inputs.
“wherein the predictive temporal feature impact report is based on the overall model response”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses generating a report that accounts for the model output.
Accordingly, at Step 2A: Prong 1, the claim is directed to an abstract idea.
Step 2A: Prong 2 analysis:
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of:
“by a temporally static feature engine and using the FEATS model”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (feature engine) (See MPEP 2106.05(f)).
“by the downstream model engine and using the FEATS model”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (machine learning model) (See MPEP 2106.05(f)).
Accordingly, at Step 2A: Prong 2, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B analysis:
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional elements of “by a temporally static feature engine and using the FEATS model” and “by the downstream model engine and using the FEATS model” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components (See MPEP 2106.05(f)).
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.
Regarding Claim 6:
Step 2A: Prong 1 analysis:
Claim 6 recites in part:
“determining one or more transformed static features by applying one or more transformation functions to each temporally static feature”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses applying transformations to data.
“wherein generating the one or more static feature vectors is based on the one or more transformed static features”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses creating feature vectors from transformed data.
Accordingly, at Step 2A: Prong 1, the claim is directed to an abstract idea.
Step 2A: Prong 2 analysis:
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of:
“by a temporally static feature engine and using the FEATS model”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (feature engine) (See MPEP 2106.05(f)).
Accordingly, at Step 2A: Prong 2, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B analysis:
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional element of “by a temporally static feature engine and using the FEATS model” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (See MPEP 2106.05(f)).
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.
Regarding Claim 7:
Step 2A: Prong 2 analysis:
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of:
“receiving a set of hyperparameters”. This additional element is recited at a high level of generality and amounts to extra-solution activity, i.e., pre-solution activity of gathering data for use in the claimed process.
“by the communications hardware”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (communication hardware) (See MPEP 2106.05(f)).
“wherein the set of hyperparameters comprises: a number of feature attention heads to be included in the FEATS model, a number of network layers to be included in each feature attention head, a number of network nodes for each network layer to be included in each feature attention head, an activation function to be included in each feature attention head, a width of a rolling window to be utilized by each feature attention head, a regularization parameter to be utilized by each feature attention head, or a combination thereof”. This limitation merely indicates a field of use or technological environment in which the judicial exception is performed (hyperparameters) and thus fails to add an inventive concept to the claims. See MPEP 2106.05(h).
Accordingly, at Step 2A: Prong 2, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B analysis:
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
The additional element of “receiving a set of hyperparameters” is recited at a high level of generality and amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, “electronic record keeping,” and “storing and retrieving information in memory”).
As discussed above, the additional element of “by the communications hardware” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (See MPEP 2106.05(f)).
The additional element of “wherein the set of hyperparameters comprises: a number of feature attention heads to be included in the FEATS model, a number of network layers to be included in each feature attention head, a number of network nodes for each network layer to be included in each feature attention head, an activation function to be included in each feature attention head, a width of a rolling window to be utilized by each feature attention head, a regularization parameter to be utilized by each feature attention head, or a combination thereof” is directed to a particular field of use (hyperparameters) (See MPEP 2106.05(h)) and therefore does not provide significantly more than the abstract idea.
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.
Regarding Claim 8:
Step 2A: Prong 2 analysis:
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of:
“wherein each feature attention head is configured to attend to a subset of the one or more temporal feature time points of the entity input data object”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (attention model) (See MPEP 2106.05(f)).
Accordingly, at Step 2A: Prong 2, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B analysis:
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional element of “wherein each feature attention head is configured to attend to a subset of the one or more temporal feature time points of the entity input data object” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (See MPEP 2106.05(f)).
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.
Regarding Claim 9:
Step 2A: Prong 1 analysis:
Claim 9 recites in part:
“generating one or more variable contribution scores or one or more temporal contribution scores, wherein the one or more variable contribution scores evaluate contributions of different temporal feature time points to the one or more determined attention head scores, wherein the one or more temporal contribution scores evaluate contributions of different temporal feature sets to the one or more determined attention head scores”. As drafted and under its broadest reasonable interpretation, this limitation covers performance of the limitation in the mind (including an observation, evaluation, judgment, or opinion) or with the aid of pencil and paper. For example, this limitation encompasses generating scores based on how influential variables are to the machine learning model.
Accordingly, at Step 2A: Prong 1, the claim is directed to an abstract idea.
Step 2A: Prong 2 analysis:
The judicial exception is not integrated into a practical application. In particular, the claim recites the additional element of:
“by the attention head engine and using the FEATS model”. This additional element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (attention model) (See MPEP 2106.05(f)).
Accordingly, at Step 2A: Prong 2, the additional elements individually or in combination do not integrate the judicial exception into a practical application.
Step 2B analysis:
In accordance with Step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, the additional element of “by the attention head engine and using the FEATS model” is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component (See MPEP 2106.05(f)).
Accordingly, at Step 2B, the additional elements individually or in combination do not amount to significantly more than the judicial exception.
Regarding Claim 10:
Due to claim language similar to that of Claim 1, Claim 10 is rejected for the same reasons as presented above in the rejection of Claim 1.
Regarding Claim 11:
Due to claim language similar to that of Claim 2, Claim 11 is rejected for the same reasons as presented above in the rejection of Claim 2.
Regarding Claim 12:
Due to claim language similar to that of Claim 3, Claim 12 is rejected for the same reasons as presented above in the rejection of Claim 3.
Regarding Claim 13:
Due to claim language similar to that of Claim 4, Claim 13 is rejected for the same reasons as presented above in the rejection of Claim 4.
Regarding Claim 14:
Due to claim language similar to that of Claim 5, Claim 14 is rejected for the same reasons as presented above in the rejection of Claim 5.
Regarding Claim 15:
Due to claim language similar to that of Claim 6, Claim 15 is rejected for the same reasons as presented above in the rejection of Claim 6.
Regarding Claim 16:
Due to claim language similar to that of Claim 7, Claim 16 is rejected for the same reasons as presented above in the rejection of Claim 7.
Regarding Claim 17:
Due to claim language similar to that of Claim 8, Claim 17 is rejected for the same reasons as presented above in the rejection of Claim 8.
Regarding Claim 18:
Due to claim language similar to that of Claim 9, Claim 18 is rejected for the same reasons as presented above in the rejection of Claim 9.
Regarding Claim 19:
Due to claim language similar to that of Claims 1 and 10, Claim 19 is rejected for the same reasons as presented above in the rejection of Claims 1 and 10.
Regarding Claim 20:
Due to claim language similar to that of Claims 2 and 11, Claim 20 is rejected for the same reasons as presented above in the rejection of Claims 2 and 11.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4-8, 10, 13-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Lim et al. (Lim, B., Arik, S. O., Loeff, N., & Pfister, T. (2020). Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting. arXiv [Stat.ML]. Retrieved from http://arxiv.org/abs/1912.09363, hereinafter Lim) in view of Calmon et al. (US 20170364803 A1, hereinafter Calmon).
Regarding Claim 1:
Lim teaches
A computer-implemented method for generating a predictive temporal feature impact report for an entity using a feature engineering machine with attention for time series (FEATS) model including one or more feature attention heads (Lim [Abstract]: “we introduce the Temporal Fusion Transformer (TFT) – a novel attention based architecture which combines high-performance multi-horizon forecasting with interpretable insights into temporal dynamics”; [Figure 2 caption]: “Time-dependent processing is based on LSTMs for local processing, and multi-head attention for integrating information from any time step”; (EN): the model used in this paper is functionally similar to the FEATS model of the instant application),
wherein: i) the entity input data object describes one or more temporal feature sets, ii) each temporal feature set includes one or more temporal feature time points, and iii) the one or more temporal feature time points are ordered temporally within the entity input data object (Lim [Page 9, section 4.3, par. 1]: “Specifically, this includes contexts for (1) temporal variable selection (cs), (2) local processing of temporal features (cc, ch), and (3) enriching of temporal features with static information (ce)”; [Page 17, section 6.6, par. 3]: “A possible explanation is that persistent daily seasonality appears to dominate other temporal relationships in the Electricity dataset. For this dataset, Table B.4 of Appendix B also shows that the hour-of-day has the largest variable importance score across all temporal inputs, exceeding even the target (i.e. Power Usage) itself. In contrast to other dataset where past target observations are more significant (e.g. Traffic), direct attention to previous days seem to help learning daily seasonal patterns in Electricity”; (EN): the datasets used in this paper are all ordered temporally and utilize features that are based on temporal points);
for each feature attention head included in the FEATS model, determining, by an attention head engine and using the FEATS model, an attention head score based on the one or more temporal feature time points for each temporal feature set within a series of time windows (Lim [Page 17, section 6.6, par. 3]: “For this dataset, Table B.4 of Appendix B also shows that the hour-of-day has the largest variable importance score across all temporal inputs, exceeding even the target (i.e. Power Usage) itself”);
and generating, by a downstream model engine, the predictive temporal feature impact report based on one or more determined attention head scores (Lim [Page 17, section 7.1, par. 1]: “As the retail dataset contains the full set of available input types (i.e. static metadata, known inputs, observed inputs and the target), we present the results for its variable importance analysis in Table 3. We also note similar findings in other datasets, which are documented in Appendix B.1 for completeness. On the whole, the results show that the TFT extracts only a subset of key inputs that intuitively play a significant role in predictions. The analysis of persistent temporal patterns is often key to understanding the time-dependent relationships present in a given dataset.”; [Table 3]: Table 3 shows analytical results of the variable importance for the Retail dataset; [Figure 3 caption]: “Results of ablation analysis. Both a) and b) show the impact of ablation on the P50 and P90 losses respectively. Results per dataset shown on the left, and the range across datasets shown on the right. While the precise importance of each is dataset-specific, all components contribute significantly on the whole”; (EN): applicant’s disclosure describes the downstream model engine in [0093]: “The downstream model 814 may be embodied by different linear regression models, logistic regression models, or other models.”).
Lim does not distinctly disclose
the computer-implemented method comprising: receiving, by communications hardware, an entity input data object
However, Calmon teaches
the computer-implemented method comprising: receiving, by communications hardware, an entity input data object (Calmon [0021]: “For example, the forecast algorithm 116 uses the second statistical prediction model 114 together with new behavioral data 124 and new external data 125 (i.e., received over the network 106) to predict a value of the target time series for a predetermined time period”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Lim and Calmon before him or her, to modify Lim's attention-based architecture for multi-horizon forecasting with interpretable insights into temporal dynamics to include Calmon's methods of forecasting future behavioral data and identifying the relative causal impact of the external factors affecting that data. The motivation for doing so would have been to use the second prediction model of Calmon in order to operate on newly received data to make a prediction over the time series (Calmon [0021]: “For example, the forecast algorithm 116 uses the second statistical prediction model 114 together with new behavioral data 124 and new external data 125 (i.e., received over the network 106) to predict a value of the target time series for a predetermined time period”).
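Examiner's note: for clarity of the record, the following minimal sketch illustrates the scoring-and-reporting flow of claim 1 as characterized above. It is a hypothetical, non-limiting illustration; the function names, the softmax-based weighting, and the averaging across windows are examiner assumptions and do not represent applicant's FEATS implementation or Lim's TFT code.
```python
# Hypothetical sketch (not applicant's or Lim's code): per-window attention
# scoring over temporal feature sets, aggregated into a simple impact report.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head_scores(features, window=4):
    """features: array of shape (num_feature_sets, num_time_points),
    temporally ordered. Returns one score per feature set, averaged
    over a series of rolling time windows (assumed aggregation)."""
    n_sets, n_points = features.shape
    scores = np.zeros(n_sets)
    n_windows = n_points - window + 1
    for start in range(n_windows):
        segment = features[:, start:start + window]
        # Attention weights over the feature sets within this window.
        scores += softmax(segment.mean(axis=1))
    return scores / n_windows

def impact_report(features, names, window=4):
    """Rank feature sets by their determined attention head scores."""
    scores = attention_head_scores(features, window)
    return sorted(zip(names, scores), key=lambda kv: -kv[1])

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 12))  # 3 temporal feature sets, 12 time points
print(impact_report(x, ["heart_rate", "bp", "glucose"]))  # hypothetical names
```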
Regarding Claim 4:
Lim teaches
The computer-implemented method of claim 1, further comprising: determining, by the downstream model engine and using the FEATS model, an overall model response based on the one or more determined attention head scores (Lim [Page 9, section 4.4, par. 2]: “To improve the learning capacity of the standard attention mechanism, multi-head attention is proposed in [17], employing different heads for different representation subspaces: {Eqn. 11}, {Eqn. 12} where WK(h), WQ(h), WV(h) are head-specific weights for keys, queries and values, and WH linearly combines outputs concatenated from all heads Hh. Given that different values are used in each head, attention weights alone would not be indicative of a particular feature’s importance. As such, we modify multi-head attention to share values in each head, and employ additive aggregation of all heads: {Eqn. 13}, {Eqn. 14}, {Eqn. 15}, {Eqn. 16}”; (EN): applicant’s disclosure describes the downstream model engine in [0093]: “The downstream model 814 may be embodied by different linear regression models, logistic regression models, or other models.”);
wherein the predictive temporal feature impact report is based on the overall model response (Lim [Page 17, section 7.1, par. 1]: “As the retail dataset contains the full set of available input types (i.e. static metadata, known inputs, observed inputs and the target), we present the results for its variable importance analysis in Table 3. We also note similar findings in other datasets, which are documented in Appendix B.1 for completeness. On the whole, the results show that the TFT extracts only a subset of key inputs that intuitively play a significant role in predictions. The analysis of persistent temporal patterns is often key to understanding the time-dependent relationships present in a given dataset.”; [Table 3]: Table 3 shows analytical results of the variable importance for the Retail dataset; [Figure 3 caption]: “Results of ablation analysis. Both a) and b) show the impact of ablation on the P50 and P90 losses respectively. Results per dataset shown on the left, and the range across datasets shown on the right. While the precise importance of each is dataset-specific, all components contribute significantly on the whole”).
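Examiner's note: the placeholders “{Eqn. 11}” through “{Eqn. 16}” in the quotation above stand for Lim's multi-head attention formulation. For convenience of the record, the following is a reconstruction believed to correspond to the cited Lim reference, using that paper's notation; it is provided as context only and is not a limitation of the claims.
```latex
% Reconstruction of Lim's Eqns. (11)-(16) (interpretable multi-head attention).
\begin{align}
\mathrm{MultiHead}(\mathbf{Q},\mathbf{K},\mathbf{V}) &= [\mathbf{H}_1,\dots,\mathbf{H}_{m_H}]\,\mathbf{W}_H, \\
\mathbf{H}_h &= \mathrm{Attention}\!\left(\mathbf{Q}\,\mathbf{W}_Q^{(h)},\,\mathbf{K}\,\mathbf{W}_K^{(h)},\,\mathbf{V}\,\mathbf{W}_V^{(h)}\right), \\
\mathrm{InterpretableMultiHead}(\mathbf{Q},\mathbf{K},\mathbf{V}) &= \tilde{\mathbf{H}}\,\mathbf{W}_H, \\
\tilde{\mathbf{H}} &= \tilde{A}(\mathbf{Q},\mathbf{K})\,\mathbf{V}\,\mathbf{W}_V, \\
\tilde{A}(\mathbf{Q},\mathbf{K}) &= \frac{1}{m_H}\sum_{h=1}^{m_H} A\!\left(\mathbf{Q}\,\mathbf{W}_Q^{(h)},\,\mathbf{K}\,\mathbf{W}_K^{(h)}\right), \\
\tilde{\mathbf{H}} &= \frac{1}{m_H}\sum_{h=1}^{m_H}\mathrm{Attention}\!\left(\mathbf{Q}\,\mathbf{W}_Q^{(h)},\,\mathbf{K}\,\mathbf{W}_K^{(h)},\,\mathbf{V}\,\mathbf{W}_V\right),
\end{align}
% where $A(\mathbf{Q},\mathbf{K}) = \mathrm{Softmax}(\mathbf{Q}\mathbf{K}^{\top}/\sqrt{d_{\mathrm{attn}}})$
% and the value weights $\mathbf{W}_V$ are shared across all heads.
```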
Regarding Claim 5:
Lim teaches
The computer-implemented method of claim 1, wherein the entity input data object further describes one or more temporally static features, and the computer-implemented method further comprises: generating, by a temporally static feature engine and using the FEATS model, one or more static feature vectors based on the one or more temporally static features (Lim [Page 3, section 1, par. 5]: “we introduce multiple novel ideas to align the architecture with the full range of potential inputs and temporal relationships common to multi-horizon forecasting – specifically incorporating (1) static covariate encoders which encode context vectors for use in other parts of the network”);
determining, by the downstream model engine and using the FEATS model, an overall model response based on the one or more determined attention head scores and the one or more static feature vectors (Lim [Page 9, section 4.4, par. 2]: “To improve the learning capacity of the standard attention mechanism, multi-head attention is proposed in [17], employing different heads for different representation subspaces: {Eqn. 11}, {Eqn. 12} where WK(h), WQ(h), WV(h) are head-specific weights for keys, queries and values, and WH linearly combines outputs concatenated from all heads Hh. Given that different values are used in each head, attention weights alone would not be indicative of a particular feature’s importance. As such, we modify multi-head attention to share values in each head, and employ additive aggregation of all heads: {Eqn. 13}, {Eqn. 14}, {Eqn. 15}, {Eqn. 16}”; [Page 4, section 2, par. 4]: “TFT alleviates this by using separate encoder-decoder attention for static features at each time step on top of the self-attention to determine the contribution time-varying inputs.”);
wherein the predictive temporal feature impact report is based on the overall model response (Lim [Page 17, section 7.1, par. 1]: “As the retail dataset contains the full set of available input types (i.e. static metadata, known inputs, observed inputs and the target), we present the results for its variable importance analysis in Table 3. We also note similar findings in other datasets, which are documented in Appendix B.1 for completeness. On the whole, the results show that the TFT extracts only a subset of key inputs that intuitively play a significant role in predictions. The analysis of persistent temporal patterns is often key to understanding the time-dependent relationships present in a given dataset.”; [Table 3]: Table 3 shows analytical results of the variable importance for the Retail dataset; [Figure 3 caption]: “Results of ablation analysis. Both a) and b) show the impact of ablation on the P50 and P90 losses respectively. Results per dataset shown on the left, and the range across datasets shown on the right. While the precise importance of each is dataset-specific, all components contribute significantly on the whole”).
Regarding Claim 6:
Lim teaches
The computer-implemented method of claim 5, wherein the computer-implemented method further comprises: determining, by the temporally static feature engine and using the FEATS model, one or more transformed static features by applying one or more transformation functions to each temporally static feature (Lim [Page 16, section 6.6, par. 3]: “We ablate by setting all context vectors to zero – i.e. cs=ce=cc=ch=0 – and concatenating all transformed static inputs to all time-dependent past and future inputs”);
wherein generating the one or more static feature vectors is based on the one or more transformed static features (Lim [Page 16, section 6.6, par. 3]: “We ablate by setting all context vectors to zero – i.e. cs=ce=cc=ch=0 – and concatenating all transformed static inputs to all time-dependent past and future inputs”).
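Examiner's note: a minimal sketch of applying transformation functions to temporally static features before vectorizing them, as mapped above; the particular transforms (log1p, sqrt, square) and the concatenation step are examiner assumptions, not the method of the cited references.
```python
# Hypothetical sketch: applying transformation functions to temporally
# static features and concatenating the results into a static feature vector.
import numpy as np

transforms = [np.log1p, np.sqrt, lambda v: v ** 2]  # assumed transforms

def static_feature_vector(static_features):
    """static_features: (num_static_features,) non-temporal values.
    Returns the concatenation of each transform applied to each feature."""
    return np.concatenate([f(static_features) for f in transforms])

print(static_feature_vector(np.array([4.0, 9.0])))  # -> 6 transformed values
```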
Regarding Claim 7:
Lim teaches
The computer-implemented method of claim 1, further comprising: receiving, by the communications hardware, a set of hyperparameters, wherein the set of hyperparameters comprises: a number of feature attention heads to be included in the FEATS model, a number of network layers to be included in each feature attention head, a number of network nodes for each network layer to be included in each feature attention head, an activation function to be included in each feature attention head, a width of a rolling window to be utilized by each feature attention head, a regularization parameter to be utilized by each feature attention head, or a combination thereof (Lim [Page 13, section 6.2, par. 1]: “For each dataset, we partition all time series into 3 parts – a training set for learning, a validation set for hyperparameter tuning, and a hold-out test set for performance evaluation. Hyperparameter optimization is conducted via random search, using 240 iterations for Volatility, and 60 iterations for others”; [Page 9, section 4.4, par. 2]: “To improve the learning capacity of the standard attention mechanism, multi-head attention is proposed in [17], employing different heads for different representation subspaces: {Eqn. 11}, {Eqn. 12} where WK(h), WQ(h), WV(h) are head-specific weights for keys, queries and values, and WH linearly combines outputs concatenated from all heads Hh”; [Page 7, section 4.1, par. 1]: “we propose Gated Residual Network (GRN) as shown in in Fig. 2 as a building block of TFT. The GRN takes in a primary input a and an optional context vector c and yields: {Eqn. 2}, {Eqn. 3}, {Eqn. 4} where ELU is the Exponential Linear Unit activation function”).
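Examiner's note: as a non-limiting illustration of the hyperparameter set recited in claim 7, a minimal configuration sketch follows; all keys and values are hypothetical examiner assumptions and are not drawn from applicant's disclosure or from Lim's tuning ranges.
```python
# Hypothetical hyperparameter set mirroring the claim 7 recitation; the
# specific names and values are illustrative assumptions only.
feats_hyperparameters = {
    "num_attention_heads": 4,     # feature attention heads in the model
    "num_layers_per_head": 2,     # network layers per attention head
    "nodes_per_layer": 64,        # network nodes in each layer
    "activation": "elu",          # activation function per head
    "rolling_window_width": 8,    # time points per rolling window
    "regularization": 1e-4,       # e.g., an L2 penalty per head
}
```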
Regarding Claim 8:
Lim teaches
The computer-implemented method of claim 1, wherein each feature attention head is configured to attend to a subset of the one or more temporal feature time points of the entity input data object (Lim [Figure 2 caption]: “Time-dependent processing is based on LSTMs for local processing, and multi-head attention for integrating information from any time step”).
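Examiner's note: a minimal sketch, assuming a boolean attention mask, of how a feature attention head could be configured to attend to only a subset of the temporally ordered time points; this is an examiner illustration, not applicant's or Lim's implementation.
```python
# Hypothetical masking sketch: restrict an attention head to a subset of
# temporally ordered time points by zeroing masked positions' weights.
import numpy as np

def masked_attention_weights(raw_scores, attend_mask):
    """raw_scores: (num_time_points,) unnormalized attention scores.
    attend_mask: boolean array; True where the head may attend."""
    scores = np.where(attend_mask, raw_scores, -np.inf)
    e = np.exp(scores - scores[attend_mask].max())
    e = np.where(attend_mask, e, 0.0)  # masked positions get zero weight
    return e / e.sum()

raw = np.array([0.2, 1.5, -0.3, 0.9, 0.0, 2.1])
mask = np.array([False, True, True, True, False, False])  # window subset
print(masked_attention_weights(raw, mask))
```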
Regarding Claim 10:
Due to claim language similar to that of Claim 1, Claim 10 is rejected for the same reasons as presented above in the rejection of Claim 1.
Regarding Claim 13:
Due to claim language similar to that of Claim 4, Claim 13 is rejected for the same reasons as presented above in the rejection of Claim 4.
Regarding Claim 14:
Due to claim language similar to that of Claim 5, Claim 14 is rejected for the same reasons as presented above in the rejection of Claim 5.
Regarding Claim 15:
Due to claim language similar to that of Claim 6, Claim 15 is rejected for the same reasons as presented above in the rejection of Claim 6.
Regarding Claim 16:
Due to claim language similar to that of Claim 7, Claim 16 is rejected for the same reasons as presented above in the rejection of Claim 7.
Regarding Claim 17:
Due to claim language similar to that of Claim 8, Claim 17 is rejected for the same reasons as presented above in the rejection of Claim 8.
Regarding Claim 19:
Due to claim language similar to that of Claims 1 and 10, Claim 19 is rejected for the same reasons as presented above in the rejection of Claims 1 and 10.
Claim Rejections - 35 USC § 103
Claims 2, 3, 11, 12, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Lim and Calmon as applied to claims 1, 10, and 19 above, and further in view of Zhu et al. (US 20200410355 A1, hereinafter Zhu).
Regarding Claim 2:
Lim teaches
determining, by the attention head engine and using the FEATS model, the attention head score for the feature attention head (Lim [Page 17, section 6.6, par. 3]: “For this dataset, Table B.4 of Appendix B also shows that the hour-of-day has the largest variable importance score across all temporal inputs, exceeding even the target (i.e. Power Usage) itself”);
Lim does not distinctly disclose
The computer-implemented method of claim 1, wherein determining the attention head score for a feature attention head comprises: determining, by the attention head engine and using the FEATS model, a per-temporal feature time impact score for each time window associated with the feature attention head;
However, Calmon teaches
The computer-implemented method of claim 1, wherein determining the attention head score for a feature attention head comprises: determining, by the attention head engine and using the FEATS model, a per-temporal feature time impact score for each time window associated with the feature attention head (Calmon [0016]: “For example, the system described herein assumes that each observation of each external factor considered by a prediction engine can be expressed as having an additive relationship with the expected forecast. Additionally, the relative causal impact, which may be represented as a score, is provided regarding the effect of each external factor for each forecasted value”);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Lim and Calmon before him or her, to modify Lim's attention-based architecture for multi-horizon forecasting with interpretable insights into temporal dynamics to include Calmon's methods of forecasting future behavioral data and identifying the relative causal impact of the external factors affecting that data. The motivation for doing so would have been to use the second prediction model of Calmon in order to operate on newly received data to make a prediction over the time series (Calmon [0021]: “For example, the forecast algorithm 116 uses the second statistical prediction model 114 together with new behavioral data 124 and new external data 125 (i.e., received over the network 106) to predict a value of the target time series for a predetermined time period”).
The combination of Lim and Calmon does not distinctly disclose
determining, by the attention head engine and using the FEATS model, a temporal feature time impact vector based on one or more determined per-temporal feature time impact scores;
based on the temporal feature time impact vector
However, Zhu teaches
determining, by the attention head engine and using the FEATS model, a temporal feature time impact vector based on one or more determined per-temporal feature time impact scores (Zhu [0045]: “Using this loss function in the training, the different modalities and different impact on an observation can be taken into account during the training to learn the attention vectors and to construct the model”);
based on the temporal feature time impact vector (Zhu [0045]: “Using this loss function in the training, the different modalities and different impact on an observation can be taken into account during the training to learn the attention vectors and to construct the model”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Lim, Calmon, and Zhu before him or her, to modify the attention-based multi-horizon forecasting architecture of Lim and Calmon to include the methods of explainable machine learning as shown in Zhu. The motivation for doing so would have been to use the impact vectors of Zhu in order to provide structure for the heterogeneous data to be operated on by the attention heads of Lim and Calmon (Zhu [0021]: “financial time series analysis can provide for optimizing investment decision and hedging market risks. This can be a challenging task as the problems can be accompanied by dual-level (e.g., data-level and task-level) heterogeneity. For instance, in stock price forecasting, a successful portfolio with bounded risks can include a large number of stocks from diverse domains (e.g., utility, information technology, healthcare, etc.), and forecasting stocks in each domain can be treated as one task; within a portfolio, each stock can be characterized by temporal data collected from multiple modalities (e.g., finance, weather, and news), which corresponds to the data-level heterogeneity.”).
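Examiner's note: to illustrate the score-to-vector-to-score chain recited in claim 2 as mapped above, a minimal hypothetical sketch follows; the windowed mean-absolute-value scoring and the max reduction are examiner assumptions, not the method of Lim, Calmon, or Zhu.
```python
# Hypothetical sketch of claim 2's chain: per-time-window impact scores are
# collected into an impact vector, which is reduced to one head score.
import numpy as np

def per_window_impact_scores(feature, window=3):
    """feature: (num_time_points,) one temporal feature, ordered in time.
    Returns one impact score per rolling window (assumed: mean |value|)."""
    n = len(feature) - window + 1
    return np.array([np.abs(feature[i:i + window]).mean() for i in range(n)])

def attention_head_score(feature, window=3):
    impact_vector = per_window_impact_scores(feature, window)  # impact vector
    return float(impact_vector.max())  # assumed reduction: peak impact

print(attention_head_score(np.array([0.1, -0.4, 0.9, 0.3, -0.2, 0.7])))
```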
Regarding Claim 3:
Lim does not distinctly disclose
The computer-implemented method of claim 2, wherein determining the attention head score for a feature attention head further comprises: training, by the attention head engine and using the FEATS model, a set of trainable parameters of the feature attention head.
However, Calmon teaches
The computer-implemented method of claim 2, wherein determining the attention head score for a feature attention head further comprises: training, by the attention head engine and using the FEATS model, a set of trainable parameters of the feature attention head (Calmon [0045]: “To use the data set 310 to train the neural attention network, the system 100 can define various parameters and constraints”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, and having the teachings of Lim and Calmon before him or her, to modify Lim's attention-based architecture for multi-horizon forecasting with interpretable insights into temporal dynamics to include Calmon's methods of forecasting future behavioral data and identifying the relative causal impact of the external factors affecting that data. The motivation for doing so would have been to use the second prediction model of Calmon in order to operate on newly received data to make a prediction over the time series (Calmon [0021]: “For example, the forecast algorithm 116 uses the second statistical prediction model 114 together with new behavioral data 124 and new external data 125 (i.e., received over the network 106) to predict a value of the target time series for a predetermined time period”).
Regarding Claim 11:
Due to claim language similar to that of Claim 2, Claim 11 is rejected for the same reasons as presented above in the rejection of Claim 2.
Regarding Claim 12:
Due to claim language similar to that of Claim 3, Claim 12 is rejected for the same reasons as presented above in the rejection of Claim 3.
Regarding Claim 20:
Due to claim language similar to that of Claims 2 and 11, Claim 20 is rejected for the same reasons as presented above in the rejection of Claims 2 and 11.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 11861317 B1 – Ensemble-based Machine Learning Characterization Of Human-machine Dialog
Marília Barandas, Duarte Folgado, Letícia Fernandes, Sara Santos, Mariana Abreu, Patrícia Bota, Hui Liu, Tanja Schultz, Hugo Gamboa, TSFEL: Time Series Feature Extraction Library, SoftwareX, Volume 11, 2020, 100456, ISSN 2352-7110, https://doi.org/10.1016/j.softx.2020.100456. – a Python package entitled Time Series Feature Extraction Library (TSFEL), which computes over 60 different features extracted across temporal, statistical and spectral domains.
Fan, C., Zhang, Y., Pan, Y., Li, X., Zhang, C., Yuan, R., … Huang, H. (2019). Multi-Horizon Time Series Forecasting with Temporal Attention Learning. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2527–2535. Presented in Anchorage, AK, USA. doi:10.1145/3292500.3330662 – a novel data-driven approach for solving multi-horizon probabilistic forecasting tasks that predicts the full distribution of a time series on future horizons.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to COREY M SACKALOSKY whose telephone number is (703)756-1590. The examiner can normally be reached M-F 7:30am-3:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached at (571) 272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/COREY M SACKALOSKY/Examiner, Art Unit 2128
/VINCENT GONZALES/Primary Examiner, Art Unit 2124