Prosecution Insights
Last updated: April 19, 2026
Application No. 18/155,471

METHOD AND APPARATUS FOR PREDICTING FUTURE STATE AND RELIABILITY BASED ON TIME SERIES DATA

Non-Final OA — §101, §102, §103

Filed: Jan 17, 2023
Examiner: KAPOOR, DEVAN
Art Unit: 2126
Tech Center: 2100 — Computer Architecture & Software
Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
OA Round: 1 (Non-Final)

Grant Probability: 11% (At Risk)
Projected OA Rounds: 1-2
Projected Time to Grant: 3y 3m
Grant Probability With Interview: 28%

Examiner Intelligence

Career Allow Rate: 11% (1 granted / 9 resolved; -43.9% vs Tech Center average)
Interview Lift: +16.7% on resolved cases with interview (a strong lift)
Typical Timeline: 3y 3m average prosecution
Career History: 42 total applications across all art units; 33 currently pending

Statute-Specific Performance

§101: 38.1% (-1.9% vs TC avg)
§102: 10.8% (-29.2% vs TC avg)
§103: 43.9% (+3.9% vs TC avg)
§112: 5.8% (-34.2% vs TC avg)

Tech Center averages are estimates; figures are based on career data from 9 resolved cases.

Office Action

Rejections under §101, §102, and §103
DETAILED ACTION

This action is responsive to the application filed on 01/17/2023. Claims 1-20 are pending and have been examined. This action is non-final.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1:

Step 1 – Is the claim to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a method, which falls under the category of process. The claim satisfies Step 1.

Step 2A, Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?

“A method of predicting a future state and reliability based on time series data, comprising: … predicting a future state through execution of an algorithm based on the created trained model, the preprocessed current state data, and the preprocessed past state data.” -- The limitation is directed to predicting states through executed algorithms based on a trained model and past data. It recites a process that can be performed in the human mind using evaluation, observation, and judgment, and thus the limitation is directed to a mental process.

“preprocessing past state data … preprocessing current state data;” -- This limitation is directed to manipulating and conditioning data values according to prescribed rules (e.g., removing outliers, filling missing values, computing sequence length, as described in the specification [0012]). Such data conditioning is a form of mathematical concept, because it involves mathematical relationships and calculations performed on data, and thus the limitation is directed to math.

Step 2A, Prong 2 and Step 2B – Does the claim recite additional elements that integrate the judicial exception into a practical application and amount to significantly more than the abstract idea itself?

“creating a trained model through execution of an algorithm based on the preprocessed past state data;” -- The limitation recites creating a model by executing an algorithm on gathered state data. The limitation does not integrate the exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, executing an algorithm on data to create a trained model is a well-understood, routine, and conventional (WURC) activity, which does not provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)). Thus, claim 1 is not patent eligible.

Claim 11 has claim limitations analogous to claim 1 aside from the claim type, and thus the rejection above applies to both.

The majority of claim 20 is analogous to claims 1 and 11, aside from one limitation, addressed below:

“a transceiver transmitting and receiving past data and current data to and from an external device;” -- This limitation recites a transceiver transmitting/receiving gathered and current data from an external device. The limitation is an additional element under Step 2A, Prong 2 and Step 2B: it is insignificant, extra-solution activity that does not integrate the exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, transmitting/receiving data over a network or device is a well-understood, routine, and conventional (WURC) activity that cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)). Aside from this limitation, the same rejection applies.

Regarding claim 2:

Step 1 – Is the claim to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a method, which falls under the category of process. The claim satisfies Step 1.

Step 2A, Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?

“The method according to claim 1, wherein the step of preprocessing past state data comprises: removing outliers from the past data; and calculating a time series length of the past data.” -- The limitation is directed to preprocessing the state data by removing outliers from the data and then calculating a time series length of the past data. The limitation is directed to the use of a mathematical concept and calculations, and thus the limitation is directed to math. There are no elements to be evaluated under Step 2A, Prong 2 and Step 2B. Thus, claim 2 is not patent eligible.

Claim 12 has claim limitations analogous to claim 2 aside from the claim type and applying the time series to a processing unit (mere instructions to apply the idea on a computer). The analogous limitations are merely applied to a computer (the device and length calculator), which are additional elements that neither integrate the exception into a practical application nor provide significantly more than the judicial exception (see MPEP 2106.05(f)).
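For readers tracking the technology, the preprocessing the rejection characterizes as mathematical — outlier removal, missing-value filling, and time-series-length calculation per specification paragraph [0012] — can be pictured with a minimal Python sketch. Everything here (the function name, the median/MAD outlier rule, the interpolation fill) is an illustrative assumption, not the application's disclosed implementation:

```python
import numpy as np

def preprocess_series(x, k=3.0):
    """Sketch of claim-2-style preprocessing: mask outliers, fill
    missing values, and report the time series length."""
    x = np.asarray(x, dtype=float)
    # Flag outliers with a robust median/MAD rule (hypothetical choice).
    med = np.nanmedian(x)
    mad = np.nanmedian(np.abs(x - med))
    scale = 1.4826 * mad if mad > 0 else 1.0
    x = np.where(np.abs(x - med) > k * scale, np.nan, x)
    # Fill missing values (NaNs) by linear interpolation over observed points.
    idx = np.arange(len(x))
    obs = ~np.isnan(x)
    x = np.interp(idx, idx[obs], x[obs])
    return x, len(x)  # preprocessed data and its calculated length

cleaned, length = preprocess_series([1.0, 1.1, np.nan, 50.0, 1.2, 1.3])
```

The point of the sketch is the eligibility framing, not the engineering: each step is a calculation on data, which is why the rejection slots it under the mathematical-concepts grouping.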
Regarding claim 3:

Step 1 – Is the claim to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a method, which falls under the category of process. The claim satisfies Step 1.

Step 2A, Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?

“creating a trained model by reflecting instability in the created structure of the trained model.” -- The limitation is directed to creating a trained model by reflecting instability in the trained model structure that was created. Finding instability in a model’s structure is a process that can be performed in the human mind using evaluation, observation, and judgment, and thus the limitation is directed to a mental process.

Step 2A, Prong 2 and Step 2B – Does the claim recite additional elements that integrate the judicial exception into a practical application and amount to significantly more than the abstract idea itself?

“The method according to claim 1, wherein the step of creating a trained model comprises: adjusting a training direction of the trained model based on the preprocessed past state data; creating a structure of the trained model based on at least one of the number of training repetition times, a model size, and an algorithm;” -- The limitation recites creating a trained model by adjusting a training direction of a model based on past, gathered data and creating a model structure based on the number of training repetitions, a model size, and an algorithm. The limitation is directed to insignificant, extra-solution activity that does not integrate the exception into a practical application (see MPEP 2106.05(g)). Furthermore, under Step 2B, adjusting training based on past gathered data and repeated training is a well-understood, routine, and conventional (WURC) activity, and thus it cannot provide significantly more than the judicial exception (see MPEP 2106.05(d)(II)). Thus, claim 3 is not patent eligible.

Claim 13 has claim limitations analogous to claim 3 aside from the claim type, and thus the rejection above applies to both.

Regarding claim 4:

Step 1 – Is the claim to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a method, which falls under the category of process. The claim satisfies Step 1.

Step 2A, Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?

“the trained model is created using at least one of percentage instability and instability with reference to a specific critical point.” -- The limitation is directed to the trained model being created using a calculated percentage of instability and instability with reference to a specific critical point. The limitation is directed to the use of a mathematical relationship/calculation, and thus the limitation is directed to math.

Step 2A, Prong 2 and Step 2B – Does the claim recite additional elements that integrate the judicial exception into a practical application and amount to significantly more than the abstract idea itself?

“wherein, in generation of the trained model by reflecting instability in the created structure of the trained model,” -- The limitation merely further limits how the instability/reflected instability is represented in the trained model; it neither integrates the exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)). Thus, claim 4 is not patent eligible.

Claim 14 is analogous to claim 4, aside from claim type, and thus the same rejection above applies to both claims.

Regarding claim 5:

Step 1 – Is the claim to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a method, which falls under the category of process. The claim satisfies Step 1.

Step 2A, Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?

“calculating instability in the course of predicting the future state based on the time series feature of the preprocessed past state data.” -- The limitation is directed to calculating instability based on the time series feature of preprocessed state data. The limitation is directed to the use of mathematical calculations/concepts, and thus the limitation is directed to math.

Step 2A, Prong 2 and Step 2B – Does the claim recite additional elements that integrate the judicial exception into a practical application and amount to significantly more than the abstract idea itself?

“The method according to claim 1, wherein the step of predicting a future state through execution of an algorithm comprises: creating a structure of the trained model; executing an algorithm reflecting a time series feature of the preprocessed past state data in the trained model; processing a prediction point-in-time of the preprocessed past state data; applying a suitable feature to the preprocessed past state data by modeling an environment condition feature; and” -- The limitation recites that predicting a future state once the algorithm is executed comprises creating a structure of the trained model, executing an algorithm that reflects a time series feature of past state data, and then applying a feature to past data by modeling it. The limitation amounts to no more than mere instructions to apply the idea on a computer; it neither integrates the exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(f)). Thus, claim 5 is not patent eligible.

Claim 15 is analogous to claim 5, aside from claim type, and thus the same rejection above applies to both claims.

Regarding claim 6:

Step 1 – Is the claim to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a method, which falls under the category of process. The claim satisfies Step 1.

Step 2A, Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?

“creating a complexity distribution of the preprocessed past state data; creating a complexity distribution sample based on the created complexity distribution; and creating future state data based on the created complexity distribution sampling.” -- The limitation recites creating complexity distribution samples based on the created complexity distribution of past data, and likewise creating future state data from the sampling. The limitation is directed to the use of mathematical calculations/operations, and thus the limitation is directed to math.

Step 2A, Prong 2 and Step 2B – Does the claim recite additional elements that integrate the judicial exception into a practical application and amount to significantly more than the abstract idea itself?

“The method according to claim 5, further comprising: applying the suitable feature to the preprocessed past state data by modeling the environment condition feature;” -- The limitation recites that the method further comprises applying a suitable feature to the preprocessed past state data by modeling the feature. It merely limits the field of use of the claim, which neither integrates the exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)). Thus, claim 6 is not patent eligible.

Claim 16 is analogous to claim 6, aside from claim type, and thus the same rejection above applies to both claims.
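The “complexity distribution” language in claim 6 is not defined in the excerpt, so any concrete reading is speculative; as one hypothetical interpretation, the sketch below computes a per-window complexity score, treats the scores as an empirical distribution, samples from it, and stubs out future state data from the samples. All names and the complexity measure are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def window_complexity(x, w=16):
    """One plausible 'complexity' score: standard deviation of first
    differences in each sliding window (a hypothetical choice)."""
    x = np.asarray(x, dtype=float)
    return np.array([np.std(np.diff(x[i:i + w])) for i in range(len(x) - w)])

past = np.sin(np.linspace(0, 20, 200)) + 0.1 * rng.standard_normal(200)
scores = window_complexity(past)         # empirical complexity distribution
sample = rng.choice(scores, size=1000)   # complexity distribution sample
# Stub: future state data scaled by the sampled complexity values.
future_stub = past[-1] + sample * rng.standard_normal(1000)
```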
Regarding claim 7:

Step 1 – Is the claim to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a method, which falls under the category of process. The claim satisfies Step 1.

Step 2A, Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?

“The method according to claim 5, wherein the step of processing a prediction point-in-time of the preprocessed past state data comprises: training a function for estimation of a variation rate of input data through deep learning; and calculating a variation estimation function depending on the prediction point-in-time using the function.” -- The limitation is directed to calculating a variation estimation function depending on a prediction point-in-time using a function. The limitation is directed to the use of a mathematical concept/operation/calculation and also recites a process that can be performed in the human mind, with the aid of pen and paper, using evaluation, observation, and judgment; thus the limitation is directed to both a mental process AND math. There are no elements to be evaluated under Step 2A, Prong 2 and Step 2B. Thus, claim 7 is not patent eligible.

Claim 17 is analogous to claim 7, aside from claim type, and thus the same rejection above applies to both claims.

Regarding claim 8:

Step 1 – Is the claim to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a method, which falls under the category of process. The claim satisfies Step 1.

Step 2A, Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?

“The method according to claim 5, wherein the step of calculating instability comprises: calculating instability using a weighted sum corresponding to at least one of time series instability, point-in-time instability and distribution complexity instability through deep learning.” -- The limitation is directed to calculating instability using a weighted sum that corresponds to the recited instabilities and distribution complexity through deep learning. The limitation is directed to the use of a mathematical concept/operation, and thus the limitation is directed to math. There are no elements to be evaluated under Step 2A, Prong 2 and Step 2B. Thus, claim 8 is not patent eligible.

Claim 18 is analogous to claim 8, aside from claim type, and thus the same rejection above applies to both claims.

Regarding claim 9:

Step 1 – Is the claim to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a method, which falls under the category of process. The claim satisfies Step 1.

Step 2A, Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?

“The method according to claim 1, wherein the step of predicting a future state comprises: calculating a prediction point-in-time by receiving a future point-in-time that a user wants to know; and predicting a future state of the user through execution of an algorithm based on the prediction point-in-time and the trained model.” -- The limitation is directed to predicting a future state by calculating a prediction point-in-time once a user-requested future point-in-time is received and executing an algorithm based on that point-in-time. The limitation recites a process that can be performed in the human mind, with the aid of paper, using evaluation, observation, and judgment; thus the limitation is directed to a mental process. There are no elements to be evaluated under Step 2A, Prong 2 and Step 2B. Thus, claim 9 is not patent eligible.

Claim 19 is analogous to claim 9, aside from claim type, and thus the same rejection above applies to both claims.
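Claim 8's weighted sum is the one limitation here with an explicit formula shape. A minimal sketch of that combination follows; the three inputs and the weights are placeholders, since the excerpt discloses neither:

```python
import numpy as np

def combined_instability(ts_instab, pit_instab, dist_instab,
                         weights=(0.5, 0.3, 0.2)):
    """Weighted sum over the three instability terms named in claim 8.
    The weight values are illustrative, not from the application."""
    parts = np.array([ts_instab, pit_instab, dist_instab], dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(w @ parts / w.sum())

# e.g., time-series, point-in-time, and distribution-complexity instabilities:
score = combined_instability(0.12, 0.40, 0.25)  # -> 0.23
```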
Regarding claim 10:

Step 1 – Is the claim to a process, machine, manufacture, or composition of matter? Yes, the claim is directed to a method, which falls under the category of process. The claim satisfies Step 1. There are no elements to be evaluated under Step 2A, Prong 1.

Step 2A, Prong 2 and Step 2B – Does the claim recite additional elements that integrate the judicial exception into a practical application and amount to significantly more than the abstract idea itself?

“The method according to claim 9, wherein the step of predicting the future state comprises: calculating at least one of reliability of the future state, a prediction basis of the future state, and instability.” -- The limitation recites that predicting the future state comprises calculating at least one of reliability of the future state, a prediction basis of the future state, and instability. The limitation merely limits the claim to a field of use/environment, and thus it neither integrates the exception into a practical application nor provides significantly more than the judicial exception (see MPEP 2106.05(h)). Thus, claim 10 is not patent eligible.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless — (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention. (a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 3, 4, 11, 13, 14, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by the NPL reference “Deep and Confident Prediction for Time Series at Uber” by Zhu et al. (referred to herein as Zhu).

Regarding claim 1, Zhu teaches:

A method of predicting a future state and reliability based on time series data, ([Zhu, page 1, Abstract] “we propose a novel end-to-end Bayesian deep model that provides time series prediction along with uncertainty estimation”, wherein the examiner interprets “time series prediction along with uncertainty estimation” to be the same as predicting a future state and reliability based on time series data because they are both directed to forecasting future values from sequential data while simultaneously quantifying the confidence or trustworthiness of that prediction.)

comprising: preprocessing past state data; ([Zhu, page 5, sec 4.1.1] “The raw data are log-transformed to alleviate exponential effects. Next, within each sliding window, the first day is subtracted from all values, so that trends are removed and the neural network is trained for the incremental value”, wherein the examiner interprets “raw data are log-transformed” and “trends are removed” to be the same as preprocessing past state data because they are both directed to transforming and preparing historical time series observations before model training.)
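The Zhu preprocessing quoted above is concrete enough to sketch: log-transform, then subtract the first value inside each sliding window so the network learns increments. A minimal Python version, with the 28-day window taken from the quoted setup and everything else assumed:

```python
import numpy as np

def make_training_windows(series, window=28):
    """Zhu-style preprocessing as quoted above: log-transform the raw data,
    then within each sliding window subtract the first value to remove the
    trend, yielding (28-step input, next-step target) pairs."""
    x = np.log(np.asarray(series, dtype=float))  # assumes positive raw data
    pairs = []
    for i in range(len(x) - window):
        w = x[i:i + window + 1] - x[i]   # subtract the window's first value
        pairs.append((w[:-1], w[-1]))    # input window and one-step target
    return pairs

demo = make_training_windows(np.exp(np.random.default_rng(0).random(100)))
```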
creating a trained model through execution of an algorithm based on the preprocessed past state data; ([Zhu, page 4, sec 3.2.1] “Prior to fitting the prediction model, we first conduct a pre-training step to fit an encoder that can extract useful and representative embeddings from a time series” AND [Zhu, page 5, sec 3.2.2] “a prediction network is trained to forecast the next one or more timestamps using the learned embedding as features”, wherein the examiner interprets “pre-training step to fit an encoder” and “prediction network is trained to forecast” to be the same as creating a trained model through execution of an algorithm based on the preprocessed past state data because they are both directed to training a neural network model using processed historical data to enable future predictions.)

preprocessing current state data; and ([Zhu, page 5, sec 4.1.1] “At test time, it is straightforward to revert these transformations to obtain predictions at the original scale”, wherein the examiner interprets “At test time” with reference to applying transformations to be the same as preprocessing current state data because they are both directed to applying the same data transformations to new incoming observations before inference.)

predicting a future state through execution of an algorithm based on the created trained model, the preprocessed current state data, and the preprocessed past state data. ([Zhu, page 5, sec 3.2.4, 4.1.1] “After the full model is trained, the inference stage involves only the encoder and the prediction network … each sliding window contains the previous 28 days as input, and aims to forecast the upcoming day”, wherein the examiner interprets “inference stage involves only the encoder and the prediction network” and “sliding window contains the previous 28 days as input, and aims to forecast the upcoming day” to be the same as predicting a future state through execution of an algorithm based on the created trained model, the preprocessed current state data, and the preprocessed past state data because they are both directed to using a trained model to process current input that incorporates historical context to generate a forecast of future values.)

Regarding claim 3, Zhu teaches:

The method according to claim 1, wherein the step of creating a trained model comprises: adjusting a training direction of the trained model based on the preprocessed past state data; ([Zhu, page 3, sec 3.1] “Given a set of N observations X = {x1, ..., xN} and Y = {y1, ..., yN}, Bayesian inference aims at finding the posterior distribution over model parameters p(W | X, Y)”, wherein the examiner interprets “Bayesian inference aims at finding the posterior distribution over model parameters” to be the same as adjusting a training direction of the trained model based on the preprocessed past state data because they are both directed to updating and optimizing model parameters based on the training observations.)

creating a structure of the trained model based on at least one of the number of training repetition times, a model size, and an algorithm; and ([Zhu, page 5, sec 4.1.1] “The encoder-decoder is constructed with two-layer LSTM cells, with 128 and 32 hidden states, respectively. The prediction network has three fully connected layers with tanh activation, with 128, 64, and 16 hidden units, respectively”, wherein the examiner interprets “two-layer LSTM cells, with 128 and 32 hidden states” and “three fully connected layers with tanh activation, with 128, 64, and 16 hidden units” to be the same as creating a structure of the trained model based on at least one of the number of training repetition times, a model size, and an algorithm because they are both directed to defining the architecture and size parameters of the neural network model.)

creating a trained model by reflecting instability in the created structure of the trained model. ([Zhu, page 1, sec 1] “Under this framework, the prediction uncertainty can be decomposed into three types: model uncertainty, inherent noise, and model misspecification” AND [Zhu, page 2, sec 2.2] “stochastic dropouts are applied after each hidden layer, and the model output can be approximately viewed as a random sample generated from the posterior predictive distribution”, wherein the examiner interprets “prediction uncertainty can be decomposed into three types: model uncertainty, inherent noise, and model misspecification” and “stochastic dropouts are applied after each hidden layer” to be the same as creating a trained model by reflecting instability in the created structure of the trained model because they are both directed to incorporating uncertainty quantification mechanisms into the model architecture to capture prediction variability.)
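The “stochastic dropouts” passage the rejection leans on is the Monte Carlo dropout technique: keep dropout active at inference, run repeated stochastic forward passes, and read the sample variance as approximate model uncertainty. A toy sketch under stated assumptions (a stand-in two-layer network, not Uber's encoder/prediction stack):

```python
import numpy as np

def mc_dropout_predict(W1, W2, x, n_samples=100, p_drop=0.1, seed=0):
    """MC dropout in the spirit of the quoted Zhu passages: dropout stays
    stochastic at inference; the spread of outputs estimates uncertainty."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_samples):
        h = np.tanh(W1 @ x)
        keep = rng.binomial(1, 1 - p_drop, size=h.shape)
        h = h * keep / (1 - p_drop)      # stochastic dropout at inference
        preds.append(float(W2 @ h))
    preds = np.array(preds)
    return preds.mean(), preds.var()     # forecast and model-uncertainty proxy

rng = np.random.default_rng(1)
W1, W2 = rng.standard_normal((8, 4)), rng.standard_normal(8)
mean, var = mc_dropout_predict(W1, W2, rng.standard_normal(4))
```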
Regarding claim 4, Zhu teaches:

The method according to claim 3, wherein, in generation of the trained model by reflecting instability in the created structure of the trained model, ([Zhu, page 4, sec 3.1.2] “Here, we take a principled approach by connecting the encoder, g(·), with a prediction network, h(·), and treat them as one large network f = h(g(·)) during inference” AND [Zhu, page 4, sec 3.1.2] “During this feedforward pass, MC dropout is applied to all layers in both the encoder g and the prediction network h”, wherein the examiner interprets “treat them as one large network” and “MC dropout is applied to all layers in both the encoder g and the prediction network h” to be the same as in generation of the trained model by reflecting instability in the created structure of the trained model because they are both directed to building a model architecture that inherently incorporates uncertainty estimation mechanisms within its structural design.)

the trained model is created using at least one of percentage instability ([Zhu, page 3, sec 3] “Specifically, we would like to quantify the prediction standard error, η, so that an approximate α-level prediction interval can be constructed by [ŷ* − z_{α/2} η, ŷ* + z_{α/2} η]” AND [Zhu, page 6, sec 4.1.3] “Table 2 reports the empirical coverage of the 95% predictive intervals”, wherein the examiner interprets “α-level prediction interval” and “empirical coverage of the 95% predictive intervals” to be the same as percentage instability because they are both directed to expressing uncertainty as a percentage-based confidence level that quantifies the range within which predictions are expected to fall.)

and instability with reference to a specific critical point. ([Zhu, page 6, sec 4.1.3] “One important use-case of the uncertainty estimation is to provide insight for unusual patterns in the time series” AND [Zhu, page 7, sec 4.2] “A natural approach is to trigger an alarm when the observed value falls outside of the 95% predictive interval”, wherein the examiner interprets “trigger an alarm when the observed value falls outside of the 95% predictive interval” to be the same as instability with reference to a specific critical point because they are both directed to identifying when values exceed a defined threshold or boundary that serves as a reference point for determining abnormal or unstable conditions.)

Regarding claim 11, Zhu teaches:

An apparatus for predicting a future state and reliability based on time series data, comprising: ([Zhu, page 1, Abstract] “we propose a novel end-to-end Bayesian deep model that provides time series prediction along with uncertainty estimation”, wherein the examiner interprets “end-to-end Bayesian deep model that provides time series prediction along with uncertainty estimation” to be the same as an apparatus for predicting a future state and reliability based on time series data because they are both directed to a system that forecasts future values from sequential data while simultaneously quantifying the confidence or trustworthiness of that prediction.)

a time series feature preprocessing unit preprocessing at least one of past state data and current state data; ([Zhu, page 5, sec 4.1.1] “The raw data are log-transformed to alleviate exponential effects. Next, within each sliding window, the first day is subtracted from all values, so that trends are removed and the neural network is trained for the incremental value”, wherein the examiner interprets “raw data are log-transformed” and “trends are removed” to be the same as a time series feature preprocessing unit preprocessing at least one of past state data and current state data because they are both directed to a processing component that transforms and prepares time series observations before model training and inference.)

a future state prediction model-training unit creating a trained model through execution of an algorithm based on the preprocessed past state data; ([Zhu, page 4, sec 3.2.1] “Prior to fitting the prediction model, we first conduct a pre-training step to fit an encoder that can extract useful and representative embeddings from a time series” AND [Zhu, page 5, sec 3.2.2] “a prediction network is trained to forecast the next one or more timestamps using the learned embedding as features”, wherein the examiner interprets “pre-training step to fit an encoder” and “prediction network is trained to forecast” to be the same as a future state prediction model-training unit creating a trained model through execution of an algorithm based on the preprocessed past state data because they are both directed to a training component that builds a neural network model using processed historical data to enable future predictions.)

and a future state prediction unit predicting a future state through execution of an algorithm based on the created trained model, the preprocessed current state data, and the preprocessed past state data. ([Zhu, page 5, sec 3.2.3] “After the full model is trained, the inference stage involves only the encoder and the prediction network” AND [Zhu, page 5, sec 4.1.1] “each sliding window contains the previous 28 days as input, and aims to forecast the upcoming day”, wherein the examiner interprets “inference stage involves only the encoder and the prediction network” and “sliding window contains the previous 28 days as input, and aims to forecast the upcoming day” to be the same as a future state prediction unit predicting a future state through execution of an algorithm based on the created trained model, the preprocessed current state data, and the preprocessed past state data because they are both directed to a prediction component that uses a trained model to process current input incorporating historical context to generate a forecast of future values.)

Regarding claim 13, Zhu teaches:

The apparatus according to claim 11, wherein the future state prediction model-training unit comprises: ([Zhu, page 4, sec 3.2] “The complete architecture of the neural network is shown in Figure 1. The network contains two major components: (i) an encoder-decoder framework that captures the inherent pattern in the time series, which is learned during pre-training step, and (ii) a prediction network”, wherein the examiner interprets “complete architecture of the neural network” and “network contains two major components” to be the same as the future state prediction model-training unit comprises because they are both directed to a system comprising multiple functional components for training a prediction model.)

a multiple points-in-time generator adjusting a training direction of the trained model based on the preprocessed past state data; ([Zhu, page 5, sec 4.1.1] “Samples are constructed using a sliding window with step size one, where each sliding window contains the previous 28 days as input” AND [Zhu, page 3, sec 3.1] “Given a set of N observations X = {x1, ..., xN} and Y = {y1, ..., yN}, Bayesian inference aims at finding the posterior distribution over model parameters p(W | X, Y)”, wherein the examiner interprets “Samples are constructed using a sliding window” to be the same as a multiple points-in-time generator because they are both directed to generating multiple temporal samples from the time series, and “Bayesian inference aims at finding the posterior distribution over model parameters” to be the same as adjusting a training direction of the trained model based on the preprocessed past state data because they are both directed to updating and optimizing model parameters based on the training observations.)

a time series model-training unit creating a structure of the trained model based on at least one of the number of training repetition times, a model size, and an algorithm; and ([Zhu, page 5, sec 4.1.1] “The encoder-decoder is constructed with two-layer LSTM cells, with 128 and 32 hidden states, respectively. The prediction network has three fully connected layers with tanh activation, with 128, 64, and 16 hidden units, respectively”, wherein the examiner interprets “two-layer LSTM cells, with 128 and 32 hidden states” and “three fully connected layers with tanh activation, with 128, 64, and 16 hidden units” to be the same as a time series model-training unit creating a structure of the trained model based on at least one of the number of training repetition times, a model size, and an algorithm because they are both directed to defining the architecture and size parameters of the neural network model.)

an instability calculator creating a trained model by reflecting instability in the created structure of the trained model. ([Zhu, page 1, sec 1] “Under this framework, the prediction uncertainty can be decomposed into three types: model uncertainty, inherent noise, and model misspecification” AND [Zhu, page 2, sec 2.2] “stochastic dropouts are applied after each hidden layer, and the model output can be approximately viewed as a random sample generated from the posterior predictive distribution”, wherein the examiner interprets “prediction uncertainty can be decomposed into three types: model uncertainty, inherent noise, and model misspecification” and “stochastic dropouts are applied after each hidden layer” to be the same as an instability calculator creating a trained model by reflecting instability in the created structure of the trained model because they are both directed to incorporating uncertainty quantification mechanisms into the model architecture to capture prediction variability.)

Regarding claim 14, Zhu teaches:

The apparatus according to claim 13, wherein the instability calculator creates the trained model ([Zhu, page 4, sec 3.1.2] “Here, we take a principled approach by connecting the encoder, g(·), with a prediction network, h(·), and treat them as one large network f = h(g(·)) during inference” AND [Zhu, page 4, sec 3.1.2] “During this feedforward pass, MC dropout is applied to all layers in both the encoder g and the prediction network h”, wherein the examiner interprets “treat them as one large network” and “MC dropout is applied to all layers in both the encoder g and the prediction network h” to be the same as the instability calculator creates the trained model because they are both directed to building a model architecture that inherently incorporates uncertainty estimation mechanisms within its structural design.)

using at least one of percentage instability ([Zhu, page 3, sec 3] “Specifically, we would like to quantify the prediction standard error, η, so that an approximate α-level prediction interval can be constructed by [ŷ* − z_{α/2} η, ŷ* + z_{α/2} η]” AND [Zhu, page 6, sec 4.1.3] “Table 2 reports the empirical coverage of the 95% predictive intervals”, wherein the examiner interprets “α-level prediction interval” and “empirical coverage of the 95% predictive intervals” to be the same as percentage instability because they are both directed to expressing uncertainty as a percentage-based confidence level that quantifies the range within which predictions are expected to fall.)

and instability with reference to a specific critical point. ([Zhu, page 6, sec 4.1.3] “One important use-case of the uncertainty estimation is to provide insight for unusual patterns in the time series” AND [Zhu, page 7, sec 4.2] “A natural approach is to trigger an alarm when the observed value falls outside of the 95% predictive interval”, wherein the examiner interprets “trigger an alarm when the observed value falls outside of the 95% predictive interval” to be the same as instability with reference to a specific critical point because they are both directed to identifying when values exceed a defined threshold or boundary that serves as a reference point for determining abnormal or unstable conditions.)
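The two uncertainty readings the rejection maps to claims 4 and 14 — a percentage-level interval and a critical-point alarm — come straight from the quoted Zhu construction [ŷ* − z_{α/2} η, ŷ* + z_{α/2} η]. A compact sketch (values illustrative):

```python
def prediction_interval(y_hat, eta, z=1.96):
    """95% interval [y_hat - z*eta, y_hat + z*eta], mirroring the
    quoted Zhu construction with z_{alpha/2} = 1.96."""
    return y_hat - z * eta, y_hat + z * eta

def alarm(observed, y_hat, eta, z=1.96):
    """Trigger when the observation leaves the interval -- the
    'specific critical point' reading in the rejection."""
    lo, hi = prediction_interval(y_hat, eta, z)
    return observed < lo or observed > hi

print(alarm(observed=12.7, y_hat=10.0, eta=1.0))  # True: 12.7 > 11.96
```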
Regarding claim 20, Zhu teaches:

An apparatus for predicting a future state and reliability based on time series data, ([Zhu, page 1, Abstract] “we propose a novel end-to-end Bayesian deep model that provides time series prediction along with uncertainty estimation”, wherein the examiner interprets “end-to-end Bayesian deep model that provides time series prediction along with uncertainty estimation” to be the same as an apparatus for predicting a future state and reliability based on time series data because they are both directed to a system that forecasts future values from sequential data while simultaneously quantifying the confidence or trustworthiness of that prediction.)

comprising: a transceiver transmitting and receiving past data and current data to and from an external device; ([Zhu, page 7, sec 4.2] “At Uber, we track millions of metrics each day to monitor the status of various services across the company” AND [Zhu, page 7, sec 4.1.4] “In order to provide real-time anomaly detection at the current scale, each predictive interval must be calculated within a few milliseconds during inference stage”, wherein the examiner interprets “track millions of metrics each day” and “real-time anomaly detection” to be the same as a transceiver transmitting and receiving past data and current data to and from an external device because they are both directed to acquiring and communicating time series data from external sources for processing by the prediction system.)

a processor preprocessing at least one of the past state data and the current state data, ([Zhu, page 5, sec 4.1.1] “The raw data are log-transformed to alleviate exponential effects. Next, within each sliding window, the first day is subtracted from all values, so that trends are removed and the neural network is trained for the incremental value”, wherein the examiner interprets “raw data are log-transformed” and “trends are removed” to be the same as a processor preprocessing at least one of the past state data and the current state data because they are both directed to a computational unit transforming and preparing time series observations before model training and inference.)

creating a trained model through execution of an algorithm based on the created trained model, ([Zhu, page 4, sec 3.2.1] “Prior to fitting the prediction model, we first conduct a pre-training step to fit an encoder that can extract useful and representative embeddings from a time series” AND [Zhu, page 5, sec 3.2.2] “a prediction network is trained to forecast the next one or more timestamps using the learned embedding as features”, wherein the examiner interprets “pre-training step to fit an encoder” and “prediction network is trained to forecast” to be the same as creating a trained model through execution of an algorithm because they are both directed to training a neural network model using an algorithmic process to enable future predictions.)

and predicting a future state through execution of an algorithm based on the created trained model, the preprocessed current state data, and the preprocessed past state data; ([Zhu, page 5, sec 3.2.3] “After the full model is trained, the inference stage involves only the encoder and the prediction network” AND [Zhu, page 5, sec 4.1.1] “each sliding window contains the previous 28 days as input, and aims to forecast the upcoming day”, wherein the examiner interprets “inference stage involves only the encoder and the prediction network” and “sliding window contains the previous 28 days as input, and aims to forecast the upcoming day” to be the same as predicting a future state through execution of an algorithm based on the created trained model, the preprocessed current state data, and the preprocessed past state data because they are both directed to using a trained model to process current input that incorporates historical context to generate a forecast of future values.)

and a memory storing the trained model and the future state. ([Zhu, page 7, sec 4.2.1] “Our model inference is implemented in Go. Our implementation involves efficient matrix manipulation operations” AND [Zhu, page 5, sec 3.2.2] “After the encoder-decoder is pre-trained, it is treated as an intelligent feature-extraction blackbox. Specifically, the last LSTM cell states of the encoder are extracted as learned embedding”, wherein the examiner interprets “model inference is implemented” and “last LSTM cell states of the encoder are extracted as learned embedding” to be the same as a memory storing the trained model and the future state because they are both directed to retaining the trained model parameters and extracted representations in computational storage for subsequent retrieval and use during prediction.)

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 2, 9, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over the NPL reference “Deep and Confident Prediction for Time Series at Uber” by Zhu et al. (Zhu) in view of the NPL reference “Latent ODEs for Irregularly-Sampled Time Series” by Rubanova et al. (referred to herein as Rubanova).

Regarding claim 2, Zhu teaches the method according to claim 1 (see the rejection of claim 1). Zhu does not teach wherein the step of preprocessing past state data comprises: removing outliers from the past data; and calculating a time series length of the past data.

Rubanova teaches wherein the step of preprocessing past state data comprises: removing outliers from the past data; ([Rubanova, page 8, sec 4.4] “To speed up training, we rounded the observation times to the nearest minute, reducing the number of measurements only 2-fold” AND [Rubanova, page 9, sec 5] “used a binary mask to indicate the missing measurements and reported that RNNs perform better with zero-filling than with imputed values”, wherein the examiner interprets “zero-filling” and “rounded the observation times to the nearest minute, reducing the number of measurements” to be the same as removing outliers from the past data because they are both directed to a preprocessing step that filters or reduces raw data points to create a cleaner dataset for model training.)

and calculating a time series length of the past data. ([Rubanova, page 8, sec 4.4] “Hence, there are still 2880 (6048) possible measurement times per time series under our model's preprocessing, while the previous standard was to use only 48 possible measurement times … We evaluated our model on the PhysioNet Challenge 2012 dataset [Silva et al., 2012], which contains 8000 time series, each containing measurements from the first 48 hours”, wherein the examiner interprets “2880 (6048) possible measurement times per time series” and “8000 time series, each containing measurements from the first 48 hours” to be the same as calculating a time series length of the past data because they are both directed to determining and tracking the number of temporal observations or time points that constitute each time series sequence.)

Zhu, Rubanova, and the instant application are analogous art because they are all directed to preprocessing time-series data for use in machine-learning-based prediction models. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 disclosed by Zhu to include the rounding-to-nearest-times preprocessing technique disclosed by Rubanova. One would be motivated to do so to efficiently reduce noise and irregularities in historical time-series inputs, thereby improving model training robustness and computational efficiency, as suggested by Rubanova ([Rubanova, page 8, sec 4.4] “To speed up training, we rounded the observation times to the nearest minute, reducing the number of measurements only 2-fold.”).

Regarding claim 9, Zhu teaches the method according to claim 1 (see the rejection of claim 1). Zhu does not teach wherein the step of predicting a future state comprises: calculating a prediction point-in-time by receiving a future point-in-time that a user wants to know; and predicting a future state of the user through execution of an algorithm based on the prediction point-in-time and the trained model.

Rubanova teaches wherein the step of predicting a future state comprises: calculating a prediction point-in-time by receiving a future point-in-time that a user wants to know; ([Rubanova, page 2, sec 2] “The hidden state h(t) is defined at all times, and can be evaluated at any desired times using a numerical ODE solver” AND [Rubanova, pages 5-6, sec 4] “Latent ODEs can often reconstruct trajectories reasonably well given a small subset of points, and provide an estimate of uncertainty over both the latent trajectories and trajectory (100 points) from a subset of 30 points. At test time, we conditioned this model on a subset of 10, 30 or 50 points … predict points on [2.5; 5] interval (blue area). A Latent ODE with an ODE-RNN encoder was able to extrapolate the time series far beyond the training interval and maintain periodic dynamics”, wherein the examiner interprets “evaluated at any desired times” and “Latent ODEs can often reconstruct trajectories reasonably well given a small subset of points, and provide an estimate of uncertainty over both the latent trajectories … was able to extrapolate the time series far beyond the training interval and maintain periodic dynamics” to be the same as calculating a prediction point-in-time by receiving a future point-in-time that a user wants to know because they are both directed to accepting a specified target time point at which the user desires to obtain a prediction.)

and predicting a future state of the user through execution of an algorithm based on the prediction point-in-time and the trained model. ([Rubanova, page 2, sec 2] “The hidden state h(t) is defined at all times, and can be evaluated at any desired times using a numerical ODE solver: h0, . . . , hN = ODESolve(fθ, h0, (t0, . . . , tN))” AND [Rubanova, page 3, sec 3.2] “the generative model is defined by an ODE whose initial latent state z0 determines the entire trajectory: z0, z1, . . . , zN = ODESolve(fθ, z0, (t0, t1, . . . , tN))” AND [Rubanova, page 6] “At test time, we conditioned this model on a subset of 10, 30 or 50 points”, wherein the examiner interprets “ODESolve(fθ, z0, (t0, t1, . . . , tN))” and “conditioned this model on a subset of 10, 30 or 50 points” to be the same as predicting a future state of the user through execution of an algorithm based on the prediction point-in-time and the trained model because they are both directed to using a trained model with an algorithmic solver to generate predictions at the specified future time points.)

Zhu, Rubanova, and the instant application are analogous art because they are all directed to methods for predicting a future state from time series data at a specified future time point using trained models and algorithmic execution. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 disclosed by Zhu to include the capability of being “evaluated at any desired times” disclosed by Rubanova. One would be motivated to do so to flexibly generate predictions at arbitrary future time points selected by a user, as suggested by Rubanova ([Rubanova, page 2, sec 2] “The hidden state h(t) is defined at all times, and can be evaluated at any desired times using a numerical ODE solver.”).
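The “evaluated at any desired times” capability is easy to picture with a numerical ODE solver: integrate learned dynamics up to whatever future point-in-time the user requests. A toy sketch assuming SciPy is available; the dynamics function merely stands in for the learned fθ:

```python
import numpy as np
from scipy.integrate import solve_ivp

def state_at(t_user, h0, f):
    """Evaluate the hidden state at a user-requested future time by
    integrating the dynamics f, in the spirit of the Rubanova quotes."""
    sol = solve_ivp(f, (0.0, t_user), h0, t_eval=[t_user])
    return sol.y[:, -1]

# Hypothetical 'learned' dynamics standing in for f_theta:
f = lambda t, h: np.array([-0.5 * h[0] + 0.1])
print(state_at(t_user=7.5, h0=[1.0], f=f))
```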
Regarding claim 12, Zhu teaches the apparatus according to claim 11 (see the rejection of claim 11). Zhu does not teach wherein the time series feature preprocessing unit comprises: an outlier-processing device removing outliers from the past data; and a time series length calculator calculating a time series length of the past data.

Rubanova teaches wherein the time series feature preprocessing unit comprises: an outlier-processing device removing outliers from the past data; ([Rubanova, page 8, sec 4.4] “To speed up training, we rounded the observation times to the nearest minute, reducing the number of measurements only 2-fold” AND [Rubanova, page 9, sec 5] “used a binary mask to indicate the missing measurements and reported that RNNs perform better with zero-filling than with imputed values”, wherein the examiner interprets “zero-filling” and “rounded the observation times to the nearest minute, reducing the number of measurements” to be the same as an outlier-processing device removing outliers from the past data because they are both directed to a preprocessing component that filters or reduces raw data points to create a cleaner dataset for model training.)

and a time series length calculator calculating a time series length of the past data. ([Rubanova, page 8, sec 4.4] “Hence, there are still 2880 (6048) possible measurement times per time series under our model's preprocessing, while the previous standard was to use only 48 possible measurement times … We evaluated our model on the PhysioNet Challenge 2012 dataset [Silva et al., 2012], which contains 8000 time series, each containing measurements from the first 48 hours”, wherein the examiner interprets “2880 (6048) possible measurement times per time series” and “8000 time series, each containing measurements from the first 48 hours” to be the same as a time series length calculator calculating a time series length of the past data because they are both directed to a computational element that determines and tracks the number of temporal observations or time points that constitute each time series sequence.)

Zhu, Rubanova, and the instant application are analogous art because they are all directed to preprocessing time-series data for use in machine-learning-based prediction models. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of claim 11 disclosed by Zhu to include the rounding-to-nearest-times preprocessing technique disclosed by Rubanova. One would be motivated to do so to efficiently reduce noise and irregularities in historical time-series inputs, thereby improving model training robustness and computational efficiency, as suggested by Rubanova ([Rubanova, page 8, sec 4.4] “To speed up training, we rounded the observation times to the nearest minute, reducing the number of measurements only 2-fold.”).
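The specific Rubanova steps relied on for claims 2 and 12 — rounding observation times and zero-filling with a binary mask — can be sketched directly; the resolution, the names, and the overwrite rule for duplicate slots are assumptions:

```python
import numpy as np

def round_and_mask(times, values, resolution=60.0):
    """Round observation times to the nearest slot (e.g., one minute),
    zero-fill unobserved slots, and keep a binary observation mask,
    as in the Rubanova passages quoted above."""
    slots = np.round(np.asarray(times) / resolution).astype(int)
    length = int(slots.max()) + 1            # time-series length after rounding
    filled = np.zeros(length)
    mask = np.zeros(length, dtype=int)
    filled[slots] = values                   # duplicates: last write wins
    mask[slots] = 1
    return filled, mask, length

vals, mask, n = round_and_mask([5.0, 62.0, 240.0], [0.3, 0.7, 0.9])
```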
Regarding claim 19, Zhu teaches the apparatus according to claim 11 (see the rejection of claim 11). Zhu does not teach wherein the future state prediction unit calculates a prediction point-in-time by receiving a future point-in-time that a user wants to know, and predicts a future state of a user through execution of an algorithm based on the prediction point-in-time and the trained model.

Rubanova teaches wherein the future state prediction unit calculates a prediction point-in-time by receiving a future point-in-time that a user wants to know, ([Rubanova, page 2, sec 2] “The hidden state h(t) is defined at all times, and can be evaluated at any desired times using a numerical ODE solver” AND [Rubanova, pages 5-6, sec 4] “Latent ODEs can often reconstruct trajectories reasonably well given a small subset of points, and provide an estimate of uncertainty over both the latent trajectories and trajectory (100 points) from a subset of 30 points. At test time, we conditioned this model on a subset of 10, 30 or 50 points … predict points on [2.5; 5] interval (blue area). A Latent ODE with an ODE-RNN encoder was able to extrapolate the time series far beyond the training interval and maintain periodic dynamics”, wherein the examiner interprets “evaluated at any desired times” and “Latent ODEs can often reconstruct trajectories reasonably well given a small subset of points, and provide an estimate of uncertainty over both the latent trajectories … was able to extrapolate the time series far beyond the training interval and maintain periodic dynamics” to be the same as calculating a prediction point-in-time by receiving a future point-in-time that a user wants to know because they are both directed to accepting a specified target time point at which the user desires to obtain a prediction.)

and predicts a future state of a user through execution of an algorithm based on the prediction point-in-time and the trained model. ([Rubanova, page 2, sec 2] “The hidden state h(t) is defined at all times, and can be evaluated at any desired times using a numerical ODE solver: h0, . . . , hN = ODESolve(fθ, h0, (t0, . . . , tN))” AND [Rubanova, page 3, sec 3.2] “the generative model is defined by an ODE whose initial latent state z0 determines the entire trajectory: z0, z1, . . . , zN = ODESolve(fθ, z0, (t0, t1, . . . , tN))” AND [Rubanova, page 6] “At test time, we conditioned this model on a subset of 10, 30 or 50 points”, wherein the examiner interprets “ODESolve(fθ, z0, (t0, t1, . . . , tN))” and “conditioned this model on a subset of 10, 30 or 50 points” to be the same as predicts a future state of a user through execution of an algorithm based on the prediction point-in-time and the trained model because they are both directed to using a trained model with an algorithmic solver to generate predictions at the specified future time points.)

Zhu, Rubanova, and the instant application are analogous art because they are all directed to methods for predicting a future state from time series data at a specified future time point using trained models and algorithmic execution. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of claim 11 disclosed by Zhu to include the capability of being “evaluated at any desired times” disclosed by Rubanova. One would be motivated to do so to flexibly generate predictions at arbitrary future time points selected by a user, as suggested by Rubanova ([Rubanova, page 2, sec 2] “The hidden state h(t) is defined at all times, and can be evaluated at any desired times using a numerical ODE solver.”).

Claims 5, 7, 15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu in view of the NPL reference “Deep State Space Models for Time Series Forecasting” by Rangapuram et al. (referred to herein as Rangapuram).

Regarding claim 5, Zhu teaches the method according to claim 1 (see the rejection of claim 1).

Zhu further teaches calculating instability in the course of predicting the future state based on the time series feature of the preprocessed past state data. ([Zhu, page 3, sec 3] “our goal is to evaluate the uncertainty of the model prediction” AND [Zhu, page 3, sec 3.1] “the variance of the prediction distribution quantifies the prediction uncertainty, which can be further decomposed using law of total variance” AND [Zhu, page 3, sec 3.1.1] “the model uncertainty can be approximated by the sample variance”, wherein the examiner interprets “evaluate the uncertainty of the model prediction” and “variance of the prediction distribution quantifies the prediction uncertainty” and “model uncertainty can be approximated by the sample variance” to be the same as calculating instability in the course of predicting the future state based on the time series feature of the preprocessed past state data because they are both directed to computing a measure of variability or uncertainty associated with the prediction output derived from the temporal features of the input data.)

Zhu does not teach wherein the step of predicting a future state through execution of an algorithm comprises: creating a structure of the trained model; executing an algorithm reflecting a time series feature of the preprocessed past state data in the trained model; processing a prediction point-in-time of the preprocessed past state data; applying a suitable feature to the preprocessed past state data by modeling an environment condition feature.

Rangapuram teaches wherein the step of predicting a future state through execution of an algorithm comprises: creating a structure of the trained model; ([Rangapuram, page 4, sec 4] “We parameterize the mapping Ψ from covariates to state space model parameters using a deep recurrent neural network (RNN) … a multi-layer recurrent neural network with LSTM cells and parameters Φ computes a representation of the features via a recurrent function h”, wherein the examiner interprets “parameterize the mapping Ψ from covariates to state space model parameters using a deep recurrent neural network” and “multi-layer recurrent neural network with LSTM cells” to be the same as creating a structure of the trained model because they are both directed to defining and constructing the architectural framework of the neural network model used for prediction.)
executing an algorithm reflecting a time series feature of the preprocessed past state data in the trained model; ([Rangapuram, page 3, sec 3] “SSMs model the temporal structure of the data via a latent state l_t ∈ R^L that can be used to encode time series components such as level, trend, and seasonality patterns” AND [Rangapuram, page 4, sec 4] “The real-valued output vector of the last LSTM layer is then mapped to the parameters Θ(i)_t of the state space model”, wherein the examiner interprets “encode time series components such as level, trend, and seasonality patterns” and “output vector of the last LSTM layer is then mapped to the parameters” to be the same as executing an algorithm reflecting a time series feature of the preprocessed past state data in the trained model because they are both directed to applying computational procedures that capture and incorporate temporal characteristics of the input data within the model.) processing a prediction point-in-time of the preprocessed past state data; ([Rangapuram, page 5, sec 4.2] “starting with sample l_T ∼ p(l_T | z_{1:T}), we recursively apply y_{T+t} = a_{T+t}^⊤ l_{T+t−1} + b_{T+t}, t = 1, . . . , τ” AND [Rangapuram, page 3, sec 3] “The time point T_i + 1 is referred to as forecast start time and τ ∈ N_{>0} is the forecast horizon”, wherein the examiner interprets “recursively apply y_{T+t} = a_{T+t}^⊤ l_{T+t−1} + b_{T+t}” and “forecast start time and τ ∈ N_{>0} is the forecast horizon” to be the same as processing a prediction point-in-time of the preprocessed past state data because they are both directed to computing predictions at specific future time points based on the processed historical observations.) applying a suitable feature to the preprocessed past state data by modeling an environment condition feature; and ([Rangapuram, page 2, sec 3] “let {x(i)_{1:T_i+τ}}_{i=1}^N be a set of associated, time-varying covariate vectors with x(i)_t ∈ R^D” AND [Rangapuram, page 4, sec 4.1] “The covariates (features) can be time dependent (e.g. product price or a set of dummy variables indicating day-of-week) or time independent (e.g., product brand, category etc.)”, wherein the examiner interprets “time-varying covariate vectors” and “covariates (features) can be time dependent (e.g. product price or a set of dummy variables indicating day-of-week)” to be the same as applying a suitable feature to the preprocessed past state data by modeling an environment condition feature because they are both directed to incorporating external contextual variables that represent environmental or situational conditions affecting the time series.) Zhu, Rangapuram, and the instant application are analogous art because they are all directed to methods for predicting a future state of a time series using trained models that incorporate temporal features and contextual information. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 1 disclosed by Zhu to include the use of “associated, time-varying covariate vectors” disclosed by Rangapuram. One would be motivated to do so to effectively incorporate environmental and contextual information into the prediction process in order to improve the quality and robustness of future state predictions, as suggested by Rangapuram ([Rangapuram, page 2, sec 3] “let {x(i)_{1:T_i+τ}}_{i=1}^N be a set of associated, time-varying covariate vectors with x(i)_t ∈ R^D … Our goal is to produce a set of probabilistic forecasts.”).
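The recursive forecast just quoted (y_{T+t} = a_{T+t}^⊤ l_{T+t−1} + b_{T+t} with latent transition l_{T+t} = F_{T+t} l_{T+t−1} + g_{T+t} ε_{T+t}) can be illustrated with a small numpy sketch; every parameter value below is an invented stand-in for a learned quantity, not a value from Rangapuram.

import numpy as np

rng = np.random.default_rng(0)
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])            # transition matrix F (level + trend)
g = np.array([0.5, 0.1])              # innovation strength g
a = np.array([1.0, 0.0])              # observation vector a
b = 0.0                               # observation bias b
tau = 5                               # forecast horizon

l = np.array([10.0, 0.3])             # sampled initial state l_T ~ p(l_T | z_{1:T})
forecast = []
for t in range(1, tau + 1):
    forecast.append(a @ l + b)        # y_{T+t} = a^T l_{T+t-1} + b
    l = F @ l + g * rng.standard_normal()   # l_{T+t} = F l_{T+t-1} + g*eps
print(forecast)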
Regarding claim 7, Zhu and Rangapuram teach The method according to claim 5 (see rejection of claim 5). Rangapuram further teaches: wherein the step of processing a prediction point-in-time of the preprocessed past state data comprises: training a function for estimation of a variation rate of input data through deep learning; ([Rangapuram, page 3, sec 3] “the latent state l_{t−1} maintains information about level, trend, and seasonality patterns and evolves by way of a deterministic transition matrix F_t and a random innovation g_t ε_t” AND [Rangapuram, page 3, sec 3] “The structure of the transition matrix F_t and innovation strength g_t determine which kind of time series patterns are encoded by the latent state l_t” AND [Rangapuram, page 4, sec 4.1] “The model parameters Φ are learned by maximizing the probability of observing the data”, wherein the examiner interprets “evolves by way of a deterministic transition matrix F_t and a random innovation g_t ε_t”, “transition matrix F_t and innovation strength g_t determine which kind of time series patterns”, and “model parameters Φ are learned by maximizing the probability” to be the same as training a function for estimation of a variation rate of input data through deep learning because they are both directed to learning a function through neural network optimization that captures how the state changes or varies over time.) and calculating a variation estimation function depending on the prediction point-in-time using the function. ([Rangapuram, page 5, sec 4.2] “starting with sample l_T ∼ p(l_T | z_{1:T}), we recursively apply y_{T+t} = a_{T+t}^⊤ l_{T+t−1} + b_{T+t}, t = 1, . . . , τ … l_{T+t} = F_{T+t} l_{T+t−1} + g_{T+t} ε_{T+t}, ε_{T+t} ∼ N(0, 1), t = 1, . . . , τ − 1” AND [Rangapuram, page 6, sec 4.2] “we unroll the RNN for the prediction range t = T_i + 1, . . . , T_i + τ and obtain Θ(i)_{T_i+1:T_i+τ}, then generate the prediction samples by recursively applying above equations”, wherein the examiner interprets “recursively apply y_{T+t} = a_{T+t}^⊤ l_{T+t−1} + b_{T+t}”, “l_{T+t} = F_{T+t} l_{T+t−1} + g_{T+t} ε_{T+t}”, and “unroll the RNN for the prediction range” to be the same as calculating a variation estimation function depending on the prediction point-in-time using the function because they are both directed to applying the learned transition dynamics to compute state evolution and predictions at each specified future time point in the forecast horizon.) Zhu, Rangapuram, and the instant application are analogous art because they are all directed to predicting future states of time-series data using learned models that account for temporal variation and uncertainty across a forecast horizon. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 5 disclosed by Zhu and Rangapuram to include the use of a transition matrix disclosed by Rangapuram. One would be motivated to do so to more accurately and effectively model temporal variation across future prediction points, as suggested by Rangapuram ([Rangapuram, page 3, sec 3] “the latent state l_{t−1} maintains information about level, trend, and seasonality patterns and evolves by way of a deterministic transition matrix F_t and a random innovation g_t ε_t”). Regarding claim 15, Zhu teaches The apparatus according to claim 11 (see rejection of claim 11). Zhu further teaches an instability processing device calculating instability in the course of predicting the future state based on the time series feature of the preprocessed past state data.
([Zhu, page 3, sec 3] “our goal is to evaluate the uncertainty of the model prediction” AND [Zhu, page 3, sec 3.1] “the variance of the prediction distribution quantifies the prediction uncertainty, which can be further decomposed using law of total variance” AND [Zhu, page 3, sec 3.1.1] “the model uncertainty can be approximated by the sample variance”, wherein the examiner interprets “evaluate the uncertainty of the model prediction”, “variance of the prediction distribution quantifies the prediction uncertainty”, and “model uncertainty can be approximated by the sample variance” to be the same as an instability processing device calculating instability in the course of predicting the future state based on the time series feature of the preprocessed past state data because they are both directed to a processing component that computes a measure of variability or uncertainty associated with the prediction output derived from the temporal features of the input data.) Zhu does not teach further comprising: an algorithm calculator, the algorithm calculator comprising: a model variable setting device creating a structure of the trained model; a time series feature processing device applying an algorithm reflecting a time series feature of the preprocessed past state data in the trained model; a point-in-time feature processing device processing a prediction point-in-time of the preprocessed past state data; and an environment feature processing device applying a suitable feature to the preprocessed past state data by modeling an environment condition feature. Rangapuram teaches: further comprising: an algorithm calculator, the algorithm calculator comprising: ([Rangapuram, page 4, sec 4] “Given the covariates x(i)_t associated with time series z(i)_t, a multi-layer recurrent neural network with LSTM cells and parameters Φ computes a representation of the features via a recurrent function h”, wherein the examiner interprets “multi-layer recurrent neural network with LSTM cells and parameters Φ computes a representation” to be the same as an algorithm calculator because they are both directed to a computational component that executes algorithmic operations for processing time series data.) a model variable setting device creating a structure of the trained model; ([Rangapuram, page 4, sec 4] “We parameterize the mapping Ψ from covariates to state space model parameters using a deep recurrent neural network (RNN)” AND [Rangapuram, page 4, sec 4] “a multi-layer recurrent neural network with LSTM cells and parameters Φ computes a representation of the features via a recurrent function h”, wherein the examiner interprets “parameterize the mapping Ψ from covariates to state space model parameters using a deep recurrent neural network” and “multi-layer recurrent neural network with LSTM cells” to be the same as a model variable setting device creating a structure of the trained model because they are both directed to a component that defines and constructs the architectural framework of the neural network model used for prediction.)
a time series feature processing device applying an algorithm reflecting a time series feature of the preprocessed past state data in the trained model; ([Rangapuram, page 3, sec 3] “SSMs model the temporal structure of the data via a latent state l_t ∈ R^L that can be used to encode time series components such as level, trend, and seasonality patterns” AND [Rangapuram, page 4, sec 4] “The real-valued output vector of the last LSTM layer is then mapped to the parameters Θ(i)_t of the state space model”, wherein the examiner interprets “encode time series components such as level, trend, and seasonality patterns” and “output vector of the last LSTM layer is then mapped to the parameters” to be the same as a time series feature processing device applying an algorithm reflecting a time series feature of the preprocessed past state data in the trained model because they are both directed to a processing component that applies computational procedures to capture and incorporate temporal characteristics of the input data within the model.) a point-in-time feature processing device processing a prediction point-in-time of the preprocessed past state data; ([Rangapuram, page 5, sec 4.2] “starting with sample l_T ∼ p(l_T | z_{1:T}), we recursively apply y_{T+t} = a_{T+t}^⊤ l_{T+t−1} + b_{T+t}, t = 1, . . . , τ” AND [Rangapuram, page 3, sec 3] “The time point T_i + 1 is referred to as forecast start time and τ ∈ N_{>0} is the forecast horizon”, wherein the examiner interprets “recursively apply y_{T+t} = a_{T+t}^⊤ l_{T+t−1} + b_{T+t}” and “forecast start time and τ ∈ N_{>0} is the forecast horizon” to be the same as a point-in-time feature processing device processing a prediction point-in-time of the preprocessed past state data because they are both directed to a processing component that computes predictions at specific future time points based on the processed historical observations.) an environment feature processing device applying a suitable feature to the preprocessed past state data by modeling an environment condition feature; ([Rangapuram, page 2, sec 3] “let {x(i)_{1:T_i+τ}}_{i=1}^N be a set of associated, time-varying covariate vectors with x(i)_t ∈ R^D” AND [Rangapuram, page 4, sec 4.1] “The covariates (features) can be time dependent (e.g. product price or a set of dummy variables indicating day-of-week) or time independent (e.g., product brand, category etc.)”, wherein the examiner interprets “time-varying covariate vectors” and “covariates (features) can be time dependent (e.g. product price or a set of dummy variables indicating day-of-week)” to be the same as an environment feature processing device applying a suitable feature to the preprocessed past state data by modeling an environment condition feature because they are both directed to a processing component that incorporates external contextual variables representing environmental or situational conditions affecting the time series.) Zhu, Rangapuram, and the instant application are analogous art because they are all directed to methods for predicting a future state of a time series using trained models that incorporate temporal features and contextual information. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of claim 11 disclosed by Zhu to include the use of “associated, time-varying covariate vectors” disclosed by Rangapuram.
One would be motivated to do so to effectively incorporate environmental and contextual information into the prediction process in order to improve the quality and robustness of future state predictions, as suggested by Rangapuram ([Rangapuram, page 2, sec 3] “let {x(i)_{1:T_i+τ}}_{i=1}^N be a set of associated, time-varying covariate vectors with x(i)_t ∈ R^D … Our goal is to produce a set of probabilistic forecasts”). Regarding claim 17, Zhu and Rangapuram teach The apparatus according to claim 15 (see rejection of claim 15). Rangapuram further teaches wherein the point-in-time feature processing device trains a function for estimation of a variation rate of input data through deep learning ([Rangapuram, page 3, sec 3] “the latent state l_{t−1} maintains information about level, trend, and seasonality patterns and evolves by way of a deterministic transition matrix F_t and a random innovation g_t ε_t” AND [Rangapuram, page 3, sec 3] “The structure of the transition matrix F_t and innovation strength g_t determine which kind of time series patterns are encoded by the latent state l_t” AND [Rangapuram, page 4, sec 4.1] “The model parameters Φ are learned by maximizing the probability of observing the data”, wherein the examiner interprets “evolves by way of a deterministic transition matrix F_t and a random innovation g_t ε_t”, “transition matrix F_t and innovation strength g_t determine which kind of time series patterns”, and “model parameters Φ are learned by maximizing the probability” to be the same as the point-in-time feature processing device trains a function for estimation of a variation rate of input data through deep learning because they are both directed to a processing component that learns a function through neural network optimization that captures how the state changes or varies over time.) and calculates a variation estimation function depending on a prediction point-in-time using the trained function for estimation of a variation rate of input data. ([Rangapuram, page 5, sec 4.2] “starting with sample l_T ∼ p(l_T | z_{1:T}), we recursively apply y_{T+t} = a_{T+t}^⊤ l_{T+t−1} + b_{T+t}, t = 1, . . . , τ … l_{T+t} = F_{T+t} l_{T+t−1} + g_{T+t} ε_{T+t}, ε_{T+t} ∼ N(0, 1), t = 1, . . . , τ − 1” AND [Rangapuram, page 6, sec 4.2] “we unroll the RNN for the prediction range t = T_i + 1, . . . , T_i + τ and obtain Θ(i)_{T_i+1:T_i+τ}, then generate the prediction samples by recursively applying above equations”, wherein the examiner interprets “recursively apply y_{T+t} = a_{T+t}^⊤ l_{T+t−1} + b_{T+t}”, “l_{T+t} = F_{T+t} l_{T+t−1} + g_{T+t} ε_{T+t}”, and “unroll the RNN for the prediction range” to be the same as calculates a variation estimation function depending on a prediction point-in-time using the trained function for estimation of a variation rate of input data because they are both directed to a processing component that applies the learned transition dynamics to compute state evolution and predictions at each specified future time point in the forecast horizon.) Zhu, Rangapuram, and the instant application are analogous art because they are all directed to predicting future states of time-series data using learned models that account for temporal variation and uncertainty across a forecast horizon. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of claim 15 disclosed by Zhu and Rangapuram to include the use of a transition matrix disclosed by Rangapuram.
One would be motivated to do so to more accurately and effectively model temporal variation across future prediction points, as suggested by Rangapuram ([Rangapuram, page 3, sec 3] “the latent state l_{t−1} maintains information about level, trend, and seasonality patterns and evolves by way of a deterministic transition matrix F_t and a random innovation g_t ε_t”). Claim(s) 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu in view of Rangapuram further in view of NPL reference “Mixture Density Networks”, by Bishop et al. (referred to herein as Bishop). Regarding claim 6, Zhu and Rangapuram teach The method according to claim 5 (see rejection of claim 5). Zhu and Rangapuram do not teach further comprising: applying the suitable feature to the preprocessed past state data by modeling the environment condition feature; creating a complexity distribution of the preprocessed past state data; creating a complexity distribution sample based on the created complexity distribution; and creating future state data based on the created complexity distribution sampling. Bishop teaches: further comprising: applying the suitable feature to the preprocessed past state data by modeling the environment condition feature; ([Bishop, page 7, sec 3] “we now take the various parameters of the mixture model, namely the mixing coefficients α_i(x), the means μ_i(x) and the variances σ_i(x), to be general (continuous) functions of x…This is achieved by modelling them using the outputs of a conventional neural network which takes x as its input”, wherein the examiner interprets “parameters of the mixture model…modelling them using the outputs of a conventional neural network which takes x as its input” to be the same as applying the suitable feature to the preprocessed past state data by modeling the environment condition feature because they are both directed to incorporating input-dependent contextual information that influences the output distribution parameters.) creating a complexity distribution of the preprocessed past state data; ([Bishop, page 6, sec 3] “The probability density of the target data is then represented as a linear combination of kernel functions in the form p(t | x) = Σ_i α_i(x) φ_i(t | x)…we shall restrict attention to kernel functions which are Gaussian of the form φ_i(t | x) = 1/((2π)^(c/2) σ_i(x)^c) exp{−||t − μ_i(x)||^2 / (2σ_i(x)^2)}”, wherein the examiner interprets “probability density of the target data is then represented as a linear combination of kernel functions” and “kernel functions which are Gaussian” to be the same as creating a complexity distribution of the preprocessed past state data because they are both directed to constructing a mixture of multiple probability distributions that can model complex, multi-modal data patterns.)
creating a complexity distribution sample based on the created complexity distribution; ([Bishop, page 12, sec 3] “the most probable branch of the solution, assuming the components are well separated and have negligible overlap, is given by max{α_i(x)}” AND [Bishop, page 12, sec 3] “The required value of t is then given by the corresponding centre μ_i”, wherein the examiner interprets “most probable branch of the solution” and “required value of t is then given by the corresponding centre μ_i” to be the same as creating a complexity distribution sample based on the created complexity distribution because they are both directed to selecting or sampling a specific output value from the constructed mixture distribution.) and creating future state data based on the created complexity distribution sampling. ([Bishop, page 11, sec 3] “One of the simplest statistics is the mean, corresponding to the conditional average of the target data, given by <t | x> = Σ_i α_i(x) μ_i(x)” AND [Bishop, page 18, sec 5] “Having obtained a good representation for the conditional density of the target data, it is then in principle straightforward to calculate any desired statistic from that distribution”, wherein the examiner interprets “conditional average of the target data, given by <t | x> = Σ_i α_i(x) μ_i(x)” and “calculate any desired statistic from that distribution” to be the same as creating future state data based on the created complexity distribution sampling because they are both directed to generating predicted output values by computing statistics or samples from the learned mixture distribution model.) Zhu, Rangapuram, Bishop, and the instant application are analogous art because they are all directed to probabilistic time series forecasting methods that generate future state data. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 5 disclosed by Zhu and Rangapuram to include the technique of performing a linear combination of kernel functions disclosed by Bishop. One would be motivated to do so to effectively model complex, multi-modal future outcome distributions, as suggested by Bishop ([Bishop, page 6, sec 3] “The probability density of the target data is then represented as a linear combination of kernel functions”).
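For orientation, the quoted Bishop construction p(t | x) = Σ_i α_i(x) φ_i(t | x) with Gaussian kernels, the “most probable branch” selection, and the conditional mean <t | x> can be illustrated with a minimal sketch; the mixture parameters below are fixed numbers standing in for neural-network outputs, not values from Bishop.

import numpy as np

def gaussian(t, mu, sigma):
    # Gaussian kernel phi_i(t | x) for scalar t (c = 1).
    return np.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# Stand-ins for the network outputs alpha_i(x), mu_i(x), sigma_i(x) at one x.
alphas = np.array([0.7, 0.3])         # mixing coefficients (sum to 1)
mus    = np.array([1.0, 4.0])         # kernel centres mu_i(x)
sigmas = np.array([0.5, 1.0])         # kernel widths sigma_i(x)

def p(t):
    # Mixture density p(t | x) = sum_i alpha_i(x) * phi_i(t | x)
    return float(np.sum(alphas * gaussian(t, mus, sigmas)))

best = int(np.argmax(alphas))         # "most probable branch": largest alpha
t_branch = mus[best]                  # its centre mu_i is the point prediction
t_mean = float(alphas @ mus)          # conditional mean <t | x> = sum_i alpha_i mu_i
print(p(t_branch), t_branch, t_mean)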
Regarding claim 16, Zhu and Rangapuram teach The apparatus according to claim 15 (see rejection of claim 15). Zhu and Rangapuram do not teach wherein the environment feature processing device creates a complexity distribution of the preprocessed past state data, a complexity distribution sample based on the created complexity distribution, and future state data based on the created complexity distribution sampling. Bishop teaches: wherein the environment feature processing device creates a complexity distribution of the preprocessed past state data, ([Bishop, page 6, sec 3] “The probability density of the target data is then represented as a linear combination of kernel functions in the form p(t | x) = Σ_i α_i(x) φ_i(t | x)” AND [Bishop, page 6, sec 3] “we shall restrict attention to kernel functions which are Gaussian of the form φ_i(t | x) = 1/((2π)^(c/2) σ_i(x)^c) exp{−||t − μ_i(x)||^2 / (2σ_i(x)^2)}”, wherein the examiner interprets “probability density of the target data is then represented as a linear combination of kernel functions” and “kernel functions which are Gaussian” to be the same as the environment feature processing device creates a complexity distribution of the preprocessed past state data because they are both directed to a processing component that constructs a mixture of multiple probability distributions that can model complex, multi-modal data patterns.) a complexity distribution sample based on the created complexity distribution, ([Bishop, page 12, sec 3] “the most probable branch of the solution, assuming the components are well separated and have negligible overlap, is given by max{α_i(x)}” AND [Bishop, page 12, sec 3] “The required value of t is then given by the corresponding centre μ_i”, wherein the examiner interprets “most probable branch of the solution” and “required value of t is then given by the corresponding centre μ_i” to be the same as a complexity distribution sample based on the created complexity distribution because they are both directed to selecting or sampling a specific output value from the constructed mixture distribution.) and future state data based on the created complexity distribution sampling. ([Bishop, page 11, sec 3] “One of the simplest statistics is the mean, corresponding to the conditional average of the target data, given by <t | x> = Σ_i α_i(x) μ_i(x)” AND [Bishop, page 18, sec 5] “Having obtained a good representation for the conditional density of the target data, it is then in principle straightforward to calculate any desired statistic from that distribution”, wherein the examiner interprets “conditional average of the target data, given by <t | x> = Σ_i α_i(x) μ_i(x)” and “calculate any desired statistic from that distribution” to be the same as future state data based on the created complexity distribution sampling because they are both directed to generating predicted output values by computing statistics or samples from the learned mixture distribution model.) Zhu, Rangapuram, Bishop, and the instant application are analogous art because they are all directed to generating future state data for time series using probabilistic modeling of latent state dynamics, including learning distributions over future outcomes from historical observations. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of claim 15 disclosed by Zhu and Rangapuram to include the “linear combination of kernel functions” disclosed by Bishop. One would be motivated to do so to effectively model complex and multi-modal predictive distributions for future state estimation, as suggested by Bishop ([Bishop, page 6, sec 3] “The probability density of the target data is then represented as a linear combination of kernel functions.”).
Claim(s) 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhu in view of Rangapuram further in view of NPL reference “Temporal fusion transformers for interpretable multi-horizon time series forecasting”, by Lim et al. (referred to herein as Lim). Regarding claim 8, Zhu and Rangapuram teach The method according to claim 5 (see rejection of claim 5). Zhu further teaches: wherein the step of calculating instability comprises: calculating instability ([Zhu, page 3, sec 3] “our goal is to evaluate the uncertainty of the model prediction … we propose that a complete measurement of prediction uncertainty should be a combination from three sources: (i) model uncertainty, (ii) model misspecification, and (iii) inherent noise level” AND [Zhu, page 3, sec 3.1.1] “the model uncertainty can be approximated by the sample variance” AND [Zhu, page 4, sec 3.1.2] “The next question is how to incorporate this uncertainty in the variance calculation. Here, we take a principled approach by connecting the encoder, g(·), with a prediction network, h(·), and treat them as one large network f = h(g(·)) during inference.”, wherein the examiner interprets “model uncertainty can be approximated by the sample variance” and “this uncertainty in the variance calculation. Here, we take a principled approach by connecting the encoder, g(·), with a prediction network, h(·), and treat them as one large network f = h(g(·)) during inference.” to be the same as calculating instability because they are both directed to computing a quantitative measure of prediction uncertainty or variability.) corresponding to at least one of time series instability, point-in-time instability and distribution complexity instability ([Zhu, page 1, sec 1] “model uncertainty, inherent noise, and model misspecification”, wherein the examiner interprets “inherent noise” to be the same as time series instability because they are both directed to instability/variability in the observed time-series behavior, interprets “model uncertainty” to be the same as point-in-time instability because they are both directed to uncertainty of a prediction at a given prediction input/time-point, and interprets “model misspecification” to be the same as distribution complexity instability because they are both directed to instability driven by distributional differences/complexity between training and prediction conditions.) through deep learning. ([Zhu, page 1, Abstract] “we propose a novel end-to-end Bayesian deep model that provides time series prediction along with uncertainty estimation.”, wherein the examiner interprets “end-to-end Bayesian deep model” to be the same as through deep learning because they are both directed to using deep neural-network-based learning.) Zhu does not teach using a weighted sum. Lim teaches using a weighted sum. ([Lim, page 1752, sec 4.2] “Processed features are then weighted by their variable selection weights and combined:”, wherein the examiner interprets “weighted by their variable selection weights and combined” to be the same as using a weighted sum because they are both directed to multiplying inputs by weights and aggregating them into a combined result.) Zhu, Rangapuram, Lim, and the instant application are analogous art because they are all directed to methods for predicting future states of time series data while computing quantitative measures related to prediction uncertainty or instability. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 5 disclosed by Zhu and Rangapuram to include the weighting technique disclosed by Lim. One would be motivated to do so to efficiently aggregate multiple sources of uncertainty-related information into a single instability measure, as suggested by Lim ([Lim, page 1752, sec 4.2] “Processed features are then weighted by their variable selection weights and combined:”).
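A hedged sketch of the weighted-sum aggregation the rejection maps onto claim 8, combining the three instability sources named above into one score; both the component values and the weights below are invented for illustration and come from neither Zhu nor Lim.

import numpy as np

components = np.array([0.12,   # time series instability (inherent noise)
                       0.05,   # point-in-time instability (model uncertainty)
                       0.08])  # distribution complexity (misspecification)
weights = np.array([0.5, 0.3, 0.2])   # illustrative selection weights

instability = float(weights @ components)   # weighted sum of the components
print(instability)                          # 0.091 for these stand-in values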
Regarding claim 18, Zhu and Rangapuram teach The apparatus according to claim 15 (see rejection of claim 15). Zhu further teaches wherein the instability processing device calculates instability ([Zhu, page 3, sec 3] “our goal is to evaluate the uncertainty of the model prediction” AND [Zhu, page 3, sec 3.1] “we propose that a complete measurement of prediction uncertainty should be a combination from three sources: (i) model uncertainty, (ii) model misspecification, and (iii) inherent noise level” AND [Zhu, page 3, sec 3.1.1] “the model uncertainty can be approximated by the sample variance” AND [Zhu, page 4, sec 3.1.2] “The next question is how to incorporate this uncertainty in the variance calculation. Here, we take a principled approach by connecting the encoder, g(·), with a prediction network, h(·), and treat them as one large network f = h(g(·)) during inference.”, wherein the examiner interprets “model uncertainty can be approximated by the sample variance” and “this uncertainty in the variance calculation. Here, we take a principled approach by connecting the encoder, g(·), with a prediction network, h(·), and treat them as one large network f = h(g(·)) during inference.” to be the same as calculating instability because they are both directed to computing a quantitative measure of prediction uncertainty or variability.) corresponding to at least one of time series instability, point-in-time instability and distribution complexity instability ([Zhu, page 1, sec 1] “Under this framework, the prediction uncertainty can be decomposed into three types: model uncertainty, inherent noise, and model misspecification. Model uncertainty, also referred to as epistemic uncertainty, captures our ignorance of the model parameters, and can be reduced as more samples being collected. Inherent noise, on the other hand, captures the uncertainty in the data generation process and is irreducible”, wherein the examiner interprets “inherent noise” to be the same as time series instability because they are both directed to instability/variability in the observed time-series behavior, interprets “model uncertainty” to be the same as point-in-time instability because they are both directed to uncertainty of a prediction at a given prediction input/time-point, and interprets “model misspecification” to be the same as distribution complexity instability because they are both directed to instability driven by distributional differences/complexity between training and prediction conditions.) through deep learning. ([Zhu, page 1, Abstract] “we propose a novel end-to-end Bayesian deep model that provides time series prediction along with uncertainty estimation”, wherein the examiner interprets “end-to-end Bayesian deep model” to be the same as through deep learning because they are both directed to using deep neural-network-based learning.) Zhu does not teach using a weighted sum. Lim teaches using a weighted sum.
([Lim, page 1752, sec 4.2] “Processed features are then weighted by their variable selection weights and combined:”, wherein the examiner interprets “weighted by their variable selection weights and combined” to be the same as using a weighted sum because they are both directed to multiplying inputs by weights and aggregating them into a combined result.) Zhu, Rangapuram, Lim, and the instant application are analogous art because they are all directed to methods for predicting future states of time series data while computing quantitative measures related to prediction uncertainty or instability. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the apparatus of claim 15 disclosed by Zhu and Rangapuram to include the weighting technique disclosed by Lim. One would be motivated to do so to efficiently aggregate multiple sources of uncertainty-related information into a single instability measure, as suggested by Lim ([Lim, page 1752, sec 4.2] “Processed features are then weighted by their variable selection weights and combined:”). Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Zhu in view of Rubanova further in view of Lim. Regarding claim 10, Zhu and Rubanova teach The method according to claim 9 (see rejection of claim 9). Zhu further teaches calculating at least one of reliability of the future state … ([Zhu, page 3, sec 3] “Specifically, we would like to quantify the prediction standard error, η, so that an approximate α-level prediction interval can be constructed” AND [Zhu, page 1, sec 1] “assessing how much to trust the forecast produced by the model”, wherein the examiner interprets “quantify the prediction standard error”, “trust the forecast”, and “prediction interval” to be the same as “reliability of the future state” because they are all directed to computing a quantitative measure indicating how dependable the predicted future outcome is) and instability ([Zhu, page 3, sec 3.1] “the variance of the prediction distribution quantifies the prediction uncertainty”, wherein the examiner interprets “variance” and “prediction uncertainty” to be the same as “instability” because they are both directed to variability/volatility in the predicted future outcome). Zhu and Rubanova do not teach wherein the step of predicting the future state comprises: … a prediction basis of the future state. Lim teaches wherein the step of predicting the future state comprises: ([Lim, page 1750, sec 2] “direct methods are trained to explicitly generate forecasts”, wherein the examiner interprets “explicitly generate forecasts” to be the same as “predicting the future state” because they are both directed to producing a predicted future outcome from time series inputs). a prediction basis of the future state, ([Lim, page 1749, sec 1] “TFT enables three valuable interpretability use cases: helping users identify (i) globally-important variables for the prediction problem” AND [Lim, page 1751, sec 3] “Fig. 2. TFT architecture. TFT inputs static metadata, time-varying past inputs and time-varying a priori known future inputs”, wherein the examiner interprets “interpretability use cases” and “globally-important variables”, as well as “time-varying a priori known future inputs”, to be the same as “a prediction basis of the future state” because they are both directed to providing an explanation for why the model produced the future prediction). Zhu, Rubanova, Lim, and the instant application are analogous art because they are all directed to predicting a future state from time series data. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to modify the method of claim 9 disclosed by Zhu and Rubanova to include the interpretability technique disclosed by Lim. One would be motivated to do so to effectively provide explanatory information that clarifies how and why a predicted future state was generated, as suggested by Lim ([Lim, page 1749, sec 1] “interpretability use cases: helping users identify … globally-important variables for the prediction problem.”).
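For orientation, the “reliability” notion quoted from Zhu in claim 10 (a prediction standard error η used to build an approximate α-level prediction interval) can be illustrated with a minimal sketch; every number below is invented for illustration.

y_hat = 3.2    # predicted future state (stand-in value)
eta = 0.4      # estimated prediction standard error, Zhu's eta (stand-in)
z = 1.96       # two-sided normal quantile for alpha = 0.05

# Approximate 95% prediction interval around the forecast.
lower, upper = y_hat - z * eta, y_hat + z * eta
print((lower, upper))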
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DEVAN KAPOOR, whose telephone number is (703) 756-1434. The examiner can normally be reached Monday - Friday, 9:00 AM - 5:00 PM EST (times may vary). Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, David Yi, can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DEVAN KAPOOR/
Examiner, Art Unit 2126

/DAVID YI/
Supervisory Patent Examiner, Art Unit 2126

Prosecution Timeline

Jan 17, 2023
Application Filed
Dec 22, 2025
Non-Final Rejection — §101, §102, §103
Apr 07, 2026
Applicant Interview (Telephonic)
Apr 08, 2026
Examiner Interview Summary

Prosecution Projections

1-2
Expected OA Rounds
11%
Grant Probability
28%
With Interview (+16.7%)
3y 3m
Median Time to Grant
Low
PTA Risk
Based on 9 resolved cases by this examiner. Grant probability derived from career allow rate.
