Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1 – 25 are pending and examined herein.
Claims 1 – 25 are rejected under 35 U.S.C. 101.
Claims 1 – 25 are rejected under 35 U.S.C. 103.
Specification
Applicant is reminded of the proper content of an abstract of the disclosure.
A patent abstract is a concise statement of the technical disclosure of the patent and should include that which is new in the art to which the invention pertains. The abstract should not refer to purported merits or speculative applications of the invention and should not compare the invention with the prior art.
If the patent is of a basic nature, the entire technical disclosure may be new in the art, and the abstract should be directed to the entire disclosure. If the patent is in the nature of an improvement in an old apparatus, process, product, or composition, the abstract should include the technical disclosure of the improvement. The abstract should also mention by way of example any preferred modifications or alternatives.
Where applicable, the abstract should include the following: (1) if a machine or apparatus, its organization and operation; (2) if an article, its method of making; (3) if a chemical compound, its identity and use; (4) if a mixture, its ingredients; (5) if a process, the steps.
Extensive mechanical and design details of an apparatus should not be included in the abstract. The abstract should be in narrative form and generally limited to a single paragraph within the range of 50 to 150 words in length.
See MPEP § 608.01(b) for guidelines for the preparation of patent abstracts.
The abstract of the disclosure is objected to because the abstract merely recites claim language without providing a clear and concise summary of the technical disclosure of the invention. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1 - 25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
MPEP § 2106(III) sets out the steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1 - 25, in accordance with these steps, follows.
Step 1 Analysis:
Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter).
Claims 1 – 11 are directed to a method, which falls under the statutory category of process. Claims 12 – 21 are directed to a computer program product, which falls under the statutory category of manufacture. Claims 22 – 23 are directed to a system, which falls under the statutory category of machine. Claim 24 is directed to a method, which falls under the statutory category of process. Claim 25 is directed to a computer program product, which falls under the statutory category of manufacture.
Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis:
Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application. If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101.
Regarding claim 1, the following claim elements are abstract ideas:
dividing … the input time series to a set of univariate time subseries; (This can be practically performed in the human mind under its broadest reasonable interpretation, aside from the recitation of generic computer components.)
transforming … the set of univariate time subseries into a univariate prediction result series using a transformer model; (Transforming the set of univariate time subseries into a prediction series using a transformer model is merely a mathematical calculation, which is a mathematical concept.)
concatenating … the univariate prediction result series to a multivariate predictive result; (Concatenating the univariate prediction series into a multivariate predictive result is merely a mathematical calculation, which is a mathematical concept.)
The following claim elements are additional elements which, taken alone or in combination with the other additional elements, neither integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:
, by a processor set, (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
receiving … an input time series from an external device in a first system; (This is mere data gathering, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. The broadest reasonable interpretation of this element is storing information in memory, which is a well-understood, routine, and conventional activity. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.)
outputting … the multivariate predictive result for providing time series forecasting to a second system. (This is mere data outputting, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. The broadest reasonable interpretation of this element is storing information in memory, which is a well-understood, routine, and conventional activity. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.)
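To illustrate the character of the recited dividing and concatenating steps as generic data manipulation, consider the following minimal sketch. It is purely illustrative, assumes hypothetical array sizes, and is not taken from the application nor asserted as the claimed implementation.

```python
# Illustrative sketch only (hypothetical sizes; not the claimed implementation):
# the recited dividing and concatenating steps reduce to generic array handling.
import numpy as np

x = np.random.randn(96, 7)                        # input time series: 96 steps, 7 variables
subseries = [x[:, i] for i in range(x.shape[1])]  # "dividing ... into univariate time subseries"
preds = [s[-24:] for s in subseries]              # stand-in for per-series model predictions
multivariate = np.stack(preds, axis=1)            # "concatenating ... to a multivariate predictive result"
```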
Regarding claim 2, the rejection of claim 1 is incorporated herein. Further, claim 2 recites the following additional elements:
wherein the external device comprises a smart sensor, the first system comprises a manufacturing system, and the second system comprises a planning system in communication with the first system. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
Regarding claim 3, the rejection of claim 1 is incorporated herein. Further, claim 3 recites the following additional elements:
wherein the input time series comprises a multivariate time series. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
Regarding claim 4, the rejection of claim 3 is incorporated herein. Further, claim 4 recites the following additional elements:
wherein the multivariate time series comprises a multi-channel signal. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
Regarding claim 5, the rejection of claim 1 is incorporated herein. Further, claim 5 recites the following abstract idea:
wherein the transforming the set of univariate time subseries into the univariate prediction result series comprises normalizing and segmenting the univariate time subseries into patches. (Normalizing and segmenting subseries into patches is merely a mathematical calculation, which is a mathematical concept.)
Claim 5 does not recite additional elements.
Regarding claim 6, the rejection of claim 5 is incorporated herein. Further, claim 6 recites the following additional elements:
wherein the patches are local and semantic information in aggregated time steps. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
Regarding claim 7, the rejection of claim 5 is incorporated herein. Further, claim 7 recites the following abstract idea:
wherein the transforming the set of univariate time subseries into the univariate prediction result series further comprises transforming the patches into a representation. (Transforming patches into a representation is merely a mathematical calculation, which is a mathematical concept.)
Claim 7 does not recite additional elements.
Regarding claim 8, the rejection of claim 7 is incorporated herein. Further, claim 8 recites the following additional elements:
wherein the transforming the set of univariate time series into the univariate prediction result series further comprises utilizing a flatten layer with a linear head on the representation to obtain the univariate prediction result series. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
Regarding claim 9, the rejection of claim 1 is incorporated herein. Further, claim 9 recites the following additional elements:
wherein the univariate time subseries comprises a plurality of channel independent signals. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
Regarding claim 10, the rejection of claim 9 is incorporated herein. Further, claim 10 recites the following additional elements:
wherein each of the channel independent signals have a same model weight as a weight of remaining channel independent signals. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
Regarding claim 11, the rejection of claim 1 is incorporated herein. Further, claim 11 recites the following additional elements:
wherein the transformer model comprises a supervised model. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
Regarding claim 12, the following claim element is an additional element:
pre-train a transformer model using historically reconstructed masked patches; (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
The rest of claim 12 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.
Claims 13 – 15 recite substantially similar subject matter to claims 2 – 4 respectively and are rejected with the same rationale, mutatis mutandis.
Regarding claim 16, the rejection of claim 15 is incorporated herein. Further, claim 16 recites the following abstract idea:
masking the univariate time subseries into masked patches and non-masked patches. (This can be practically performed in the human mind under its broadest reasonable interpretation, aside from the recitation of generic computer components.)
The rest of claim 16 recites substantially similar subject matter to claim 5 and is rejected with the same rationale, mutatis mutandis.
Regarding claim 17, the rejection of claim 16 is incorporated herein. Further, claim 17 recites the following additional element:
wherein the transforming the set of univariate time subseries into the univariate prediction result series further comprises utilizing a linear layer on the non-masked patches to obtain the univariate prediction result series. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
Regarding claim 18, the rejection of claim 17 is incorporated herein. Further, claim 18 recites the following additional element:
wherein the transforming the set of univariate time subseries into the univariate prediction results series further comprises reconstructing the masked patches. (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
Claims 19 – 23 recite substantially similar subject matter to claims 9 – 12 and 16, respectively, and are rejected with the same rationale, mutatis mutandis.
Regarding claim 24, the following claim elements are abstract ideas:
dividing … the univariate time series into patches; (This can be practically performed in the human mind under its broadest reasonable interpretation, aside from the recitation of generic computer components.)
transforming … the patches into a representation using a transformer model; (Transforming the patches into a representation using a transformer model is merely a mathematical calculation, which is a mathematical concept.)
Claim 24 further recites the following additional elements:
, by a processor set, (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
receiving … a univariate time series; (This is mere data gathering, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. The broadest reasonable interpretation of this element is storing information in memory, which is a well-understood, routine, and conventional activity. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.)
obtaining … a univariate prediction result series by using a flatten layer with a linear head on the representation; (This is mere data gathering, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. The broadest reasonable interpretation of this element is storing information in memory, which is a well-understood, routine, and conventional activity. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.)
and outputting … the univariate prediction result series. (This is mere data outputting, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. The broadest reasonable interpretation of this element is storing information in memory, which is a well-understood, routine, and conventional activity. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.)
Regarding claim 25, the following claim elements are abstract ideas:
divide the univariate time series into a non-overlapped set of patches; (This can be practically performed in the human mind under its broadest reasonable interpretation, aside from the recitation of generic computer components.)
mask a subset of the non-overlapped set of patches to a masked patch series; (This can be practically performed in the human mind under its broadest reasonable interpretation, aside from the recitation of generic computer components.)
transform the non-overlapped set of patches to a univariate prediction result series using the pre-trained transformer model; (Transforming the patches to a prediction result series using a pre-trained transformer model is merely a mathematical calculation, which is a mathematical concept.)
Claim 25 further recites the following additional elements:
receive a univariate time series; (This is mere data gathering, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. The broadest reasonable interpretation of this element is storing information in memory, which is a well-understood, routine, and conventional activity. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.)
pre-train a transformer model using historically reconstructed masked patches; (This falls under mere instructions to apply an abstract idea on a generic computer. See MPEP § 2106.05(f). Therefore, this does not amount to significantly more than the judicial exception.)
and output the univariate prediction result series. (This is mere data outputting, an insignificant extra-solution activity, which does not integrate the judicial exception into a practical application. The broadest reasonable interpretation of this element is storing information in memory, which is a well-understood, routine, and conventional activity. See MPEP § 2106.05(d)(II)(iv). Therefore, this does not amount to significantly more than the judicial exception.)
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3 – 12, and 14 – 25 are rejected under 35 U.S.C. 103 as being unpatentable over Tang et al. (NPL: “MTSMAE: Masked Autoencoders for Multivariate Time-Series Forecasting”) in view of Arik et al. (U.S. Pub. 2024/0249192 A1), further in view of Jawed et al. (NPL: “GQFormer: A Multi-Quantile Generative Transformer for Time Series Forecasting”).
Regarding Claim 1, Tang teaches
transforming, by the processor set, the set of univariate time subseries into a univariate prediction result series using a transformer model; (Pg. 3 C. Encoder and Decoder section of Tang states “Our encoder is the encoder of transformer. In the pre-training, our encoder embeds only visible, unmasked patches through patch embedding, and then processes the output data through a series of transformer encoder blocks.” Pg. 2 III Methodology section of Tang states “The problem of multivariate time-series forecasting is to input the past sequence X^t = {x^t_1, ..., x^t_Lx | x^t_i ∈ R^dx} at time t, and predict the corresponding future sequence Y^t = {y^t_1, ..., y^t_Ly | y^t_i ∈ R^dy}, where Lx and Ly are the lengths of input and output sequences respectively, and dx and dy are the feature dimensions of input X and output Y respectively. Our masked autoencoders (MTSMAE) is a simple autoencoding method and the training process is divided into two stages, as shown in the Fig. 1.” Since the multivariate forecast output Y^t is defined over feature dimensions, the predicted future values for any single feature constitute a univariate prediction result series. Thus, applying the transformer encoder/decoder to each univariate subseries yields the required univariate prediction series.)
and outputting, by the processor set, the multivariate predictive result for providing time series forecasting to a second system. (Pg. 2 III Methodology section of Tang states “The problem of multivariate time-series forecasting is to input the past sequence X^t = {x^t_1, ..., x^t_Lx | x^t_i ∈ R^dx} at time t, and predict the corresponding future sequence Y^t = {y^t_1, ..., y^t_Ly | y^t_i ∈ R^dy}, where Lx and Ly are the lengths of input and output sequences respectively, and dx and dy are the feature dimensions of input X and output Y respectively.” Tang outputs predicted future data, which under the broadest reasonable interpretation encompasses outputting the forecast result for downstream use by another system.)
However, Tang does not explicitly teach
A method, comprising: receiving, by a processor set, an input time series from an external device in a first system;
dividing, by the processor set, the input time series to a set of univariate time subseries;
concatenating, by the processor set, the univariate prediction result series to a multivariate predictive result;
Arik teaches that
A method, comprising: receiving, by a processor set, an input time series from an external device in a first system; ([0020] of Arik states “The system can receive input data 105, which can be represented according to any of a variety of data structures, including vectors, tables, matrices, tensors, and so on. The input data 105 can include multiple data points, each point corresponding to a point in time or time step.” [0043] of Arik states “The system receives one or more input data points, according to block 405. Each input data point corresponds to a respective past time step earlier in time than a current time step. The current time step can vary depending on the input data and/or the time at which the system receives the input data.”)
Jawed teaches that
dividing, by the processor set, the input time series to a set of univariate time subseries; (Pg. 3 III Background section of Jawed states “We consider N related univariate time series data Y ∈ R^(T×N), where each time series Y^n ∈ R^T is noted for a total of t = [1, ..., τ, ..., T] timesteps. The variable τ is used to indicate the partitioning of the conditioning and the forecasting ranges.” Dividing the input time series Y into a set of univariate time subseries comprises extracting each channel/feature sequence Y^n to obtain the set of univariate time subseries.)
concatenating, by the processor set, the univariate prediction result series to a multivariate predictive result; (Pg. 3 III Background section of Jawed states “We consider N related univariate time series data Y ∈ R^(T×N), where each time series Y^n ∈ R^T is noted for a total of t = [1, ..., τ, ..., T] timesteps. The variable τ is used to indicate the partitioning of the conditioning and the forecasting ranges. In addition to the real-world time series we also consider C many social time covariates X ∈ R^(T×C) that are observed in the entire range. We aim to model the following conditional distribution: p(Y^n_(τ+1:T) | Y^n_(1:τ), X_(1:T), Θ) (1). This formulation in Eq. 1 explicitly models for multiple tasks jointly conditioned on the same input and model parameters Θ. This is in contrast to other works that reduce the problem complexity by formulating a simpler single-step forecasting task p(Y^n_(τ+1) | Y^n_(1:τ), X_(1:τ+1), Θ). Note that our formulation and following background is similar to [23].” Jawed represents the multivariate time series as N univariate series assembled as Y ∈ R^(T×N) and predicts a future trajectory Y^n_(τ+1:T) per univariate series in Equation 1, under a formulation that models multiple tasks jointly. Accordingly, the set of per-series predicted trajectories is assembled (i.e., concatenated across the N-series dimension) to form the multivariate predictive result.)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to combine the teachings of Tang, Jawed, and Arik. Tang teaches transformer-based time series modeling using patch embedding, including masking a subset of patches, pre-training by reconstructing masked patches, and then using the transformer in fine-tuning to output prediction series. Arik teaches a forecasting workflow that receives past time step inputs and generates future time step predicted outputs, and further teaches weight sharing across features, supporting efficient forecasting when multiple channels/features are present. Jawed teaches a forecasting architecture for time series that includes applying a flatten operation to learned embeddings and using a shared linear head to produce forecast outputs. One with ordinary skill in the art would be motivated to incorporate the teachings of Jawed and Arik with Tang because these are structurally compatible design choices in transformer forecasting systems, enabling efficient pre-training, a straightforward and widely used linear prediction head, and consistent, scalable forecasting across multiple channels using past time series inputs. The combination would have been predictable and would improve the practical implementation and robustness of the model.
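For clarity of the mapping above, the following minimal PyTorch sketch shows one way, under stated assumptions, to divide a multivariate series into univariate subseries, forecast each with a single shared transformer, and concatenate the per-channel results. The module names and sizes are hypothetical illustrations and do not reproduce the references’ code.

```python
# A hypothetical sketch of the claim 1 pipeline as mapped above: divide,
# transform with one shared transformer, concatenate. Not the references' code.
import torch
import torch.nn as nn

class ChannelIndependentForecaster(nn.Module):
    def __init__(self, context_len=96, horizon=24, d_model=64):
        super().__init__()
        self.proj = nn.Linear(1, d_model)          # embed each scalar time step
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(context_len * d_model, horizon)

    def forward(self, x):                          # x: (batch, context_len, n_channels)
        outs = []
        for c in range(x.shape[-1]):               # "dividing ... into univariate subseries"
            z = self.proj(x[:, :, c:c+1])          # (batch, context_len, d_model)
            z = self.encoder(z)                    # "transforming ... using a transformer model"
            outs.append(self.head(z.flatten(1)))   # univariate prediction result series
        return torch.stack(outs, dim=-1)           # "concatenating ... to a multivariate result"

model = ChannelIndependentForecaster()
y = model(torch.randn(8, 96, 7))                   # (8, 24, 7) multivariate predictive result
```

Because the same proj/encoder/head modules are applied to every channel, this sketch also reflects the weight sharing across features discussed for Arik.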
Regarding Claim 3, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
wherein the input time series comprises a multivariate time series. (Pg. 2 III Methodology section of Tang states “The problem of multivariate time-series forecasting is to input the past sequence X^t = {x^t_1, ..., x^t_Lx | x^t_i ∈ R^dx} at time t, and predict the corresponding future sequence Y^t = {y^t_1, ..., y^t_Ly | y^t_i ∈ R^dy}, where Lx and Ly are the lengths of input and output sequences respectively, and dx and dy are the feature dimensions of input X and output Y respectively.” The input time series comprises a multivariate time series when performing multivariate time-series forecasting.)
Regarding Claim 4, the rejection of claim 3 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
wherein the multivariate time series comprises a multi-channel signal. (Pg. 2 III Methodology section of Tang states “The problem of multivariate time-series forecasting is to input the past sequence X^t = {x^t_1, ..., x^t_Lx | x^t_i ∈ R^dx} at time t, and predict the corresponding future sequence Y^t = {y^t_1, ..., y^t_Ly | y^t_i ∈ R^dy}, where Lx and Ly are the lengths of input and output sequences respectively, and dx and dy are the feature dimensions of input X and output Y respectively.” A multivariate time series with feature dimensions implies multiple channels/features, which is the standard interpretation of a multi-channel signal.)
Regarding Claim 5, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
wherein the transforming the set of univariate time subseries into the univariate prediction result series comprises normalizing and segmenting the univariate time subseries into patches. ([0033] of Arik states “The system 100 receives input to the time mixing layer 302 and normalizes the input at a two-dimensional normalization layer (2D Norm) layer 310. At the 2D Norm layer 310, the system 100 normalizes over both time and feature dimensions of the input, to maintain a consistent scale between the time-mixing and feature-mixing operations at the later stages of the time mixing layer 302 and feature mixing layer 304, respectively.” Pg. 3 A. Patch embedding section of Tang states “The original MAE continues the idea of ViT, processing image data X ∈ R^(H×W×C), where (H, W) is the resolution of the original image and C is the number of channels. For MTSD, X ∈ R^(Lx×dx), the original patch embedding method is no longer applicable. Therefore, unlike the method of patch image data in ViT, we patch MTSD in the direction of time after embedding: Xh = Conv1d(X) ∈ R^((Lx/P)×dmodel) (6), Xpte = Conv1d(Xh) ∈ R^((Lx/P²)×dmodel) (7), where the kernel width of the one-dimensional convolutional filter and stride = P, and Xpte is the final result of the patch embedding.” Arik teaches normalizing the input and Tang teaches segmenting the time series into patch tokens.)
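A short sketch may clarify the normalize-and-patch mapping above. It assumes hypothetical sizes, substitutes a generic layer normalization for Arik’s 2D Norm, and follows Tang’s Eq. (6) convention of a Conv1d whose stride equals the patch length P, yielding patch tokens.

```python
# Hedged sketch only: generic normalization stands in for Arik's 2D Norm;
# Conv1d with stride = P follows Tang Eq. (6). Sizes are assumptions.
import torch
import torch.nn as nn

P, d_model = 16, 64
series = torch.randn(8, 1, 96)                               # (batch, 1 channel, Lx=96 steps)
normed = nn.functional.layer_norm(series, series.shape[1:])  # normalize the subseries
patch_embed = nn.Conv1d(1, d_model, kernel_size=P, stride=P) # segment into patch tokens
patches = patch_embed(normed)                                # (8, d_model, Lx/P = 6)
```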
Regarding Claim 6, the rejection of claim 5 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
wherein the patches are local and semantic information in aggregated time steps. (Pg. 4 C. Encoder and Decoder section of Tang states “In the pre-training, our MTSMAE reconstructs the input by recovering the specific value of each masking patch. Each element output by the decoder is a vector that can represent a patch. The last layer of the decoder is a linear projection, whose output channel is P · D, where P is the length of the patch and D is the dimension of the time-series. In the fine-tuning, each element output by the decoder represents the data y^t_i, and the output channel of the last layer of the linear projection is D.” Patches have a defined patch length P, where each patch aggregates P time steps. These patch tokens further serve as the learned representation units (i.e., semantic features) for the transformer encoder/decoder.)
Regarding Claim 7, the rejection of claim 5 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
wherein the transforming the set of univariate time subseries into the univariate prediction result series further comprises transforming the patches into a representation. (Pg. 2 III Methodology section of Tang states “Our masked autoencoders (MTSMAE) is a simple autoencoding method and the training process is divided into two stages, as shown in the Fig. 1. As all autoencoders, there is an encoder and a decoder in our method. The encoder maps the observed signal to a latent representation and the decoder reconstructs the original signal from the latent representation in the pre-training, or output Y in the fine-tuning.”)
Regarding Claim 8, the rejection of claim 7 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
wherein the transforming the set of univariate time series into the univariate prediction result series further comprises utilizing a flatten layer with a linear head on the representation to obtain the univariate prediction result series. (Pg. 5 D. Decoder section of Jawed states “ξ^Flat_ỹ = Flatten(ξ_ỹ) (14), ξ^(1:M)_ỹ = Repeat(ξ^Flat_ỹ, M) (15), q_(αi,τ) := [ξ^i_ỹ, ξ_(αi)] W_MTL + b_MTL, ∀i = [1, ..., M] (16). In the above equations, we first flatten the embedding of the time series to one feature axis; this results into the embedding size (dmodel × len(1:τ)), where dmodel indicates the embedded dimensionality of each timestep input. Next we repeat these M many times to combine these with the quantile embeddings in Eq. 10. Observe that each of the [1, ..., M] quantile embeddings is different, but the time series embedding ξ^Flat_ỹ remains the same. Finally, a shared fully connected layer, given by parameters W_MTL ∈ R^((dmodel×len(1:τ)+dmodel)×len(τ+1:T)), b_MTL ∈ R^(len(τ+1:T)), is learned to produce a quantile forecast based on the concatenated repeated representation of the time series and the embeddings of the implicit quantile levels.” Jawed applies a flatten operation to the learned representation and uses a linear layer as the prediction head to output forecasts.)
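The flatten-plus-linear-head operation of Jawed’s Eqs. (14)-(16), simplified here to a single point forecast rather than quantile outputs, can be sketched as follows; the sizes are assumptions.

```python
# Minimal sketch (assumed sizes): flatten per-step embeddings to one axis,
# then a learned linear head produces the prediction series.
import torch
import torch.nn as nn

d_model, context_len, horizon = 64, 96, 24
representation = torch.randn(8, context_len, d_model)   # encoder output representation
flat = nn.Flatten(start_dim=1)(representation)          # cf. Eq. (14): one feature axis
head = nn.Linear(context_len * d_model, horizon)        # shared fully connected layer
prediction_series = head(flat)                          # (8, horizon) univariate forecast
```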
Regarding Claim 9, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
wherein the univariate time subseries comprises a plurality of channel independent signals. (Pg. 3 III Background section of Jawed states “We consider N related univariate time series data Y ∈ R^(T×N), where each time series Y^n ∈ R^T is noted for a total of t = [1, ..., τ, ..., T] timesteps. The variable τ is used to indicate the partitioning of the conditioning and the forecasting ranges.” Jawed models N separate univariate series (i.e., channels) inside a multivariate structure.)
Regarding Claim 10, the rejection of claim 9 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
wherein each of the channel independent signals have a same model weight as a weight of remaining channel independent signals. ([0028] of Arik states “In this specification, a “mixing layer” or “mixer layer” can refer to a layer of both time-domain and feature-domain operations. Additionally, a “time mixing layer” or “time mixer layer” can refer to a layer of time-domain operations, while a “feature mixing” or “feature mixer” layer can refer to a layer of feature-domain operations. Layers are collections of operations that at least partially depend on trainable weights or parameter values. Machine learning models may include different layers, such as fully-connected layers, dropout layers, etc.” [0006] of Arik states “The time series mixer includes MLPs that alternate between time-domain input and feature-domain input… Time-domain MLPs are reused or shared across all the features of an input time series, while feature-domain MLPs are reused or shared across all time steps of the input time series.” [0036] of Arik states “The time-mixing MLP 320 is shared across each feature of the transposed data 319. In other words, the system 100 processes values for each feature of the transposed data 319 through the time-mixing MLP 320.” Shared model components across all features mean that the same model weights are applied across channels/features.)
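Arik’s weight sharing can be sketched minimally as a single time-domain layer whose one set of weights is reused for every channel; the shapes are hypothetical.

```python
# Hedged sketch: one time-domain layer's weights applied to all channels,
# so every channel-independent signal sees the same model weights.
import torch
import torch.nn as nn

time_mlp = nn.Linear(96, 96)                 # one weight matrix over the time axis
x = torch.randn(8, 96, 7)                    # (batch, time, features)
xt = x.transpose(1, 2)                       # (batch, features, time)
mixed = time_mlp(xt).transpose(1, 2)         # same weights reused for all 7 channels
```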
Regarding Claim 11, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
wherein the transformer model comprises a supervised model. (Pg. 1 Abstract of Tang states “In this paper, according to the data characteristics of multivariate time-series, a patch embedding method is proposed, and we present an self-supervised pre-training approach based on Masked Autoencoders (MAE), called MTSMAE, which can improve the performance significantly over supervised learning without pre-training.”)
Regarding Claim 12, the combination of Tang, Arik and Jawed teaches
pre-train a transformer model using historically reconstructed masked patches (Pg. 4 C. Encoder and Decoder section of Tang states “In the pre-training, we set up the decoder as MAE. The input to the decoder is a complete set including the visible patches output by encoder and mask tokens, where as the vector of learning, masked tokens are the data to be recovered… In the pre-training, our MTSMAE reconstructs the input by recovering the specific value of each masking patch.” Tang pre-trains via masked patch reconstruction and performs forecasting via transformer encoder and prediction decoder.)
The rest of claim 12 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.
Regarding claim 14, the rejection of claim 12 is incorporated herein. Regarding claim 15, the rejection of claim 14 is incorporated herein. Claims 14 – 15 recite substantially similar subject matter as claims 3 – 4 respectively, and are rejected with the same rationale, mutatis mutandis.
Regarding claim 16, the rejection of claim 15 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
masking the univariate time subseries into masked patches and non-masked patches. (Pg. 3-4 C. Encoder and Decoder section of Tang states “Our encoder is the encoder of transformer. In the pretraining, our encoder embeds only visible, unmasked patches through patch embedding, and then processes the output data through a series of transformer encoder blocks. Our encoder only operates on a small part of the whole set, e.g., only 15%, which can greatly reduce the redundancy of information and increase the overall understanding of the model beyond low-level information. In the fine-tuning, our encoder can see all the patches… In the pre-training, we set up the decoder as MAE. The input to the decoder is a complete set including the visible patches output by encoder and mask tokens, where as the vector of learning, masked tokens are the data to be recovered.” Tang explicitly distinguishes unmasked patches vs masked patches (random masking) and reconstructs the masked ones in pretraining.)
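The random masking Tang describes (uniform sampling of patches without replacement, splitting masked from visible patches) can be sketched as follows; the masking ratio and sizes are illustrative assumptions.

```python
# Illustrative sketch (assumed ratio/sizes): uniform sampling without
# replacement splits patches into masked and non-masked sets; only the
# visible patches would feed the encoder during pre-training.
import torch

n_patches, mask_ratio = 6, 0.5               # illustrative values only
patches = torch.randn(8, n_patches, 64)      # (batch, patches, embedding)
perm = torch.randperm(n_patches)             # uniform sampling without replacement
n_masked = int(mask_ratio * n_patches)
masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]
masked_patches = patches[:, masked_idx]      # targets to be reconstructed
visible_patches = patches[:, visible_idx]    # only these enter the encoder
```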
The rest of claim 16 recites substantially similar subject matter to claim 5 and is rejected with the same rationale, mutatis mutandis.
Regarding claim 17, the rejection of claim 16 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
wherein the transforming the set of univariate time subseries into the univariate prediction result series further comprises utilizing a linear layer on the non-masked patches to obtain the univariate prediction result series. (Pg. 3-4 C. Encoder and Decoder section of Tang states “Our encoder is the encoder of transformer. In the pretraining, our encoder embeds only visible, unmasked patches through patch embedding, and then processes the output data through a series of transformer encoder blocks… In the pre-training, our MTSMAE reconstructs the input by recovering the specific value of each masking patch. Each element output by the decoder is a vector that can represent a patch. The last layer of the decoder is a linear projection, whose output channel is P · D, where P is the length of the patch and D is the dimension of the time-series. In the fine-tuning, each element output by the decoder represents the data y^t_i, and the output channel of the last layer of the linear projection is D. Our loss function is calculated by the mean square error (MSE) between the model output data y_o (recovery, prediction) and the real data y.” Tang feeds unmasked patches through the model, and the decoder ends in a linear projection producing the prediction outputs.)
Regarding claim 18, the rejection of claim 17 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed teaches
wherein the transforming the set of univariate time subseries into the univariate prediction results series further comprises reconstructing the masked patches. (Pg. 4 C. Encoder and Decoder section of Tang states “In the pre-training, our MTSMAE reconstructs the input by recovering the specific value of each masking patch.”)
Regarding claim 19, the rejection of claim 12 is incorporated herein. Regarding claim 20, the rejection of claim 19 is incorporated herein. Regarding claim 21, the rejection of claim 20 is incorporated herein. Claims 19 – 21 recite substantially similar subject matter as claims 9 – 11 respectively, and are rejected with the same rationale, mutatis mutandis.
Regarding claim 23, the rejection of claim 22 is incorporated herein. Claims 22 – 23 recite substantially similar subject matter as claims 12 and 16 respectively, and are rejected with the same rationale, mutatis mutandis.
Regarding Claim 24, the combination of Tang, Arik and Jawed teaches
receiving, by a processor set, a univariate time series; (Pg. 3 III Background section of Jawed states “We consider N related univariate time series data Y ∈ R^(T×N), where each time series Y^n ∈ R^T is noted for a total of t = [1, ..., τ, ..., T] timesteps. The variable τ is used to indicate the partitioning of the conditioning and the forecasting ranges.” [0020] of Arik states “The system can receive input data 105, which can be represented according to any of a variety of data structures, including vectors, tables, matrices, tensors, and so on. The input data 105 can include multiple data points, each point corresponding to a point in time or time step.” [0043] of Arik states “The system receives one or more input data points, according to block 405. Each input data point corresponds to a respective past time step earlier in time than a current time step. The current time step can vary depending on the input data and/or the time at which the system receives the input data.” The references teach receiving time series comprising univariate time series data.)
dividing, by the processor set, the univariate time series into patches; (Pg. 3 A. Patch embedding section of Tang states “The original MAE continues the idea of ViT, processing image data X ∈ R^(H×W×C), where (H, W) is the resolution of the original image and C is the number of channels. For MTSD, X ∈ R^(Lx×dx), the original patch embedding method is no longer applicable. Therefore, unlike the method of patch image data in ViT, we patch MTSD in the direction of time after embedding: Xh = Conv1d(X) ∈ R^((Lx/P)×dmodel) (6), Xpte = Conv1d(Xh) ∈ R^((Lx/P²)×dmodel) (7), where the kernel width of the one-dimensional convolutional filter and stride = P, and Xpte is the final result of the patch embedding.” Tang divides the series into patches along the time direction.)
transforming, by the processor set, the patches into a representation using a transformer model; (Pg. 2 III Methodology section of Tang states “Our masked autoencoders (MTSMAE) is a simple autoencoding method and the training process is divided into two stages, as shown in the Fig. 1. As all autoencoders, there is an encoder and a decoder in our method. The encoder maps the observed signal to a latent representation and the decoder reconstructs the original signal from the latent representation in the pre-training, or output Y in the fine-tuning.” Pg. 3 C. Encoder and Decoder section of Tang states “Our encoder is the encoder of transformer. In the pretraining, our encoder embeds only visible, unmasked patches through patch embedding, and then processes the output data through a series of transformer encoder blocks”)
obtaining, by the processor set, a univariate prediction result series by using a flatten layer with a linear head on the representation; (Pg. 5 D. Decoder section of Jawed states “ξ^Flat_ỹ = Flatten(ξ_ỹ) (14), ξ^(1:M)_ỹ = Repeat(ξ^Flat_ỹ, M) (15), q_(αi,τ) := [ξ^i_ỹ, ξ_(αi)] W_MTL + b_MTL, ∀i = [1, ..., M] (16). In the above equations, we first flatten the embedding of the time series to one feature axis; this results into the embedding size (dmodel × len(1:τ)), where dmodel indicates the embedded dimensionality of each timestep input. Next we repeat these M many times to combine these with the quantile embeddings in Eq. 10. Observe that each of the [1, ..., M] quantile embeddings is different, but the time series embedding ξ^Flat_ỹ remains the same. Finally, a shared fully connected layer, given by parameters W_MTL ∈ R^((dmodel×len(1:τ)+dmodel)×len(τ+1:T)), b_MTL ∈ R^(len(τ+1:T)), is learned to produce a quantile forecast based on the concatenated repeated representation of the time series and the embeddings of the implicit quantile levels.” Jawed applies a flatten operation to the learned representation and uses a linear layer as the prediction head to output forecasts.)
and outputting, by the processor set, the univariate prediction result series. (Pg. 2 III Methodology section of Tang states “The problem of multivariate time-series forecasting is to input the past sequence X^t = {x^t_1, ..., x^t_Lx | x^t_i ∈ R^dx} at time t, and predict the corresponding future sequence Y^t = {y^t_1, ..., y^t_Ly | y^t_i ∈ R^dy}, where Lx and Ly are the lengths of input and output sequences respectively, and dx and dy are the feature dimensions of input X and output Y respectively.” [0044] of Arik states “In processing the one or more input data points as described herein, the system generates one or more output data points. Each output data point corresponds to a respective future time step later in time than the current time step, and each output data point including respective predicted values for one or more of the features at the respective future time step.”)
Regarding Claim 25, the combination of Tang, Arik and Jawed teaches
receive a univariate time series; (Pg. 3 III Background section of Jawed states “We consider N related univariate time series data Y ∈ R^(T×N), where each time series Y^n ∈ R^T is noted for a total of t = [1, ..., τ, ..., T] timesteps. The variable τ is used to indicate the partitioning of the conditioning and the forecasting ranges.” [0020] of Arik states “The system can receive input data 105, which can be represented according to any of a variety of data structures, including vectors, tables, matrices, tensors, and so on. The input data 105 can include multiple data points, each point corresponding to a point in time or time step.” [0043] of Arik states “The system receives one or more input data points, according to block 405. Each input data point corresponds to a respective past time step earlier in time than a current time step. The current time step can vary depending on the input data and/or the time at which the system receives the input data.” The references teach receiving time series comprising univariate time series data.)
divide the univariate time series into a non-overlapped set of patches; (Pg. 3 A. Patch embedding section of Tang states “The original MAE continues the idea of ViT, processing image data X ∈ R^(H×W×C), where (H, W) is the resolution of the original image and C is the number of channels. For MTSD, X ∈ R^(Lx×dx), the original patch embedding method is no longer applicable. Therefore, unlike the method of patch image data in ViT, we patch MTSD in the direction of time after embedding: Xh = Conv1d(X) ∈ R^((Lx/P)×dmodel) (6), Xpte = Conv1d(Xh) ∈ R^((Lx/P²)×dmodel) (7), where the kernel width of the one-dimensional convolutional filter and stride = P, and Xpte is the final result of the patch embedding.” The patches are non-overlapping because the stride equals the patch length, so each window covers exactly one patch per step.)
mask a subset of the non-overlapped set of patches to a masked patch series; (Pg. 2 III Methodology section of Tang states “Our masked autoencoders (MTSMAE) is a simple autoencoding method and the training process is divided into two stages, as shown in the Fig. 1. As all autoencoders, there is an encoder and a decoder in our method. The encoder maps the observed signal to a latent representation and the decoder reconstructs the original signal from the latent representation in the pre-training, or output Y in the fine-tuning.” Pg. 3 C. Encoder and Decoder section of Tang states “Our encoder is the encoder of transformer. In the pretraining, our encoder embeds only visible, unmasked patches through patch embedding, and then processes the output data through a series of transformer encoder blocks” Pg. 3 B. Model inputs section of Tang states “A random masking method is adopt, that is, the patches are randomly sampled without replacement, and follow the uniform distribution. The random sampling can tremendously remove the information redundancy of MSTD by deleting a large number of patches (i.e., high masking rate).”)
pre-train a transformer model using historically reconstructed masked patches; (Pg. 3-4 C. Encoder and Decoder section of Tang states “Our encoder is the encoder of transformer. In the pretraining, our encoder embeds only visible, unmasked patches through patch embedding, and then processes the output data through a series of transformer encoder blocks… In the pre-training, our MTSMAE reconstructs the input by recovering the specific value of each masking patch. Each element output by the decoder is a vector that can represent a patch. The last layer of the decoder is a linear projection, whose output channel is P · D, where P is the length of the patch and D is the dimension of the time-series. In the fine-tuning, each element output by the decoder represents the data y^t_i, and the output channel of the last layer of the linear projection is D. Our loss function is calculated by the mean square error (MSE) between the model output data y_o (recovery, prediction) and the real data y.”)
transform the non-overlapped set of patches to a univariate prediction result series using the pre-trained transformer model; (Pg. 4 C. Encoder and Decoder section of Tang states “In the pre-training, our MTSMAE reconstructs the input by recovering the specific value of each masking patch. Each element output by the decoder is a vector that can represent a patch. The last layer of the decoder is a linear projection, whose output channel is P · D, where P is the length of the patch and D is the dimension of the time-series. In the fine-tuning, each element output by the decoder represents the data y^t_i, and the output channel of the last layer of the linear projection is D. Our loss function is calculated by the mean square error (MSE) between the model output data y_o (recovery, prediction) and the real data y.” Tang states that the pre-trained transformer decoder produces prediction outputs in fine-tuning.)
and output the univariate prediction result series. (Pg. 2 III Methodology section of Tang states “The problem of multivariate time-series forecasting is to input the past sequence X^t = {x^t_1, ..., x^t_Lx | x^t_i ∈ R^dx} at time t, and predict the corresponding future sequence Y^t = {y^t_1, ..., y^t_Ly | y^t_i ∈ R^dy}, where Lx and Ly are the lengths of input and output sequences respectively, and dx and dy are the feature dimensions of input X and output Y respectively.” [0044] of Arik states “In processing the one or more input data points as described herein, the system generates one or more output data points. Each output data point corresponds to a respective future time step later in time than the current time step, and each output data point including respective predicted values for one or more of the features at the respective future time step.”)
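The masked-patch reconstruction objective mapped above for claim 25 (a linear projection decoder trained with MSE against the real data, per Tang) can be sketched as follows; the tiny decoder is an assumed stand-in, not the reference architecture.

```python
# Hedged sketch (assumed sizes; not Tang's architecture): reconstruct masked
# patches with a linear projection and train on MSE against the real data.
import torch
import torch.nn as nn

decoder = nn.Linear(64, 16)                        # linear projection to patch length P=16
latent = torch.randn(8, 3, 64)                     # decoder states for 3 masked patches
target = torch.randn(8, 3, 16)                     # true values of the masked patches
reconstruction = decoder(latent)                   # recover each masked patch
loss = nn.functional.mse_loss(reconstruction, target)  # MSE loss, as in Tang
loss.backward()                                    # one pre-training gradient step
```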
Claims 2 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Tang et al. (NPL: “MTSMAE: Masked Autoencoders for Multivariate Time-Series Forecasting”) in view of Arik et al. (U.S. Pub. 2024/0249192 A1) and Jawed et al. (NPL: “GQFormer: A Multi-Quantile Generative Transformer for Time Series Forecasting”), further in view of Shabani et al. (U.S. Pub. 2023/0368002 A1).
Regarding claim 2, the rejection of claim 1 is incorporated herein. Furthermore, the combination of Tang, Arik and Jawed does not explicitly teach
wherein the external device comprises a smart sensor, the first system comprises a manufacturing system, and the second system comprises a planning system in communication with the first system.
However, Shabani teaches that
wherein the external device comprises a smart sensor, the first system comprises a manufacturing system, and the second system comprises a planning system in communication with the first system. ([0002] of Shabani states “Time Series Forecasting is among the most well-known problems in many domains such as sensor network monitoring, traffic and economics planning, astronomy, economic and financial forecasting, inventory planning, and weather and disease propagation forecasting.” This is a conventional, predictable application of the same forecasting pipeline.)
It would have been obvious to one with ordinary skill in the art before the effective filing date of the invention to combine the teachings of Shabani with the combination of Tang, Jawed, and Arik. Tang teaches transformer-based time series modeling using patch embedding, including masking a subset of patches, pre-training by reconstructing masked patches, and then using the transformer in fine-tuning to output prediction series. Arik teaches a forecasting workflow that receives past time step inputs and generates future time step predicted outputs, and further teaches weight sharing across features, supporting efficient forecasting when multiple channels/features are present. Jawed teaches a forecasting architecture for time series that includes applying a flatten operation to learned embeddings and using a shared linear head to produce forecast outputs. Shabani teaches complementary time-series forecasting processing, such as handling multiple component series, reinforcing the forecasting implementation and illustrating various uses of time series forecasting in different systems. One with ordinary skill in the art would be motivated to incorporate the teachings of Shabani with the combination of Tang, Jawed, and Arik because doing so adds known time series processing techniques that improve robustness and applicability across different time series applications. The combination would have been predictable, merely applying the combined system for compatibility across different platforms.
Regarding claim 13, the rejection of claim 12 is incorporated herein. Claim 13 recites substantially similar subject matter as claim 2 and is rejected with the same rationale, mutatis mutandis.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BYUNGKWON HAN whose telephone number is (571)272-5294. The examiner can normally be reached M-F, 9:00 AM - 6:00 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BYUNGKWON HAN/ Examiner, Art Unit 2121
/Li B. Zhen/ Supervisory Patent Examiner, Art Unit 2121