Prosecution Insights
Last updated: April 19, 2026
Application No. 18/200,642

HIERARCHY DRIVEN TIME SERIES FORECASTING

Non-Final OA: §101, §103, §112
Filed: May 23, 2023
Examiner: CADY, MATTHEW ALAN
Art Unit: 2145
Tech Center: 2100 — Computer Architecture & Software
Assignee: International Business Machines Corporation
OA Round: 1 (Non-Final)
Grant Probability: Favorable
Projected OA Rounds: 1-2
Projected Time to Grant: 3y 3m

Examiner Intelligence

Career Allow Rate: 0% (0 granted / 0 resolved; -55.0% vs TC avg)
Interview Lift: +0.0% (minimal lift across resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline)
Career History: 11 total applications across all art units; 11 currently pending

Statute-Specific Performance

§101: 24.3% (-15.7% vs TC avg)
§103: 43.2% (+3.2% vs TC avg)
§102: 13.5% (-26.5% vs TC avg)
§112: 18.9% (-21.1% vs TC avg)
Tech Center averages are estimates; based on career data from 0 resolved cases.

Office Action

Rejections under §101, §103, and §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

The term “the masked random patches” in claim 7 lacks antecedent basis, as there is no prior reference to masked random patches in the claims.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

According to the first part of the analysis, in the instant case, claims 1-10 are directed to a method and claims 11-20 are directed to an apparatus. Each of these claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
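Before the claim-by-claim analysis, the first claimed operation discussed throughout this action (segmenting a multivariate sensor time series into patches) can be illustrated with a minimal numpy sketch. The series length, sensor count, and patch length below are illustrative assumptions, not values from the application.

```python
import numpy as np

# Toy dataset: 12 time steps from 2 sensors (one column per sensor).
n_steps, n_sensors, patch_len = 12, 2, 4
data = np.arange(n_steps * n_sensors, dtype=float).reshape(n_steps, n_sensors)

# Segment the time axis into non-overlapping patches of length 4,
# giving an array of shape (n_patches, patch_len, n_sensors).
patches = data.reshape(n_steps // patch_len, patch_len, n_sensors)

assert patches.shape == (3, 4, 2)
```

Downstream mixing operations then act across the patch axis (inter-patch) and within each patch (intra-patch).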
Regarding claim 1:

Step 2A Prong One:

segmenting a time-series dataset from a plurality of sensors into a plurality of patches; (This step for segmenting data is understood as a mental process)

applying gated multilayer perceptron (MLP) mixing across different directions of the patched input time-series; (This step for applying gated MLP mixing on data is understood as a mathematical concept)

capturing local and global and interrelated correlations across the plurality of patches and within the plurality of patches; (This step for capturing correlations from the data is understood as a mental process)

and applying a patch-time aggregated hierarchy to guide lowest-level predictions based on aggregated hierarchy signals at a patch-level. (This step for applying a patch-time aggregated hierarchy is understood as a mental process)

Step 2A Prong Two:

A computer-implemented method comprising: (This step for performing the methods on a generic computer is considered mere instructions to apply an exception. See MPEP § 2106.05(f).)

Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as segmenting data, capturing correlations, and applying a hierarchy, and mathematical concepts such as applying gated MLP mixing on data, while the additional element of performing the methods using a generic computer is a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding claim 2:

Step 2A Prong One:

Claim 2 depends on claim 1, which has been determined to recite abstract ideas including mental processes and mathematical concepts. Therefore, claim 2 also recites abstract ideas.

Step 2A Prong Two:

The computer-implemented method of claim 1, wherein the MLP mixing is channel independent. This additional element of channel-independent MLP mixing integrates the mathematical concept of “applying gated multilayer perceptron (MLP) mixing across different directions of the patched input time-series” and the mental process of “capturing local and global and interrelated correlations across the plurality of patches and within the plurality of patches,” as it improves the functionality of the method ([0030] “Mixing in a channel independent way across different directions means alternating with respect to patches and features. By mixing in this manner, aspects of this disclosure are able to capture local and global interrelated correlations across and within patches and features.”). However, the remaining mental processes of “segmenting a time-series dataset from a plurality of sensors into a plurality of patches” and “applying a patch-time aggregated hierarchy to guide lowest-level predictions based on aggregated hierarchy signals at a patch-level” introduced in claim 1 are still not integrated into a practical application and still present an issue.

Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the remaining judicial exceptions because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes without any technological improvement or inventive step.

Regarding claim 3:

Step 2A Prong One:

Claim 3 depends on claim 1, which has been determined to recite abstract ideas including mental processes and mathematical concepts. Therefore, claim 3 also recites abstract ideas.

Step 2A Prong Two:

The computer-implemented method of claim 1, wherein the MLP mixing uses layers that are stacked in linear fashion.
(This step for MLP mixing using stacked layers in a linear fashion simply limits how the abstract idea is performed, with no technological improvement.)

Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes, while the additional element of stacking layers of MLP mixing in a linear fashion at a high level of generality is a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding claim 4:

Step 2A Prong One:

Claim 4 depends on claim 1, which has been determined to recite abstract ideas including mental processes and mathematical concepts. Therefore, claim 4 also recites abstract ideas.

Step 2A Prong Two:

The computer-implemented method of claim 1, wherein the MLP mixing uses layers that are chained in a patch length context aware hierarchy fashion. This step for MLP mixing using chained layers in a context-aware fashion does integrate the mathematical concept of gated MLP mixing recited in claim 1, as it improves the functionality of the system ([0032] “The method further includes chaining MLP-mixers in a patch length context aware hierarchy fashion to enhance time-series short and long-term correlation capture.”). However, the remaining mental processes of “segmenting a time-series dataset from a plurality of sensors into a plurality of patches,” “capturing local and global and interrelated correlations across the plurality of patches and within the plurality of patches,” and “applying a patch-time aggregated hierarchy to guide lowest-level predictions based on aggregated hierarchy signals at a patch-level” still present an issue.

Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the remaining judicial exceptions because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes without any technological improvement or inventive step.

Regarding claim 5:

Step 2A Prong One:

Claim 5 depends on claim 1, which has been determined to recite abstract ideas including mental processes and mathematical concepts. Therefore, claim 5 also recites abstract ideas.

Step 2A Prong Two:

The computer-implemented method of claim 1, wherein the MLP mixing is mixed with respect to patches and features. (This step for MLP mixing with respect to patches and features simply limits the implementation of the abstract idea to a particular type of data, without any improvement.)

Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes, while the additional element of mixing with respect to features and patches at a high level of generality is a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding claim 6:

Step 2A Prong One:

Claim 6 depends on claim 1, which has been determined to recite abstract ideas including mental processes and mathematical concepts. Therefore, claim 6 also recites abstract ideas.

Step 2A Prong Two:

The computer-implemented method of claim 1, further comprising a pretraining task of masking random patches. (This step for masking random patches before training is considered extra-solution activity.
See MPEP § 2106.05(g).)

Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes and mathematical processes, while the additional element of masking random patches is a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding claim 7:

Step 2A Prong One:

Claim 7 depends on claim 4, which has been determined to recite abstract ideas including mental processes. Therefore, claim 7 also recites abstract ideas.

Step 2A Prong Two:

The computer-implemented method of claim 4, further comprising reconstructing the masked random patches. (This step for reconstructing masked patches before training is considered extra-solution activity. See MPEP § 2106.05(g).)

Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes and mathematical processes, while the additional element of reconstructing masked patches is a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Regarding claim 8:

Step 2A Prong One:

The computer-implemented method of claim 1, further comprising a downstream task of forecasting values of the sensors. (This step for forecasting values of sensors is considered a mental process.)

Step 2A Prong Two:

The claim does not include additional elements, when considered separately and in combination, that integrate the judicial exception into a practical application.

Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as forecasting sensor values and mathematical processes without any technological improvement or inventive step.

Regarding claim 9:

Step 2A Prong One:

The computer-implemented method of claim 1, further comprising a downstream task of executing regression analysis regarding values of the sensors. (This step for executing regression analysis is considered a mathematical concept.)

Step 2A Prong Two:

The claim does not include additional elements, when considered separately and in combination, that integrate the judicial exception into a practical application.

Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes and mathematical processes without any technological improvement or inventive step.

Regarding claim 10:

Step 2A Prong One:

The computer-implemented method of claim 1, further comprising a downstream task of classifying values of the sensors into one of a variety of predetermined classifications. (This step for classifying values is considered a mental process.)

Step 2A Prong Two:

The claim does not include additional elements, when considered separately and in combination, that integrate the judicial exception into a practical application.
Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes and mathematical processes without any technological improvement or inventive step.

Regarding claim 11:

Step 2A Prong One:

segment a time-series dataset from a plurality of sensors into a plurality of patches; (This step for segmenting data is understood as a mental process)

apply gated multilayer perceptron (MLP) mixing across different directions of the patched input time-series; (This step for applying gated MLP mixing on data is understood as a mathematical concept)

capture local and global and interrelated correlations across the plurality of patches and within the plurality of patches; (This step for capturing correlations from the data is understood as a mental process)

and apply a patch-time aggregated hierarchy to guide lowest-level predictions based on aggregated hierarchy signals at a patch-level. (This step for applying a patch-time aggregated hierarchy is understood as a mental process)

Step 2A Prong Two:

A system comprising: a processor; and a memory in communication with the processor, the memory containing instructions that, when executed by the processor, cause the processor to: (This step for applying abstract ideas using a generic computer is mere instructions to apply an exception. See MPEP § 2106.05(f).)

Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as segmenting data, capturing correlations, and applying a hierarchy, and mathematical concepts such as applying gated MLP mixing on data, while the additional element of performing methods using a generic computer is a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Claim 12 is an apparatus claim directly corresponding to claim 2 and is likewise deficient. Claim 13 is an apparatus claim directly corresponding to claim 5 and is likewise deficient.

Regarding claim 14:

Step 2A Prong One:

Claim 14 depends on claim 11, which has been determined to recite abstract ideas including mental processes and mathematical concepts. Therefore, claim 14 also recites abstract ideas.

Step 2A Prong Two:

The system of claim 11, wherein the MLP mixing uses layers that are either stacked in linear fashion (this step for MLP mixing using stacked layers in a linear fashion simply limits how the abstract idea is performed, with no technological improvement) or chained in a patch length context aware hierarchy fashion. This step for MLP mixing using chained layers in a context-aware fashion does integrate the mathematical concept of gated MLP mixing recited in claim 11, as it improves the functionality of the system ([0032] “The method further includes chaining MLP-mixers in a patch length context aware hierarchy fashion to enhance time-series short and long-term correlation capture.”). However, the remaining mental processes of segmenting a time-series dataset from a plurality of sensors into a plurality of patches, capturing local and global and interrelated correlations across the plurality of patches and within the plurality of patches, and applying a patch-time aggregated hierarchy to guide lowest-level predictions based on aggregated hierarchy signals at a patch-level still present an issue.
Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the remaining judicial exceptions because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes, while the additional element of stacking layers of MLP mixing in a linear fashion at a high level of generality is a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Claim 15 is an apparatus claim directly corresponding to claim 6 and is likewise deficient. Claim 16 is an apparatus claim directly corresponding to claim 7 and is likewise deficient. Claim 17 is an apparatus claim directly corresponding to claim 8 and is likewise deficient. Claim 18 is an apparatus claim directly corresponding to claim 9 and is likewise deficient. Claim 19 is an apparatus claim directly corresponding to claim 10 and is likewise deficient.

Regarding claim 20:

Step 2A Prong One:

segment a time-series dataset from a plurality of sensors into a plurality of patches; (This step for segmenting data is understood as a mental process)

apply gated multilayer perceptron (MLP) mixing across different directions of the patched input time-series; (This step for applying gated MLP mixing on data is understood as a mathematical concept)

capture local and global and interrelated correlations across the plurality of patches and within the plurality of patches; (This step for capturing correlations from the data is understood as a mental process)

and apply a patch-time aggregated hierarchy to guide lowest-level predictions based on aggregated hierarchy signals at a patch-level. (This step for applying a patch-time aggregated hierarchy is understood as a mental process)

Step 2A Prong Two:

A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to: (This step for applying abstract ideas using a generic computer is mere instructions to apply an exception. See MPEP § 2106.05(f).)

Step 2B:

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered individually and in combination, they do not add significantly more (also known as an inventive concept) to the exception. The claim recites mental processes such as segmenting data, capturing correlations, and applying a hierarchy, and mathematical concepts such as applying gated MLP mixing on data, while the additional element of performing methods using a generic computer is a well-understood, routine, and conventional activity, as recognized by the court decisions listed in MPEP § 2106.05(d).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 4, 8-9, 11, 14, 17-18, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhengzhong Tu et al.
(hereinafter Tu) (“MAXIM: Multi-Axis MLP for Image Processing”, 04/09/2022) in view of Maja Rudolph et al. (hereinafter Maja) (JP 2023010698 A, 01/20/2023), further in view of Davide Burba et al. (hereinafter Burba) (“A Trainable Reconciliation Method for Hierarchical Time-Series”, 01/05/2021).

Regarding claim 1, Tu teaches:

applying gated multilayer perceptron (MLP) mixing across different directions of the patched input

[Tu, Figure 3 reproduced in the original action] ([pg. 4] Figure 3. Multi-axis gated MLP block (best viewed in color). The input is first projected to a [6, 4, C] feature, then split into two heads. In the local branch, the half head is blocked into 3×2 non-overlapping [2, 2, C/2] patches, while we grid the other half using a 2×2 grid in the global branch. We only apply the gMLP block [50] (illustrated in the right gMLP Block) on a single axis of each branch - the 2nd axis for the local branch and the 1st axis for the global branch, while shared along the other spatial dimensions. The gMLP operators, which run in parallel, correspond to local and global (dilated) attended regions, as illustrated with different colors (i.e., the same color are spatially mixed using the gMLP operator). Our proposed block expresses both global and local receptive fields on arbitrary input resolutions.)

NOTE: Teaches applying gated MLP mixing (mixed using gMLP, which is gated MLP) across different directions of the input (1st and 2nd axis, as shown in Fig. 3 above). It would be obvious for this patched input to be patched time-series data, as further explained below.

capturing local and global and interrelated correlations across the plurality of patches and within the plurality of patches; ([pg. 4, Fig. 3] quoted above)

NOTE: The local branch restricts mixing to local windows while the global branch mixes across global spatial areas (across and within the plurality of patches); the two outputs are then concatenated {see Fig. 3}, capturing local and global interrelations respectively.

Tu fails to teach, but Maja teaches:

segmenting a time-series dataset from a plurality of sensors into a plurality of patches; ([Abstract] provide an anomalous region detection method and system of a time series in a machine learning system via local neural transformation. SOLUTION: An anomalous region detection method includes: receiving time series data being grouped in patches;)

NOTE: Teaches segmenting a time-series dataset into a plurality of patches.

([pg. 2] The technology disclosed in the present application can be applied to time-series imaging using other sensors such as radio electromagnetic wave antennas and sound collecting microphones, for example.)

NOTE: The aforementioned dataset can be derived from a plurality of sensors.

OBVIOUSNESS TO COMBINE MAJA WITH TU: Maja and Tu are both analogous art to the present disclosure, as they both involve MLP. Specifically, Maja discloses a system which patches time-series data from sensors and utilizes MLP, while Tu discloses a process using gated MLP mixing.
Maja additionally states: ([pg. 2] Images may be multi-dimensional in that they may include components of time, space, intensity, or other properties. For example, the images may include time series images. This technique can also be extended to image 3D acoustic sources or objects.)

NOTE: Maja discloses that adding a time dimension to image data is a known technique to allow the data to represent patterns over time. The disclosure of Tu pertains to image data. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to collect and segment data from a plurality of sensors with a time-series component into patches (as taught by Maja) to use as input for a system utilizing gated MLP mixing (as taught by Tu) to capture temporal trends.

Maja and Tu fail to teach, but Burba teaches:

and applying a patch-time aggregated hierarchy to guide lowest-level predictions based on aggregated hierarchy signals at a patch-level.

[Burba, Figure 1 reproduced in the original action]

NOTE: Fig. 1 (above) shows the time-series hierarchy, where each level is an aggregate of the time-series segments from the layer below it. The time-series segments are considered patches of the time-series data. [teaches patch-time aggregated hierarchy]

[Burba, Figure 2 reproduced in the original action] ([pg. 4, section 3] The encoder maps the input predictions to the bottom level reconciled predictions, and the decoder takes the latter as input and reconstructs the predictions at all levels. A representation of our method is given in Figure 2. {see Fig. 2 above})

NOTE: The encoder maps all predictions to bottom-level reconciled predictions, which are used to reconstruct predictions at ALL levels. Therefore, Burba teaches guiding lowest-level predictions (reconstructs all levels) based on aggregated hierarchy signals at a patch level (each patch-level / level contributes to the reconstructed reconciled predictions).

OBVIOUSNESS TO COMBINE BURBA WITH TU AND MAJA: Burba is analogous art to Tu, Maja, and the present invention, as they all pertain to machine learning systems. Burba specifically relates to time-series forecasting using an aggregated hierarchy structure with reconciliation. Additionally, Burba further states: ([pg. 2, paragraph 2] In this work, we propose a new exact methodology to reconcile hierarchical time-series forecasts based on an encoder-decoder neural network. The encoder is a trainable neural network that takes as input the independent forecasts and outputs the bottom-level reconciled forecasts. The decoder is a fixed matrix which reconstructs exactly the forecasts at all levels using the bottom level encoded predictions. Our method includes and generalizes the representation space of existing methods, is extremely flexible, and is easy to implement. We apply it to four real-world datasets, and we show that it consistently achieves a better or equal performance than the existing reconciliation methods.)

NOTE: This excerpt details that this aggregated hierarchy structure is extremely flexible, easy to implement, and achieves better or equal performance relative to existing methods. Their method also allows for forecasting at different granularity levels of the time-series data, which is a common problem for forecasting time-series data [see introduction of Burba]. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the aggregated hierarchy structure from Burba into the system described in the present disclosure to allow for flexible, accurate, and efficient time-series forecasting at different granularity levels.
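The encoder/decoder reconciliation Burba is cited for above can be sketched minimally in numpy. The two-series hierarchy and the least-squares "encoder" below are illustrative stand-ins (Burba trains a neural-network encoder); only the fixed summing-matrix decoder reflects the quoted description.

```python
import numpy as np

# Fixed summing matrix S for a toy hierarchy: row 0 = total, rows 1-2 = the
# two bottom-level series a and b (total = a + b). S plays the decoder role.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Independent (unreconciled) base forecasts for [total, a, b]; note they are
# incoherent: 4 + 5 != 10.
base = np.array([10.0, 4.0, 5.0])

# Toy "encoder": least-squares projection of the base forecasts onto the
# bottom level (a stand-in for Burba's trainable neural network).
bottom, *_ = np.linalg.lstsq(S, base, rcond=None)

# Decoder: reconstruct forecasts at all levels from the bottom level.
reconciled = S @ bottom

# The reconciled forecasts are coherent: total equals a + b exactly.
assert np.isclose(reconciled[0], reconciled[1] + reconciled[2])
```

The decoder being a fixed matrix is what guarantees exact coherence across hierarchy levels, matching the "reconstructs exactly the forecasts at all levels" language in the excerpt.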
Regarding claim 4, Tu in view of Maja and Burba teaches: The computer-implemented method of claim 1,

Tu and Maja fail to teach, but Burba teaches: wherein the MLP mixing uses layers that are chained in a patch length context aware hierarchy fashion.

[Burba, Figure 1 reproduced in the original action]

NOTE: Burba teaches layers that are chained in a patch-length context-aware hierarchy fashion (the length of each time-series sub-interval / patch at each layer of the hierarchy is determined by the length of patches in the previous layer).

OBVIOUSNESS TO COMBINE HIERARCHY WITH MLP MIXING: ([pg. 1, section 1] A hierarchical time-series is a collection of time-varying observations organized in a hierarchical structure. The problem of forecasting hierarchical time-series often appears in business and economics, where time-varying quantities need to be predicted at different granularity levels.)

NOTE: Representing time-series data using layers that are chained in a patch-length context-aware hierarchy fashion allows time-varying quantities to be predicted at different granularity levels, which is a common need for this type of data [see introduction of Burba]. Hierarchical time-series forecasting commonly requires capturing dependencies at different temporal scales, and applying mixing operations across representations generated at each hierarchical level allows information from multiple patches to interact, thereby improving context awareness and predictive performance. Such a combination represents a predictable use of known network design techniques that combines hierarchical feature representation with mixing layers. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, for the MLP mixing of claim 1 (taught by Tu and Maja) to be applied within the hierarchical patch structure of Burba to improve modeling of temporal relationships across multiple granularities.
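The chained patch-length hierarchy at issue in claim 4 can be sketched as successive mixing stages whose patch length doubles at each level. The residual identity "mix" below is an illustrative placeholder for a learned gated MLP layer, and the series and patch lengths are assumptions for illustration.

```python
import numpy as np

def mix(x):
    # Placeholder mixing layer: a residual linear map applied across the
    # patch axis (axis 0). A real implementation would use a learned
    # gated MLP here; this stand-in just doubles each value.
    w = np.eye(x.shape[0])
    return x + w @ x

# Time series of 8 steps, initial patch length 2 -> 4 patches.
series = np.arange(8.0)

level0 = mix(series.reshape(4, 2))        # patch length 2: mix 4 patches
level1 = mix(level0.reshape(2, 4))        # chained: patch length 4, 2 patches
level2 = mix(level1.reshape(1, 8))        # chained: patch length 8, whole series
```

Each stage's patch length is derived from the previous stage's, which is the "context aware" chaining the claim language describes: short-range structure is mixed first, then progressively longer-range structure.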
Regarding claim 8, Tu in view of Maja and Burba teaches: The computer-implemented method of claim 1,

Tu fails to teach, but Maja teaches: values of the sensors [using the same teaching from claim 1].

Tu and Maja fail to teach, but Burba teaches: further comprising a downstream task of forecasting values. ([pg. 5, section 4.1] For the datasets which present a high number of time-series (> 500), we estimate a single global predictive model to forecast all the time-series. This approach allows us to exploit the time-series similarities and estimating a more complex model [11]. We consider the Light Gradient Boosting (LightGBM) model, taking as input the scaled lagged values, time-series specific features (such as the level in the hierarchy), and temporal features.)

NOTE: Teaches a downstream task of forecasting time-series values. It would be obvious for the values used for this task to be obtained from the sensors taught by Maja, as both Maja and Burba involve time-series data.

Regarding claim 9, Tu in view of Maja further in view of Burba teaches: The computer-implemented method of claim 1,

Tu fails to teach, but Maja teaches: values of the sensors [using the same teaching from claim 1].

Tu and Maja fail to teach, but Burba teaches: further comprising a downstream task of executing regression analysis regarding values of the sensors. ([pg. 5, section 4.1] To train the forecasting models we consider two alternative strategies, i.e. estimating an individual model for each time-series, or a global model for all of them. For the individual strategy, we consider linear autoregressive models (AR(p)), taking the lagged time-series values as input.)

NOTE: One implementation comprises a downstream task of executing regression analysis (autoregressive models) using the time-series values as input. It would be obvious for the values used for this task to be obtained from the sensors taught by Maja, as both Maja and Burba involve time-series data.
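The AR(p) regression Burba is cited for against claims 8-9 amounts to ordinary least squares on lagged values. A minimal sketch, with a synthetic "sensor" series and lag order that are assumptions for illustration only:

```python
import numpy as np

# Synthetic sensor reading: a sine wave with a little measurement noise.
rng = np.random.default_rng(0)
series = np.sin(np.arange(100) * 0.3) + 0.01 * rng.standard_normal(100)

p = 3  # lag order of the autoregressive model

# Design matrix: row t holds the lagged values series[t-1], ..., series[t-p].
X = np.column_stack(
    [series[p - k - 1:len(series) - k - 1] for k in range(p)]
)
y = series[p:]

# Fit AR coefficients by least squares (the regression analysis step).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# One-step-ahead forecast from the p most recent observations.
forecast = coef @ series[-1:-p - 1:-1]
```

This illustrates both downstream tasks in one place: the lstsq fit is the regression analysis of claim 9, and the final dot product is the value forecast of claim 8.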
Claim 11 is an apparatus claim directly corresponding to method claim 1, except with an additional limitation, which is still taught by Maja: A system comprising: a processor; and a memory in communication with the processor, the memory containing instructions that, when executed by the processor, cause the processor to:

[Figure: media_image5.png]

([pg. 6] FIG. 5 is a block diagram of an electronic computer system suitable for implementing the systems disclosed herein or for performing the methods disclosed herein.)

NOTE: Teaches a system comprising a processor and a memory in communication with the processor, the memory containing instructions that, when executed by the processor, cause the processor to perform the methods of the disclosure.

Claim 14 is an apparatus claim directly corresponding to claim 4, and is therefore rejected for the same reasons.

Claim 17 is an apparatus claim directly corresponding to claim 8, and is therefore rejected for the same reasons.

Claim 18 is an apparatus claim directly corresponding to claim 9, and is therefore rejected for the same reasons.

Claim 20 is an apparatus claim directly corresponding to method claim 1, except with an additional limitation, which is still taught by Maja: A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to: ([pg. 15] Program code embodying the algorithms and/or methodologies described herein may be distributed separately or collectively in a variety of different forms of program products. This program code may be distributed using a computer-readable storage medium having thereon computer-readable program instructions for causing a processor to perform aspects of one or more embodiments. Computer-readable storage media that are non-transitory in nature include both volatile and nonvolatile media implemented by any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.)

NOTE: Teaches a computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform the methods of the disclosure.

Claims 2-3, 5, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Tu ("MAXIM: Multi-Axis MLP for Image Processing", 04/09/2022) in view of Maja (JP 2023010698 A, 01/20/2023), further in view of Burba ("A Trainable Reconciliation Method for Hierarchical Time-Series", 01/05/2021), further in view of Huang Jinmiao et al. (hereinafter Huang) (US 20230335118 A1, 10/19/2023).

Regarding claim 2, Tu in view of Maja and Burba teaches: The computer-implemented method of claim 1 (using the same reasoning as the rejection of claim 1). Tu, Maja, and Burba fail to teach but Huang teaches: wherein the MLP mixing is channel independent. (Channel independent means alternating with respect to patches and features, as defined in the specification at [0032].) ([0094] The basic idea behind MLP-Mixer is to use multiple layers of mixer blocks, each consisting of two separate operations: channel mixing and spatial mixing. In the channel mixing operation, a multi-layer perceptron (MLP) is applied to the channels of each patch independently, allowing for non-linear transformations of the patch features. In the spatial mixing operation, a global average pooling is performed over the patch dimensions, followed by another MLP applied to the resulting global feature map.)
NOTE: Teaches channel independent MLP mixing (multiple layers which alternate between mixing with respect to features and patches).

OBVIOUSNESS TO COMBINE HUANG WITH TU, MAJA, AND BURBA: Huang is analogous art to Tu, Maja, Burba, and the present disclosure, as they all pertain to methods of machine learning. Specifically, Huang pertains to an MLP mixer based encoding model. Additionally, Huang states: ([0095] By alternating these two operations, MLP-Mixer can capture both local and global features of the image, while also allowing for non-linear transformations of the feature representations.)

NOTE: Discloses that channel independent MLP mixing allows the model to capture both local and global features of the image, while also allowing for non-linear transformations of the feature representations.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to make the MLP mixing of the system of claim 1 (taught by Tu in view of Maja and Burba) channel independent, to allow the model to capture both local and global features of the image while also allowing for non-linear transformations of the feature representations.

Regarding claim 3, Tu in view of Maja and Burba teaches: The computer-implemented method of claim 1 (using the same reasoning as the rejection of claim 1). Tu, Maja, and Burba fail to teach but Huang teaches: wherein the MLP mixing uses layers that are stacked in linear fashion.

[Figure: media_image6.png]

NOTE: In Fig. 6A (reproduced above), the MLP mixing uses layers that are stacked in a linear fashion (feature mixing layer -> time mixing layer -> etc.). Additionally, stacking mixing layers in a linear fashion would allow the system to alternate between different mixing modes, allowing the mixing to generate insights across different dimensions of the data.
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to make the MLP mixing of the system of claim 1 (taught by Tu in view of Maja and Burba) stack the MLP mixing layers in a linear fashion, allowing the system to alternate between different mixing modes and generate insights across different dimensions of the data.

Regarding claim 5, Tu in view of Maja and Burba teaches: The computer-implemented method of claim 1 (using the same reasoning as the rejection of claim 1). Tu, Maja, and Burba fail to teach but Huang teaches: wherein the MLP mixing is mixed with respect to patches and features. ([0094] The basic idea behind MLP-Mixer is to use multiple layers of mixer blocks, each consisting of two separate operations: channel mixing and spatial mixing. In the channel mixing operation, a multi-layer perceptron (MLP) is applied to the channels of each patch independently, allowing for non-linear transformations of the patch features. In the spatial mixing operation, a global average pooling is performed over the patch dimensions, followed by another MLP applied to the resulting global feature map.)

NOTE: Teaches mixing with respect to patches (spatial mixing mixes with respect to patches) and features (channel mixing mixes with respect to patch features). Additionally, Huang states: ([0095] By alternating these two operations, MLP-Mixer can capture both local and global features of the image, while also allowing for non-linear transformations of the feature representations.)

NOTE: Discloses that mixing with respect to patches and features allows the model to capture both local and global features of the image, while also allowing for non-linear transformations of the feature representations.
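The mixer-block alternation quoted from Huang at [0094]-[0095] can be illustrated with a minimal sketch. This is hypothetical code, not from Huang; the shapes and random weights are assumptions, and the cross-patch step follows the standard MLP-Mixer token-mixing layout rather than the pooling variant Huang's paragraph describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    """Two-layer perceptron with ReLU, applied along the last axis."""
    return np.maximum(x @ w1, 0.0) @ w2

n_patches, n_features, hidden = 4, 8, 16
tokens = rng.normal(size=(n_patches, n_features))

# Channel mixing: an MLP applied to each patch's features independently.
w1c = rng.normal(size=(n_features, hidden))
w2c = rng.normal(size=(hidden, n_features))
tokens = tokens + mlp(tokens, w1c, w2c)       # residual, per patch

# Token (spatial) mixing: an MLP applied across patches, per feature.
w1t = rng.normal(size=(n_patches, hidden))
w2t = rng.normal(size=(hidden, n_patches))
tokens = tokens + mlp(tokens.T, w1t, w2t).T   # residual, per feature

print(tokens.shape)  # (4, 8): the mixer block preserves the token shape
```

Alternating these two operations is what lets information flow both within each patch (features) and across patches, the behavior the claim 5 mapping relies on.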
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to make the MLP mixing of the system of claim 1 (taught by Tu in view of Maja and Burba) mix with respect to patches and features, to allow the model to capture both local and global features of the image while also allowing for non-linear transformations of the feature representations.

Claim 12 is an apparatus claim directly corresponding to claim 2, and is therefore rejected for the same reasons.

Claim 13 is an apparatus claim directly corresponding to claim 5, and is therefore rejected for the same reasons.

Claims 6, 7, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Tu ("MAXIM: Multi-Axis MLP for Image Processing", 04/09/2022) in view of Maja (JP 2023010698 A, 01/20/2023), further in view of Burba ("A Trainable Reconciliation Method for Hierarchical Time-Series", 01/05/2021), further in view of Krishna Kumar Singh et al. (hereinafter Krishna) ("Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization", 12/23/2017).

Regarding claim 6, Tu in view of Maja and Burba teaches: The computer-implemented method of claim 1 (using the same reasoning as the rejection of claim 1). Tu, Maja, and Burba fail to teach but Krishna teaches: further comprising a pretraining task of masking random patches. ([Abstract] Our key idea is to hide patches in a training image randomly, forcing the network to seek other relevant parts when the most discriminative part is hidden.)

NOTE: Teaches randomly masking patches.

OBVIOUSNESS TO COMBINE KRISHNA WITH TU, MAJA, AND BURBA: Krishna is analogous art to Tu, Maja, Burba, and the present disclosure, as it pertains to machine learning architectures. Particularly, it pertains to masking random patches of image inputs for a network. Krishna further states: ([pg. 1, fig. 1 caption] Main idea.
(Top row) A network tends to focus on the most discriminative parts of an image (e.g., face of the dog) for classification. (Bottom row) By hiding image patches randomly, we can force the network to focus on other relevant object parts in order to correctly classify the image as 'dog'.)

NOTE: This excerpt discloses that random masking allows the network to predict accurately even when data is missing. Additionally, the models of Tu and Maja detail MLP models using image data as input; simply masking the image data using the process disclosed by Krishna before processing would be a simple combination.

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate random masking (as taught by Krishna) into the method of claim 1 (taught by Tu, Maja, and Burba) to allow the system to perform well even when data is missing.

Regarding claim 7, Tu in view of Maja and Burba teaches: The computer-implemented method of claim 1. Tu, Maja, and Burba fail to teach but Krishna teaches: further comprising reconstructing the masked random patches. ([pg. 4, paragraph 2] We hide patches only during training. During testing, the full image—without any patches hidden—is given as input to the network;)

NOTE: Teaches reconstructing (by revealing the masked patches in the original image) the masked random patches. Krishna additionally states: ([pg. 4, paragraph 2] Since the network has learned to focus on multiple relevant parts during training, it is not necessary to hide any patches during testing.)

NOTE: There is no reason for the patches to be masked during testing, so it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to reconstruct the masked patches during testing to accurately test the model.

Claim 15 is an apparatus claim directly corresponding to claim 6, and is therefore rejected for the same reasons.
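The random patch hiding quoted from Krishna in the claims 6-7 discussion above can be illustrated with a minimal sketch. This is hypothetical code, not from the reference: Krishna hides image patches during training (filling them with a fixed value), while this sketch uses scalar "patches" and a zero fill for brevity.

```python
import random

def mask_random_patches(patches, hide_prob, seed=None):
    """Training-time masking sketch: each patch is independently
    replaced by a constant mask value with probability hide_prob.
    At test time the original, unmasked patches are used instead."""
    rng = random.Random(seed)
    mask_value = 0.0  # stand-in for the fill value used during training
    return [mask_value if rng.random() < hide_prob else p for p in patches]

patches = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
masked = mask_random_patches(patches, hide_prob=0.5, seed=7)
print(masked)
```

Because masking is applied only during training, "reconstructing the masked random patches" at test time amounts to feeding the original `patches` list, unmodified, to the network.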
Claim 16 is an apparatus claim directly corresponding to claim 7, and is therefore rejected for the same reasons.

Claims 10 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Tu ("MAXIM: Multi-Axis MLP for Image Processing", 04/09/2022) in view of Maja (JP 2023010698 A, 01/20/2023), further in view of Burba ("A Trainable Reconciliation Method for Hierarchical Time-Series", 01/05/2021), further in view of Akhil Tandon et al. (hereinafter Akhil) (US 20220414072 A1, 12/29/2022).

Regarding claim 10, Tu in view of Maja and Burba teaches: The computer-implemented method of claim 1 (using the same reasoning from claim 1). Tu fails to teach but Maja teaches: values of the sensors (using the same reasoning from claim 1). Tu, Maja, and Burba fail to teach but Akhil teaches: further comprising a downstream task of classifying values. ([Abstract] Embodiments of the present invention provide a computer system, a computer program product, and a method that comprises identifying a plurality of data logs; generating a data model using analyzed time series data from the identified data logs; detecting anomalies within the generated data model; constructing a causal graph using the detected anomalies and retrieved domain knowledge; computing a severity value for the detected anomalies with the constructed causal graph; assigning the detected anomaly to a classification based on a function vector,)

NOTE: Teaches a downstream task of classifying values. ([0008] Embodiments of the present invention group detected anomalies based on predetermined classifications, and when a new anomaly is detected, embodiments of the present invention assign the newly detected anomaly to an existing classification.)

NOTE: Teaches a downstream task of classifying values into one of a variety of predetermined classifications.

OBVIOUSNESS TO COMBINE AKHIL WITH TU, MAJA, AND BURBA: Akhil is analogous art to Tu, Maja, Burba, and the present disclosure, as Akhil pertains to data analysis and machine learning.
Specifically, Akhil pertains to analyzing time-series data with a downstream task of classifying values. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the time-series data analysis method of claim 1 with a downstream classification task (as taught by Akhil), as it is well known in the art to utilize learned temporal representations for categorizing signals into predefined classes for decision-making purposes. Such a combination represents a routine application of time-series feature extraction techniques to supervised learning. It would further be obvious for the time-series values used in this classification to be values of the sensors taught by Maja.

Claim 19 is an apparatus claim directly corresponding to method claim 10, and is therefore rejected for the same reasons as claim 10.

CONCLUSION

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Matthew Alan Cady, whose telephone number is (571) 272-7229. The examiner can normally be reached Monday - Friday, 7:30 am - 5:00 pm ET.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Cesar Paula, can be reached at (571) 272-4128. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MATTHEW ALAN CADY/
Examiner, Art Unit 2145

/CESAR B PAULA/
Supervisory Patent Examiner, Art Unit 2145

Prosecution Timeline

May 23, 2023: Application Filed
Feb 27, 2026: Non-Final Rejection — §101, §103, §112 (current)


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability
Median Time to Grant: 3y 3m
PTA Risk: Low

Based on 0 resolved cases by this examiner. Grant probability derived from career allow rate.
