DETAILED ACTION
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2. This communication is in response to the Applicant’s submission filed 22 February 2022, where:
Claims 1, 3, 8, 10, 15, and 17 have been amended.
Claims 2, 5, 9, 12, 16, and 19 have been cancelled.
Claims 1, 3, 4, 6-8, 10, 11, 13-15, 17, 18, and 20 are pending.
Claims 1, 3, 4, 6-8, 10, 11, 13-15, 17, 18, and 20 are rejected.
Claim Rejections - 35 U.S.C. § 101
3. 35 U.S.C. § 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
4. Claims 1, 3, 4, 6-8, 10, 11, 13-15, 17, 18, and 20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 recites a system, which is a machine, and thus one of the statutory categories of patentable subject matter. (35 U.S.C. § 101).
However, under Step 2A Prong One, the claim recites the limitations of “[(b)] perform an encoding operation on the training dataset to provide an encoded dataset having a lower dimension space than a dimension space of the training dataset,” “[(d)] determine an output of the one or more prediction models in the lower dimension space based on an input provided to the one or more prediction models,” and “[(e)] perform a decoding operation on the output to project the output from the lower dimension space to the dimension space of the training dataset.” These limitations of “[(b)] perform an encoding operation,” “[(d)] determine,” and “[(e)] perform a decoding operation” can practically be performed in the human mind, including, for example, observations, evaluations, judgments, and opinions, and accordingly, are mental processes, (MPEP § 2106.04(a)(2) sub III), which is one of the groupings of abstract ideas.
More details or specifics are recited for the abstract idea of “[(b)] perform an encoding operation,” “[(b.1)] wherein, . . . the at least one processor is programmed or configured to: [(b.1.1)] perform the encoding operation on the training dataset based on a projection matrix, wherein . . . the at least one processor is programmed or configured to: [(b.1.1.1)] perform a factorization operation based on an optimization problem involving the projection matrix,” and accordingly, are merely more specific to the abstract idea.
Also, further details or specifics are recited for the abstract idea of “[(d)] determine,” “[(d.1)] wherein the output of the one or more prediction models may include a predicted classification value for an event of a time series forecast of a plurality of events,” and accordingly, are merely more specific to the abstract idea.
Still further, the claim recites more details or specifics to the abstract idea of “[(e)] perform a decoding operation,” “[(e.1)] wherein . . . the at least one processor is programmed or configured to: [(e.1.1)] project the output from the lower dimension space to the dimension space of the training dataset using an inverse matrix corresponding to the projection matrix, wherein the inverse matrix is an inverse of the projection matrix,” and accordingly, are merely more specific to the abstract idea. Thus, claim 1 recites an abstract idea.
Under Step 2A Prong Two, the claim as a whole is not integrated into a practical application, because the additional elements recited in the claim beyond the identified judicial exception include a “system” and “at least one processor,” which are recited at a high level of generality, and thus are generic computer components used to implement the abstract idea, (MPEP § 2106.05(f)), that do not serve to integrate the abstract idea into a practical application. The claim also recites the element of “one or more prediction models,” which is recited at a high level of generality, and thus, is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not serve to integrate the abstract idea into a practical application.
In this regard, the limitation of “[(c)] generate one or more prediction models based on the encoded dataset” is the use of the generic computer component (one or more prediction models) to implement the abstract idea, (MPEP § 2106.05(f)), that does not serve to integrate the abstract idea into a practical application. The claim also recites more specifics or details to the additional element of “[(c)] generate one or more prediction models,” in that “[(c.1)] wherein the one or more prediction models are configured to provide an output in the lower dimension space,” “[(c.2)] wherein the one or more prediction models are configured to provide a predicted classification value for an event,” “[(c.3)] wherein . . . the at least one processor is programmed or configured to: [(c.3.1)] train the one or more prediction models in the lower dimension space based on the encoded dataset to provide one or more trained prediction models,” and accordingly, are merely more specific to the additional element.
The claim also recites the additional element of “[(a)] receive a training dataset of a plurality of data instances,” which is a pre-solution, insignificant extra-solution activity of mere data gathering, (MPEP § 2106.05(g)), that does not serve to integrate the abstract idea into a practical application. The claim also recites more specifics or details to the additional element of “[(a)] receive,” where “[(a.1)] each data instance comprises a time series of data points,” and accordingly, is merely more specific to the additional element. Therefore, claim 1 is directed to the abstract idea.
Finally, under Step 2B, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. The claim includes a “system” and “at least one processor,” which are recited at a high level of generality, and thus are generic computer components used to implement the abstract idea, (MPEP § 2106.05(f)), that do not amount to significantly more than the abstract idea. The claim also recites the element of “one or more prediction models,” which is recited at a high level of generality, and thus, is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. In this regard, the limitation of “[(c)] generate one or more prediction models based on the encoded dataset” is the use of the generic computer component (one or more prediction models) to implement the abstract idea, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. The claim also recites more specifics or details to the additional element of “[(c)] generate one or more prediction models,” in that “[(c.1)] wherein the one or more prediction models are configured to provide an output in the lower dimension space,” “[(c.2)] wherein the one or more prediction models are configured to provide a predicted classification value for an event,” “[(c.3)] wherein . . . the at least one processor is programmed or configured to: [(c.3.1)] train the one or more prediction models in the lower dimension space based on the encoded dataset to provide one or more trained prediction models,” and accordingly, are merely more specific to the additional element.
The claim also recites the additional element of “[(a)] receive a training dataset of a plurality of data instances,” which is a well-understood, routine, and conventional activity of receiving or transmitting data over a network, (MPEP § 2106.05(d) sub II.i), that does not amount to significantly more than the abstract idea. The claim also recites more specifics or details to the additional element of “[(a)] receive,” where “[(a.1)] each data instance comprises a time series of data points,” and accordingly, is merely more specific to the additional element. Therefore, claim 1 is subject-matter ineligible.
Claim 8 recites a method, which is a process, and thus one of the statutory categories of patentable subject matter. (35 U.S.C. § 101).
However, under Step 2A Prong One, the claim recites the limitations of “[(b)] performing . . . an encoding operation on the training dataset to provide an encoded dataset having a lower dimension space than a dimension space of the training dataset,” “[(d)] determining . . . an output of the one or more trained prediction models in the lower dimension space based on an input provided to the one or more trained prediction models,” and “[(e)] performing . . . a decoding operation on the output to project the output from the lower dimension space to the dimension space of the training dataset.” These limitations of “[(b)] performing . . . an encoding operation,” “[(d)] determining,” and “[(e)] performing . . . a decoding operation” can practically be performed in the human mind, including, for example, observations, evaluations, judgments, and opinions, and accordingly, are mental processes, (MPEP § 2106.04(a)(2) sub III), which is one of the groupings of abstract ideas. More details or specifics are recited for the abstract idea of “[(b)] performing . . . an encoding operation,” “[(b.1)] wherein, . . . comprises: [(b.1.1)] performing the encoding operation on the training dataset based on a projection matrix, wherein . . . comprising: [(b.1.1.1)] performing a factorization operation based on an optimization problem involving the projection matrix,” and accordingly, are merely more specific to the abstract idea.
Further, the claim recites more details or specifics to the abstract idea of “[(e)] performing a decoding operation,” “[(e.1)] wherein . . . comprises: [(e.1.1)] projecting the output from the lower dimension space to the dimension space of the training dataset using an inverse matrix corresponding to the projection matrix, wherein the inverse matrix is an inverse of the projection matrix,” and accordingly, are merely more specific to the abstract idea. Thus, claim 8 recites an abstract idea.
Under Step 2A Prong Two, the claim as a whole is not integrated into a practical application, because the additional elements recited in the claim beyond the identified judicial exception include “at least one processor,” which is recited at a high level of generality, and thus is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not serve to integrate the abstract idea into a practical application. The claim also recites the element of “one or more prediction models,” which is recited at a high level of generality, and thus, is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not serve to integrate the abstract idea into a practical application.
In this regard, the limitation of “[(c)] generating . . . one or more prediction models based on the encoded dataset” is the use of the generic computer components (at least one processor, one or more prediction models) to implement the abstract idea, (MPEP § 2106.05(f)), that does not serve to integrate the abstract idea into a practical application. The claim also recites more specifics or details to the additional element of “[(c)] generating . . . one or more prediction models,” in that “[(c.1)] wherein the one or more prediction models are configured to provide an output in the lower dimension space,” “[(c.2)] wherein the one or more prediction models are configured to provide a predicted classification value for an event,” “[(c.3)] wherein generating . . . comprises: [(c.3.1)] training the one or more prediction models in the lower dimension space based on the encoded dataset to provide one or more trained prediction models,” and accordingly, are merely more specific to the additional element.
The claim also recites the additional element of “[(a)] receiving a training dataset of a plurality of data instances,” which is a pre-solution, insignificant extra-solution activity of mere data gathering, (MPEP § 2106.05(g)), that does not serve to integrate the abstract idea into a practical application. The claim also recites more specifics or details to the additional element of “[(a)] receiving,” where “[(a.1)] each data instance comprises a time series of data points,” and accordingly, is merely more specific to the additional element. Therefore, claim 8 is directed to the abstract idea.
Finally, under Step 2B, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. The claim includes “at least one processor,” which is recited at a high level of generality, and thus is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. The claim also recites the element of “one or more prediction models,” which is recited at a high level of generality, and thus, is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. In this regard, the limitation of “[(c)] generating . . . one or more prediction models based on the encoded dataset” is the use of the generic computer components (at least one processor, one or more prediction models) to implement the abstract idea, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. The claim also recites more specifics or details to the additional element of “[(c)] generating one or more prediction models,” in that “[(c.1)] wherein the one or more prediction models are configured to provide an output in the lower dimension space,” “[(c.2)] wherein the one or more prediction models are configured to provide a predicted classification value for an event,” “[(c.3)] wherein generating . . . comprises: [(c.3.1)] training the one or more prediction models in the lower dimension space based on the encoded dataset to provide one or more trained prediction models,” and accordingly, are merely more specific to the additional element. The claim also recites the additional element of “[(a)] receiving . . . a training dataset of a plurality of data instances,” which is a well-understood, routine, and conventional activity of receiving or transmitting data over a network, (MPEP § 2106.05(d) sub II.i), that does not amount to significantly more than the abstract idea. The claim also recites more specifics or details to the additional element of “[(a)] receiving,” where “[(a.1)] each data instance comprises a time series of data points,” and accordingly, is merely more specific to the additional element. Therefore, claim 8 is subject-matter ineligible.
Claim 15 recites a computer program product, which is an article of manufacture, and thus one of the statutory categories of patentable subject matter. (35 U.S.C. § 101).
However, under Step 2A Prong One, the claim recites the limitations of “[(b)] perform an encoding operation on the training dataset to provide an encoded dataset having a lower dimension space than a dimension space of the training dataset,” “[(d)] determine an output of the one or more prediction models in the lower dimension space based on an input provided to the one or more prediction models,” and “[(e)] perform a decoding operation on the output to project the output from the lower dimension space to the dimension space of the training dataset.” These limitations of “[(b)] perform an encoding operation,” “[(d)] determine,” and “[(e)] perform a decoding operation” can practically be performed in the human mind, including, for example, observations, evaluations, judgments, and opinions, and accordingly, are mental processes, (MPEP § 2106.04(a)(2) sub III), which is one of the groupings of abstract ideas. More details or specifics are recited for the abstract idea of “[(b)] perform an encoding operation,” “[(b.1)] wherein, . . . cause the at least one processor to: [(b.1.1)] perform the encoding operation on the training dataset based on a projection matrix, wherein . . . cause the at least one processor to: [(b.1.1.1)] perform a factorization operation based on an optimization problem involving the projection matrix,” and accordingly, are merely more specific to the abstract idea.
Also, further details or specifics are recited for the abstract idea of “[(d)] determine,” “[(d.1)] wherein the output of the one or more trained prediction models may include a predicted classification value for an event of a time series forecast of a plurality of events,” and accordingly, are merely more specific to the abstract idea.
Still further, the claim recites more details or specifics to the abstract idea of “[(e)] perform a decoding operation,” “[(e.1)] wherein . . . cause the at least one processor to: [(e.1.1)] project the output from the lower dimension space to the dimension space of the training dataset using an inverse matrix corresponding to the projection matrix, wherein the inverse matrix is an inverse of the projection matrix,” and accordingly, are merely more specific to the abstract idea. Thus, claim 15 recites an abstract idea.
Under Step 2A Prong Two, the claim as a whole is not integrated into a practical application, because the additional elements recited in the claim beyond the identified judicial exception include “at least one non-transitory computer-readable medium including one or more instructions that, when executed by at least one processor,” which is recited at a high level of generality, and thus is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not serve to integrate the abstract idea into a practical application. The claim also recites the element of “one or more prediction models,” which is recited at a high level of generality, and thus, is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not serve to integrate the abstract idea into a practical application. In this regard, the limitation of “[(c)] generate one or more prediction models based on the encoded dataset” is the use of the generic computer components (at least one non-transitory computer-readable medium, at least one processor, one or more prediction models) to implement the abstract idea, (MPEP § 2106.05(f)), that does not serve to integrate the abstract idea into a practical application. The claim also recites more specifics or details to the additional element of “[(c)] generate one or more prediction models,” in that “[(c.1)] wherein the one or more prediction models are configured to provide an output in the lower dimension space,” “[(c.2)] wherein the one or more prediction models are configured to provide a predicted classification value for an event,” “[(c.3)] wherein . . . cause the at least one processor to: [(c.3.1)] train the one or more prediction models in the lower dimension space based on the encoded dataset to provide one or more trained prediction models,” and accordingly, are merely more specific to the additional element. The claim also recites the additional element of “[(a)] receive a training dataset of a plurality of data instances,” which is a pre-solution, insignificant extra-solution activity of mere data gathering, (MPEP § 2106.05(g)), that does not serve to integrate the abstract idea into a practical application. The claim also recites more specifics or details to the additional element of “[(a)] receive,” where “[(a.1)] each data instance comprises a time series of data points,” and accordingly, is merely more specific to the additional element. Therefore, claim 15 is directed to the abstract idea.
Finally, under Step 2B, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. The claim includes “at least one non-transitory computer-readable medium including one or more instructions that, when executed by at least one processor,” which is recited at a high level of generality, and thus is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. The claim also recites the element of “one or more prediction models,” which is recited at a high level of generality, and thus, is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. In this regard, the limitation of “[(c)] generate one or more prediction models based on the encoded dataset” is the use of the generic computer components (at least one non-transitory computer-readable medium, at least one processor, one or more prediction models) to implement the abstract idea, (MPEP § 2106.05(f)), that does not amount to significantly more than the abstract idea. The claim also recites more specifics or details to the additional element of “[(c)] generate one or more prediction models,” in that “[(c.1)] wherein the one or more prediction models are configured to provide an output in the lower dimension space,” “[(c.2)] wherein the one or more prediction models are configured to provide a predicted classification value for an event,” “[(c.3)] wherein . . . cause the at least one processor to: [(c.3.1)] train the one or more prediction models in the lower dimension space based on the encoded dataset to provide one or more trained prediction models,” and accordingly, are merely more specific to the additional element. The claim also recites the additional element of “[(a)] receive a training dataset of a plurality of data instances,” which is a well-understood, routine, and conventional activity of receiving or transmitting data over a network, (MPEP § 2106.05(d) sub II.i), that does not amount to significantly more than the abstract idea. The claim also recites more specifics or details to the additional element of “[(a)] receive,” where “[(a.1)] each data instance comprises a time series of data points,” and accordingly, is merely more specific to the additional element. Therefore, claim 15 is subject-matter ineligible.
Claim 3 depends from claim 1. Claim 10 depends from claim 8. Claim 17 depends from claim 15. The claims recite more details or specifics to the abstract idea of “[(b.1)] perform a factorization operation,” “[(b.1.1)] wherein the optimization problem is the following:
minF,X,W Σ(i,t) (Yit − fiᵀ xt)² + λf ℝf(F) + λX τAR(X | ℒ, W, η) + λw ℝw(W),
where τAR(X | ℒ, W, η) = Σr=1…k τAR(x̄r | ℒ, w̄r, η), τAR(x̄ | ℒ, w̄, η) ≔ ½ Σt=m…T (xt − Σl∈ℒ wl xt−l)² + (η/2) Σt xt², and xt = Σl∈ℒ W(l) xt−l + ϵt,
wherein Yit is the training dataset, i is an i-th observation target, t is a time stamp of each data point of the time series of data points, F is a projection matrix, fiᵀ is a translation to an i-th row of the projection matrix F, X is the encoded dataset having a lower dimension space, xt is a data point of the time series of data points, k is a first dimension of the encoded dataset X, ℝf(F) is a squared Frobenius norm of the projection matrix F, λf is a weight of the projection matrix F, W is an autoregression model, ℝw(W) is a squared Frobenius norm of the auto-regression model W, λw is a weight of the auto-regression model W, τAR is a score of the auto-regression model W, λX is a weight of the score of the auto-regression model τAR, ℒ is a second dimension of the encoded dataset X, w is a prediction model, η is a weight to a vector norm, l is an individual component of the second dimension ℒ, T is a total time of the time series of data points, and ϵt is an error value associated with xt,” and accordingly, is merely more specific to the abstract idea. The abstract idea of these claims is not integrated into a practical application, (see MPEP § 2106.05(d)), nor does it amount to significantly more than the abstract idea, (MPEP § 2106.05(d)), because the claims recite no more than the abstract idea. Therefore, claims 3, 10, and 17 are subject-matter ineligible.
Claim 4 depends directly or indirectly from claim 1. Claim 11 depends directly or indirectly from claim 8. Claim 18 depends directly or indirectly from claim 15. The claims recite more details or specifics to the abstract idea of “[(b.1.1)] performing the factorization operation,” to “[(b.1.2)] update the projection matrix F using a least square optimization problem,” “[(b.1.3)] update the transferred low-dimensional space X using a graph-regularized alternating least squares (GRALS),” and “[(b.1.4)] update the auto-regression model W by solving the following:
minw̄r ½ Σt=m…T (xrt − Σl∈ℒ wrl xr,t−l)² + (λw/2) ‖w̄r‖²
, and accordingly, are merely more specific to the abstract idea. The abstract idea of these claims is not integrated into a practical application, (see MPEP § 2106.05(d)), nor does it amount to significantly more than the abstract idea, (MPEP § 2106.05(d)), because the claims recite no more than the abstract idea. Therefore, claims 4, 11, and 18 are subject-matter ineligible.
Claim 6 depends from claim 1. Claim 13 depends from claim 8. Claim 20 depends from claim 15. The claims recite more details or specifics to the abstract idea of “[(b)] perform an encoding operation,” wherein “[(b.1)] the lower dimension space has a first dimension and a second dimension,” “[(b.2)] the dimension space of the training dataset has a first dimension and a second dimension,” “[(b.3)] the first dimension of the lower dimension space is less than the first dimension of the training dataset,” and “[(b.4)] the second dimension of the lower dimension space is equal to the second dimension of the training dataset,” and accordingly, are merely more specific to the abstract idea. The abstract idea of these claims is not integrated into a practical application, (see MPEP § 2106.05(d)), nor does it amount to significantly more than the abstract idea, (MPEP § 2106.05(d)), because the claims recite no more than the abstract idea. Therefore, claims 6, 13, and 20 are subject-matter ineligible.
Claim 7 depends directly or indirectly from claim 1. Claim 14 depends directly or indirectly from claim 8. The claims recite more details or specifics of the additional element of “[(c)] generate one or more prediction models,” “[(c.4)] wherein the one or more prediction models comprise a number of prediction models,” and “[(c.5)] wherein the number of prediction models is equal to the first dimension of the lower dimension space,” and accordingly, are merely more specific to the additional element. Therefore, claims 7 and 14 are subject-matter ineligible.
Claim Rejections – 35 U.S.C. § 103
5. The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
6. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
7. This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.
8. Claims 1, 3, 4, 6-8, 10, 11, 13-15, 17, 18, and 20 are rejected under 35 U.S.C. § 103 as being unpatentable over US Patent 10970629 to Dirac et al. [hereinafter Dirac] in view of Shi et al., “Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting,” arXiv (2020) [hereinafter Shi] and Yu et al., “Temporal Regularized Matrix Factorization for High-Dimensional Time Series Prediction,” NIPS (2016) [hereinafter Yu].
Regarding claims 1, 8, and 15, Dirac teaches [a] system (Dirac 1:52-54 & Fig. 7, teaches “a computing system configured to implement training and processing of artificial neural networks”) of claim 1, [a] method (Dirac 2:64-66 teaches “methods of using a machine learning model trained to generate encoded representations of sparse, high-dimensional output”) of claim 8, and [a] computer program product . . . comprising at least one non-transitory computer-readable medium including one or more instructions that, when executed by at least one processor (Dirac 22:2-17 teaches a “computing system 700 that may be used in some embodiments to execute the processes and implement the features described above. In some embodiments, the computing system 700 may include: one or more computer processors 702, such as physical central processing units (‘CPUs’); . . . one or more computer readable medium drives 706, such as high density disks (‘HDDs’), solid state drives (‘SSDs’), flash drives, and/or other persistent non-transitory computer-readable media; . . . one or more computer readable memories 710, such as random access memory (‘RAM’) and/or other volatile non-transitory computer-readable media; and one or more graphical processors 714, such as graphics processing units (‘GPUs’)”) of claim 15, for generating a machine learning model based on encoded time series data using model reduction comprising:
at least one processor (Dirac 22:5-7 teaches “the computing system 700 may include: one or more computer processors 702, such as physical central processing units (‘CPUs’)”) programmed or configured to:
[(a)] receive a training dataset of a plurality of data instances (Dirac, Fig. 2, teaches training a machine learning model with encoded training data [Examiner annotations in dashed-line text boxes]:
[Dirac, Fig. 2 (annotated): the machine learning model 202 is trained using encoded training data input vectors 214 and encoded reference data output vectors 220 rather than the original N-dimensional vectors]
Dirac 9:13-15 teaches that “training data input vectors 210 . . . include N separate data elements or ‘dimensions’ (where N is some positive integer) [(that is, receive a training dataset of a plurality of data instances)]”), . . . ;
[(b)] perform an encoding operation on the training dataset to provide an encoded dataset (Dirac 9:25-28 teaches that “the training data input vectors 210 and the reference data output vectors 218 may be encoded using a probabilistic data structure with a plurality of k mapping functions [(that is, perform an encoding operation on the training dataset to provide an encoded dataset)]”) having a lower dimension space than a dimension space of the training dataset (Dirac 9:13-24 teaches “training data input vectors 210 . . . include N separate data elements or ‘dimensions’ (where N is some positive integer). . . . The encoded training data input vectors 214 and the encoded reference data output vectors 220 may each include M separate data elements or ‘dimensions’ (where M is some positive integer smaller than N) [(that is, having a lower dimension space than a dimension space of the training dataset)]”), . . . ;
[(c)] generate one or more prediction models based on the encoded dataset (Dirac 9:16-22 teaches that “[i]nstead of training the machine learning model 202 in the N-dimensional space of the training data input vectors 210 and the reference data output vectors 218, the machine learning model 202 may be trained using encoded training data input vectors 214 and encoded reference data output vectors 220 [(that is, generate one or more prediction models based on the encoded dataset)]”),
[(c.1)] wherein the one or more prediction models are configured to provide an output in the lower dimension space (Dirac 10:9-11 teaches “the machine learning model 202 may generate an encoded training data output vector 216 from the encoded training data input vector 214 [(that is, the one or more prediction models are configured to provide an output in the lower dimension space)]”),
[(c.2)] wherein the one or more prediction models are configured to provide a predicted classification value for an event (Dirac 1:61-63 teaches that “[e]xamples of machine learning models that may be used with aspects of this disclosure include classifiers [(that is, the one or more prediction models are configured to provide a predicted classification value for an event)]”), and
[(c.3)] wherein, when generating the one or more prediction models based on the encoded dataset, the at least one processor is programmed or configured to: [(c.3.1)] train the one or more prediction models in the lower dimension space based on the encoded dataset to provide one or more trained prediction models (Dirac 8:54-55 and Fig. 2 (above) teaches “[a] regression model or a support vector machine [202] may learn the separations of encoded training data input vectors 214 [(that is, train the one or more prediction models in the lower dimension space based on the encoded dataset to provide one or more trained prediction models)]”);
[(d)] determine an output of the one or more prediction models in the lower dimension space based on an input provided to the one or more prediction models (Dirac, Abstract, teaches “[t]he compact machine learning model can output an encoded representation of a higher-dimensional space [(that is, determine an output of the one or more prediction models in the lower dimension space based on an input provided to the one or more prediction models)]”),
[(d.1)] wherein the output of the one or more prediction models may include a predicted classification value for an event of a time series forecast of a plurality of events; and
[(e)] perform a decoding operation on the output to project the output from the lower dimension space to the dimension space of the training dataset (Dirac 3:18-20 teaches “the encoded output vector may be decoded into a higher-dimensional output vector [(that is, from the lower dimension space to the dimension space of the training dataset)] using the mapping functions [(that is, perform a decoding operation on the output to project the output from the lower dimension space to the dimension space of the training dataset)]”), . . .
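Examiner notes, for orientation only, that the encode-train-predict-decode flow mapped above may be illustrated with the following minimal Python sketch; the names, shapes, and modeling choices (a truncated-SVD projection matrix and per-row least-squares autoregressive models) are hypothetical illustrations and are not asserted to be the claimed system or Dirac’s implementation:

import numpy as np

rng = np.random.default_rng(0)
N, T, k, lag = 50, 200, 5, 2

Y = rng.standard_normal((N, T))        # (a) training dataset: N time series of T data points

# (b) encoding operation: project onto a k-dimensional space (k < N)
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
F = U[:, :k]                           # projection matrix (N x k), illustrative choice
X = F.T @ Y                            # encoded dataset (k x T), the lower dimension space

# (c) generate/train prediction models in the lower dimension space:
# one least-squares AR(lag) model per latent row
W = np.empty((k, lag))
for r in range(k):
    lagged = np.column_stack([X[r, lag - l:T - l] for l in range(1, lag + 1)])
    W[r] = np.linalg.lstsq(lagged, X[r, lag:], rcond=None)[0]

# (d) determine an output in the lower dimension space (one-step forecast)
x_next = np.array([W[r] @ X[r, T - 1:T - 1 - lag:-1] for r in range(k)])

# (e) decoding operation: project the output back to the dimension space of
# the training dataset using an inverse of the projection map
y_next = np.linalg.pinv(F.T) @ x_next  # equals F @ x_next here; shape (N,)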
Though Dirac teaches the encoding of training data into a Bloom-filter-encoded input vector represented as a matrix multiplication, and decoding a Bloom-filter-encoded output vector to generate an output vector, Dirac, however, does not explicitly teach –
* * *
[(a) receive] . . . , wherein each data instance comprises a time series of data points;
[(b)] perform an encoding operation on the training dataset] . . . [(b.1)] wherein, when performing the encoding operation on the training dataset to provide the encoded dataset, the at least one processor is programmed or configured to: [(b.1.1)] perform the encoding operation on the training dataset based on a projection matrix, . . . ;
* * *
[(e) perform a decoding operation] . . . , [(e.1)] wherein, when performing the decoding operation on the output to project the output from the lower dimension space to the dimension space of the training dataset, the at least one processor is programmed or configured to: [(e.1.1)] project the output from the lower dimension space to the dimension space of the training dataset using an inverse matrix corresponding to the projection matrix, wherein the inverse matrix is an inverse of the projection matrix.
But Shi teaches –
* * *
[(a) receive] . . . , wherein each data instance comprises a time series of data points (Shi, right column of p. 5, “Algorithm 1, TSF using BHT-ARIMA,” first line, teaches “Input: A time series data X ∈ ℝ^(I1 × · · · × IN × T), (p; d; q), τ, maximum iteration K, and stop criteria tol. [(that is, [receive] . . . , wherein each data instance comprises a time series of data points)]”);
[(b)] perform an encoding operation on the training dataset] . . . [(b.1)] wherein, when performing the encoding operation on the training dataset to provide the encoded dataset, the at least one processor is programmed or configured to: [(b.1.1)] perform the encoding operation on the training dataset based on a projection matrix (Shi, left column of p. 2, “Introduction,” first full paragraph, teaches to “employ low-rank Tucker decomposition to learn compressed core tensors by orthogonal factor (projection) matrices [(that is, “compressed” is performing the encoding operation)]. These projection matrices are jointly used to maximally preserve the temporal continuity between core tensors which can better capture the intrinsic temporal correlations than the original [time series (TS)] data [(that is, perform the encoding operation on the training dataset based on a projection matrix)]“; see also Shi, Fig. 1, below) . . . ;
* * *
[(e) perform a decoding operation] . . . , [(e.1)] wherein, when performing the decoding operation on the output to project the output from the lower dimension space to the dimension space of the training dataset, the at least one processor is programmed or configured to: [(e.1.1)] project the output from the lower dimension space to the dimension space of the training dataset using an inverse matrix corresponding to the projection matrix, wherein the inverse matrix is an inverse of the projection matrix (Shi, Fig. 1, teaches a block Hankel tensor-Autoregressive Integrated Moving Average (BHT-ARIMA) model in which multi-dimensional data is reduced to core tensors with projection matrices Uᵀ and then decoded using inverses of the projection matrices U [Examiner annotations in dashed-line text boxes]:
[Shi, Fig. 1 (annotated): the BHT-ARIMA pipeline, in which the input series is Hankelized, compressed to core tensors via the projection matrices Uᵀ, forecast by ARIMA in the compressed space, and recovered by applying the inverse projection matrices U]
[(that is, project the output from the lower dimension space to the dimension space of the training dataset using an inverse matrix corresponding to the projection matrix, wherein the inverse matrix is an inverse of the projection matrix)]”).
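Examiner notes, for orientation only, that the relationship between a projection matrix and the inverse matrix used for decoding may be illustrated with a short sketch (hypothetical shapes; when the factor matrix has orthonormal columns, as in a Tucker decomposition, the pseudoinverse of the encoding map is the factor matrix itself):

import numpy as np

rng = np.random.default_rng(1)
N, k = 8, 3
U, _ = np.linalg.qr(rng.standard_normal((N, k)))  # factor matrix with orthonormal columns

y = rng.standard_normal(N)   # a data point in the original space
x = U.T @ y                  # encode: project into the lower dimension space
y_hat = U @ x                # decode: U equals pinv(U.T) for orthonormal columns

assert np.allclose(U, np.linalg.pinv(U.T))        # the decoding matrix is an inverse
assert np.allclose(U.T @ U, np.eye(k))            # U.T is a left inverse of U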
Dirac and Shi are from the same or similar field of endeavor. Dirac teaches reducing model size of a machine learning model with encoding. Shi teaches a novel approach for multiple time series forecasting. Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s claimed invention to modify Dirac pertaining to model size reduction with the time series training data of Shi.
The motivation to do so is that it “effectively improves forecasting accuracy and reduces computational cost compared with the state-of-the-art methods.” (Shi, Abstract).
Though Dirac and Shi teach the feature of encoding and decoding training data based on a projection matrix, the combination of Dirac and Shi, however, does not explicitly teach –
* * *
[(b)] perform an encoding operation on the training dataset] . . . [(b.1.1)] . . . wherein, when performing the encoding operation on the training dataset based on the projection matrix, the at least one processor is programmed or configured to: [(b.1.1.1)] perform a factorization operation based on an optimization problem involving the projection matrix;
* * *
But Yu teaches -
* * *
[(b)] perform an encoding operation on the training dataset] . . . [(b.1.1)] . . . wherein, when performing the encoding operation on the training dataset based on the projection matrix, the at least one processor is programmed or configured to: [(b.1.1.1)] perform a factorization operation based on an optimization problem involving the projection matrix (Yu, Fig.1, teaches performing a factorization operation [Examiner annotation in dashed-line text boxes]:
[Yu, Fig. 1 (annotated): matrix factorization of the time series matrix Y into a factor matrix F and a temporally regularized latent matrix X]
Yu at p. 6, “4. A Novel Autoregressive Temporal Regularizer,” fifth paragraph, teaches one may “[n]ote that since our method [(TRMF)] is highly modular, one can resort to any method to solve the optimization subproblems [(that is, optimization problem)] that arise for each module [(that is, perform a factorization operation based on an optimization problem involving the projection matrix)]”;
[Examiner notes that though Yu does not explicitly recite a projection matrix, Yu, however, does teach “[a] natural way to model high-dimensional time series data [(that is, a vector)] is [mapped] in the form of a matrix, with rows corresponding to each one-dimensional time series and columns corresponding to time points” (Yu at p. 2, “Introduction,” first full paragraph), which is the inherent use of a projection matrix.];
* * *
Dirac, Shi and Yu are from the same or similar field of endeavor. Dirac teaches reducing model size of a machine learning model with encoding. Shi teaches a novel approach for multiple time series forecasting. Yu teaches a temporal regularized matrix factorization (TRMF) framework that supports matrix factorization based on an optimization problem.
Thus, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the Applicant’s claimed invention to modify the combination of Dirac and Shi, pertaining to model size reduction for time series training data, with the optimization-problem-based factorization of Yu.
The motivation to do so is that the “proposed [temporal regularized matrix factorization (TRMF) framework] is highly general, and subsumes many existing approaches for time series analysis. We make interesting connections to graph regularization methods in the context of learning the dependencies in an autoregressive framework. Experimental results show the superiority of TRMF in terms of scalability and prediction quality.” (Yu, Abstract).
Regarding claims 3, 10, and 17, the combination of Dirac, Shi, and Yu teaches all of the limitations of claims 1, 8, and 15, respectively, as described above in detail.
Yu teaches –
[(b.1.1)] wherein the optimization problem is the following:
minF,X,W Σ(i,t) (Yit − fiᵀ xt)² + λf ℝf(F) + λX τAR(X | ℒ, W, η) + λw ℝw(W)
(Yu at p. 6, “4. A Novel Autoregressive Temporal Regularizer,” second paragraph, teaches “[p]lugging τM(X | θ) = τAR(X | ℒ, W, η) into [equation] (7), we obtain the following [optimization] problem:
minF,X,W Σ(i,t)∈Ω (Yit − fiᵀ xt)² + λf ℝf(F) + λx τAR(X | ℒ, W, η) + λw ℝw(W)
where Rw(W) is a regularizer for W. We will refer to [equation] (12) as [Temporal Regularized Matrix Factorization-Autoregressive (TRMF-AR)]”) wherein:
τAR(X | ℒ, W, η) = Σr=1…k τAR(x̄r | ℒ, w̄r, η)
(Yu at p. 5, “4. A Novel Autoregressive Temporal Regularizer,” first full paragraph, teaches “[temporal regularized matrix factorization (TRMF)] allows us to learn the weights {W(l)} when they are unknown. . . . Let x̄rᵀ = [· · ·, xrt, · · ·] be the r-th row of X and w̄rᵀ = [· · ·, Wrl, · · ·] be the r-th row of W. Then [equation] (9) can be written as τAR(X | ℒ, W, η) = Σr=1…k τAR(x̄r | ℒ, w̄r, η), where we define
τAR(x̄ | ℒ, w̄, η) ≔ ½ Σt=m…T (xt − Σl∈ℒ wl xt−l)² + (η/2) Σt xt²
with xt being the t-th element of x̄, and wl being the l-th element of w̄”), and
wherein:
xt = Σl∈ℒ W(l) xt−l + ϵt
(Yu at p. 5, “4. A Novel Autoregressive Temporal Regularizer,” first full paragraph, teaches “[a]ssume that xt is a noisy linear combination of some previous points, that is, xt = Σl∈ℒ W(l) xt−l + ϵt, where ϵt is a Gaussian noise vector”) wherein Yit is the training dataset, i is an i-th observation target, t is a time stamp of each data point of the time series of data points, F is a projection matrix, fiᵀ is a translation to an i-th row of the projection matrix F, X is the encoded dataset having a lower dimension space, xt is a data point of the time series of data points, k is a first dimension of the encoded dataset X, ℝf(F) is a squared Frobenius norm of the projection matrix F, λf is a weight of the projection matrix F, W is an autoregression model, ℝw(W) is a squared Frobenius norm of the auto-regression model W, λw is a weight of the auto-regression model W, τAR is a score of the auto-regression model W, λX is a weight of the score of the auto-regression model τAR, ℒ is a second dimension of the encoded dataset X, w is a prediction model, η is a weight to a vector norm, l is an individual component of the second dimension ℒ, T is a total time of the time series of data points, and ϵt is an error value associated with xt.
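Examiner notes, for orientation only, that the TRMF-AR objective recited above may be written as the following Python sketch; the names track the claim glossary (F, X, W, ℒ, λf, λX, λw, η), and the code is an illustrative reading of the quoted equations rather than any party’s implementation:

import numpy as np

def tau_ar(X, lags, W, eta):
    """Autoregressive temporal regularizer tau_AR(X | L, W, eta)."""
    k, T = X.shape
    m = max(lags)
    total = 0.0
    for r in range(k):
        resid = X[r, m:] - sum(W[r, j] * X[r, m - l:T - l]
                               for j, l in enumerate(lags))
        total += 0.5 * resid @ resid + 0.5 * eta * X[r] @ X[r]
    return total

def trmf_ar_objective(Y, F, X, W, lags, lam_f, lam_x, lam_w, eta):
    """sum_(i,t) (Y_it - f_i^T x_t)^2 + lam_f*||F||_F^2
       + lam_x*tau_AR(X | L, W, eta) + lam_w*||W||_F^2"""
    fit = np.sum((Y - F @ X) ** 2)          # data-fitting term over all (i, t)
    return (fit + lam_f * np.sum(F ** 2)    # squared Frobenius norm of F
                + lam_x * tau_ar(X, lags, W, eta)
                + lam_w * np.sum(W ** 2))   # squared Frobenius norm of W

A full solver alternates minimization of this objective over F, X, and W, which is the subject of claims 4, 11, and 18 addressed next.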
Regarding claims 4, 11, and 18, the combination of Dirac, Shi, and Yu teaches all of the limitations of claims 3, 10, and 17, respectively, as described above in detail.
Yu teaches –
wherein, when [(b.1.1)] performing the factorization operation, the at least one processor is programmed or configured to:
[(b.1.2)] update the projection matrix F using a least square optimization problem (Yu at p. 6, “4. A Novel Autoregressive Temporal Regularizer-Updates for F,” first paragraph, teaches “[w]hen X and W are fixed, the subproblem of updating F is the same as updating F while X fixed in (1). Thus, fast algorithms such as alternating least squares or coordinate descent can be applied directly to find F [(that is, update the projection matrix F using a least square optimization problem)]”);
[(b.1.3)] update the transferred low-dimensional space X using a graph-regularized alternating least squares (GRALS) (Yu at p. 6, “4. A Novel Autoregressive Temporal Regularizer- Updates for X,” first paragraph, teaches “[w]e solve
minX Σ(i,t)∈Ω (Yit − fiᵀ xt)² + λx τAR(X | ℒ, W, η).
. . . [W]e can apply GRALS [15] to find X [(that is, update the transferred low-dimensional space X using a graph-regularized alternating least squares (GRALS))]”); and
[(b.1.4)] update the auto-regression model W by solving the following:
minw̄r ½ Σt=m…T (xrt − Σl∈ℒ wrl xr,t−l)² + (λw/2) ‖w̄r‖²
(Yu at p. 6, “4. A Novel Autoregressive Temporal Regularizer-Updates for W,” first paragraph, teaches “[h]ow to update W while F and X [are] fixed depends on the choice of Rw(W). There are many parameter estimation techniques developed for AR with various regularizers [11, 20]. For simplicity, we consider the squared Frobenius norm: Rw(W) = ‖W‖F². As a result, each row w̄r of W can be updated by solving the following one-dimensional autoregressive problem:
minw̄r ½ Σt=m…T (xrt − Σl∈ℒ wrl xr,t−l)² + (λw/2) ‖w̄r‖²,
which is a simple |ℒ| dimensional ridge regression problem with T − m + 1 instances [(that is, update the auto-regression model W by solving the following)]”).
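Examiner notes, for orientation only, that the alternating update scheme quoted above may be sketched as follows; the F and W steps are ridge least-squares updates as described, while the X step substitutes a plain ridge update for the GRALS solver that Yu applies (and the η term is omitted), so the sketch is illustrative only:

import numpy as np

def trmf_ar_als(Y, k, lags, lam_f=1.0, lam_x=1.0, lam_w=1.0, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    N, T = Y.shape
    F = 0.1 * rng.standard_normal((N, k))   # projection matrix
    X = 0.1 * rng.standard_normal((k, T))   # encoded dataset
    W = np.zeros((k, len(lags)))            # auto-regression model
    m = max(lags)
    for _ in range(n_iter):
        # update F: ridge least squares with X (and W) fixed
        F = np.linalg.solve(X @ X.T + lam_f * np.eye(k), X @ Y.T).T
        # update X: ridge least squares with F fixed (Yu applies GRALS here)
        X = np.linalg.solve(F.T @ F + lam_x * np.eye(k), F.T @ Y)
        # update W: one |L|-dimensional ridge regression per row of X
        for r in range(k):
            A = np.column_stack([X[r, m - l:T - l] for l in lags])
            b = X[r, m:]
            W[r] = np.linalg.solve(A.T @ A + lam_w * np.eye(len(lags)), A.T @ b)
    return F, X, W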
Regarding claims 6, 13, and 20, the combination of Dirac, Shi, and Yu teaches all of the limitations of claims 1, 8, and 15, as described above in detail.
Dirac teaches -
wherein the lower dimension space has a first dimension and a second dimension (Dirac 13:4-6 teaches “the Bloom-filter-encoded output vector 316 may be mapped from its 20-dimensional value space [(that is, the “20-dimensional value space” corresponds to the lower dimension space having a first dimension and a second dimension)] to the output vector 312 in a 50-dimensional value space”), and
wherein the dimension space of the training dataset has a first dimension and a second dimension (Dirac 13:4-6 teaches “the Bloom-filter-encoded output vector 316 may be mapped from its 20-dimensional value space to the output vector 312 in a 50-dimensional value space [(that is, as set out in the claim, the “dimension space of the training dataset” corresponds to “the output,” where the “50-dimensional value space” corresponds to the dimension space of the training dataset having a first dimension and a second dimension)]”),
wherein the first dimension of the lower dimension space is less than the first dimension of the training dataset (Dirac, claim 1, teaches “the encoded training data input vector comprises a representation of the training data input vector having a third quantity of values [(that is, the first dimension of the lower dimension space)] that is less than the first quantity of values [(that is, “the encoded training data input vector” is less than the first dimension of the training dataset)]”), and
wherein the second dimension of the lower dimension space is equal to the second dimension of the training dataset (Dirac 9:34-42 teaches “an encoded training data input vector 214 may include M non-negative integer elements. . . . [I]f M equals 1,000 and a count-min sketch includes seven hash functions, then at most seven elements of the 1,000 elements of the encoded input vector each has an integer value greater than or equal to one [(that is, the second dimension of the lower dimension space is equal to the second dimension of the training dataset)] and the remaining elements of the encoded input vector each has a value of zero”).
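Examiner notes, for orientation only, that the mapped dimension relationships of claims 6, 13, and 20 may be illustrated at the shape level (hypothetical sizes):

import numpy as np

N, T, k = 50, 20, 5                  # hypothetical sizes with k < N
Y = np.zeros((N, T))                 # dimension space of the training dataset
F = np.zeros((N, k))                 # projection matrix
X = F.T @ Y                          # encoded dataset in the lower dimension space

assert X.shape[0] < Y.shape[0]       # first dimension of the lower space is smaller
assert X.shape[1] == Y.shape[1]      # second (time) dimension is equal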
Regarding claims 7 and 14, the combination of Dirac, Shi and Yu teaches all of the limitations of claims 6 and 13, respectively, as described above in detail.
Dirac teaches -
wherein the one or more prediction models comprise a number of prediction models (Dirac 2:34-37 teaches “generating and training a machine learning model with encoded training data and without reducing (or substantially reducing) the accuracy of the machine learning model [(that is, a “machine learning model” is a number of prediction models)]”), and
wherein the number of prediction models is equal to the first dimension of the lower dimension space (Dirac 3:3-4 teaches “the high-dimensional input vector may correspond to a vector of binary values [(that is, a “binary vector” has at least a first dimension of the lower dimension space, where the number of prediction models is equal to the first dimension of the lower dimension space)]”
[Examiner notes that the broadest reasonable interpretation of “one or more prediction models comprises a number of prediction models” covers a “singular” prediction model, and in kind covers a first dimension of the lower dimension space that is similarly “singular,” such that the number of prediction models is equal to the first dimension of the lower dimension space, which is not inconsistent with the Applicant’s disclosure. (MPEP § 2111)]).
Response to Arguments
9. Examiner has fully considered Applicant’s arguments, and responds below accordingly.
35 U.S.C. § 101
10. Applicant submits that the “limitations of claim 1 demonstrate that claim 1 is directed to an unconventional system that improves generation and execution of machine learning models by generating a prediction model through training a machine learning model using encoded training datasets at a reduced dimension in an encoded space and generating a prediction in the encoded space that may be decoded into a final prediction.
As amended, claim 1 recites [a] system . . .
at least one processor programmed or configured to:
[(a)] receive a training dataset of a plurality of data instances,
[(a.1)] wherein each data instance comprises a time series of data points;
[(b)] perform an encoding operation on the training dataset to provide an encoded dataset having a lower dimension space than a dimension space of the training dataset,
[(b.1)] wherein, when performing the encoding operation on the training dataset to provide the encoded dataset, the at least one processor is programmed or configured to:
[(b.1.1)] perform the encoding operation on the training dataset based on a projection matrix, wherein, when performing the encoding operation on the training dataset based on the projection matrix, the at least one processor is programmed or configured to:
[(b.1.1.1)] perform a factorization operation based on an optimization problem involving the projection matrix;
[(c)] generate one or more prediction models based on the encoded dataset,
[(c.1)] wherein the one or more prediction models are configured to provide an output in the lower dimension space,
[(c.2)] wherein the one or more prediction models are configured to provide a predicted classification value for an event, and
[(c.3)] wherein, when generating the one or more prediction models based on the encoded dataset, the at least one processor is programmed or configured to:
[(c.3.1)] train the one or more prediction models in the lower dimension space based on the encoded dataset to provide one or more trained prediction models;
[(d)] determine an output of the one or more trained prediction models in the lower dimension space based on an input provided to the one or more trained prediction models,
[(d.1)] wherein the output of the one or more prediction models may include a predicted classification value for an event of a time series forecast of a plurality of events; and
[(e)] perform a decoding operation on the output to project the output from the lower dimension space to the dimension space of the training dataset,
[(e.1)] wherein, when performing the decoding operation on the output to project the output from the lower dimension space to the dimension space of the training dataset, the at least one processor is programmed or configured to:
[(e.1.1)] project the output from the lower dimension space to the dimension space of the training dataset using an inverse matrix corresponding to the projection matrix, wherein the inverse matrix is an inverse of the projection matrix.
(Response at pp. 2-3) (Applicant’s amendments shown in underline).
As seen above, the system of claim 1 provides a practical application of a solution to the technical problem of reducing an amount of time and computational resources involved in generating machine learning models and executing tasks with machine learning models. Furthermore, the limitations of claim 1 show that claim 1 is not able to be ‘practically . . . performed in the mind’ based on the use of complex iterations of the embedding learning process as well [as] using a set of updated embeddings to generate an embedding layer of a neural network of the machine learning model.” (Response at p. 14).
Examiner Response:
Examiner respectfully disagrees because the rejections identify the abstract idea (i.e., judicial exception) by referring to what is recited (i.e., set forth or described) in the claim and explain why it is considered an abstract idea. (MPEP § 2106.07(a)).
For example, referring to claim 1, the limitations of “[(b)] perform an encoding operation,” “[(d)] determine,” and “[(e)] perform a decoding operation” can practically be performed in the human mind, including, for example, observations, evaluations, judgments, and opinions, and accordingly, are mental processes, (MPEP § 2106.04(a)(2) sub III), which is one of the groupings of abstract ideas.
Accordingly, the claims recite an abstract idea, as set out above in detail.
11. Under Step 2A Prong Two, “Applicant submits that the claims recite ‘an invention that is not merely the routine or conventional use’ of computers or the Internet. DDR Holdings, 773 F.3d at 1259. Taking the limitations of claim 1 individually or in combination, they provide meaningful limitations with regard to an alleged judicial exception of a mental process. See MPEP § 2106.05(e). . . . Accordingly, amended independent claim 1 is directed to statutory subject matter.” (Response at pp. 14-15).
Examiner Response:
Examiner respectfully disagrees because the rejections hereinabove identify any additional elements (specifically point to claim features/limitations/steps) recited in the claim beyond the identified abstract idea; and evaluate the integration of the abstract idea into a practical application by explaining that the claim as a whole, looking at the additional elements individually and in combination, does not integrate the abstract idea into a practical application using the considerations set forth in MPEP §§ 2106.04(d), 2106.05(a)-(c) and (e)-(h). (MPEP § 2106.07(a)).
Referring to claim 1 for example, the claim as a whole is not integrated into a practical application, because the additional elements recited in the claim beyond the identified judicial exception include a “system” and “at least one processor,” which are recited at a high level of generality, and thus are generic computer components used to implement the abstract idea, (MPEP § 2106.05(f)), that do not serve to integrate the abstract idea into a practical application. The claim also recites the element of “one or more prediction models,” which is recited at a high level of generality, and thus, is a generic computer component used to implement the abstract idea, (MPEP § 2106.05(f)), that does not serve to integrate the abstract idea into a practical application.
Also, the claim recites more specifics or details to the additional element of “[(c)] generate one or more prediction models,” in that “[(c.1)] wherein the one or more prediction models are configured to provide an output in the lower dimension space,” “[(c.2)] wherein the one or more prediction models are configured to provide a predicted classification value for an event,” and “[(c.3)] wherein . . . the at least one processor is programmed or configured to: [(c.3.1)] train the one or more prediction models in the lower dimension space based on the encoded dataset to provide one or more trained prediction models,” and accordingly, these recitations are merely more specific to the additional element.
Accordingly, the additional elements do not serve to integrate the abstract idea into a practical application, and the claims are subject-matter ineligible.
35 U.S.C. § 103
12. Applicant submits that “[n]one of the cited references teaches or suggests all of the limitations of claim 1. For example, neither Dirac nor Shi teaches or suggests at least one processor programmed or configured to:
* * *
[(b)] perform an encoding operation on the training dataset to provide an encoded dataset having a lower dimension space than a dimension space of the training dataset,
[(b.1)] wherein, when performing the encoding operation on the training dataset to provide the encoded dataset, the at least one processor is programmed or configured to:
[(b.1.1)] perform the encoding operation on the training dataset based on a projection matrix, wherein, when performing the encoding operation on the training dataset based on the projection matrix, the at least one processor is programmed or configured to:
[(b.1.1.1)] perform a factorization operation based on an optimization problem involving the projection matrix;
* * *
[(d)] determine an output of the one or more trained prediction models in the lower dimension space based on an input provided to the one or more trained prediction models,
[(d.1)] wherein the output of the one or more prediction models may include a predicted classification value for an event of a time series forecast of a plurality of events; and
[(e)] perform a decoding operation on the output to project the output from the lower dimension space to the dimension space of the training dataset,
[(e.1)] wherein, when performing the decoding operation on the output to project the output from the lower dimension space to the dimension space of the training dataset, the at least one processor is programmed or configured to:
[(e.1.1)] project the output from the lower dimension space to the dimension space of the training dataset using an inverse matrix corresponding to the projection matrix, wherein the inverse matrix is an inverse of the projection matrix.”
(Response at pp. 16-27).
Examiner’s Response:
Examiner agrees that neither Dirac nor Shi teaches or suggests the limitations of the claims relating to the “perform a factorization operation based on an optimization problem involving the projection matrix.”
Examiner relies upon the teachings of Yu regarding these features, as set out above in detail.
Conclusion
13. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
14. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
(US Published Application 20210365820 to Shabat et al.) teaches selection of anchor points using randomized matrix factorizations (e.g., random interpolative decomposition). Randomization may be used by projecting the kernel onto a number (e.g., >k) of random vectors. This projection may be executed by applying the kernel K to a random matrix Ω, e.g., using fast matrix multiplication such as the FIG transform, to generate randomized kernel projection Y. This projection reduces the dimension of the matrix to a reduced dimensional space.
(US Published Application 20110040711 to Perronnin et al.) teaches a linear classifier trained in the second multi-dimension space, where the linear classifier can approximate the accuracy of a non-linear classifier in the original space when predicting labels for new samples, but with lower computation cost in the learning phase.
15. Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KEVIN L. SMITH whose telephone number is (571) 272-5964. Normally, the Examiner is available on Monday-Thursday 0730-1730.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the Examiner by telephone are unsuccessful, the Examiner’s supervisor, KAKALI CHAKI can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/K.L.S./
Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122