Prosecution Insights
Last updated: April 19, 2026
Application No. 17/815,316

CONGENIALITY-PRESERVING GENERATIVE ADVERSARIAL NETWORKS FOR IMPUTING LOW-DIMENSIONAL MULTIVARIATE TIME-SERIES DATA

Final Rejection: §101, §103, §112
Filed: Jul 27, 2022
Examiner: PHAM, JESSICA THUY
Art Unit: 2121
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Tata Consultancy Services Limited
OA Round: 2 (Final)
Grant Probability: 33% (At Risk)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 3y 3m
With Interview: 0%

Examiner Intelligence

Career Allow Rate: 33% (grants only 33% of cases; 1 granted / 3 resolved; -21.7% vs TC avg)
Interview Lift: -33.3% (minimal lift, based on resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline)
Total Applications: 41 (38 currently pending, across all art units)

Statute-Specific Performance

§101: 26.8% allowance (-13.2% vs TC avg)
§103: 35.5% allowance (-4.5% vs TC avg)
§102: 11.0% allowance (-29.0% vs TC avg)
§112: 22.7% allowance (-17.3% vs TC avg)

Tech Center averages are estimates. Based on career data from 3 resolved cases.

Office Action

§101 §103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment/Status of Claims

Claims 1, 7, and 13 were amended. Claims 4, 10, and 16 were cancelled. Claims 1-3, 5-9, 11-15, and 17-18 are pending and examined herein. Claims 1-3, 5-9, 11-15, and 17-18 are rejected under 35 U.S.C. 112(b). Claims 1-3, 5-9, 11-15, and 17-18 are rejected under 35 U.S.C. 101. Claims 1-3, 5-9, 11-15, and 17-18 are rejected under 35 U.S.C. 103.

Response to Arguments

Applicant's arguments filed 10/01/2025 regarding the 35 U.S.C. 112(b) rejection of claims 1-18 for the relative term "high-dimensional" have been fully considered but they are not persuasive. Applicant argues that the term "high-dimensional" is well-known and within the scope of the claimed subject matter. Examiner respectfully disagrees. One of ordinary skill in the art would not be able to ascertain the metes and bounds of the term "high-dimensional". In other words, one of ordinary skill would not be able to determine a specific number of dimensions at which the term "high-dimensional" would apply. Examiner notes that there appears to be a link in the remarks that could not be accessed, as the link did not transfer to the application file.

Applicant's arguments, see pages 12-13, filed 10/01/2025, with respect to the 112(b) rejection of claim 4 for the term "minimizing a difference" have been fully considered and are persuasive. Applicant's arguments, see pages 12-13, filed 10/01/2025, with respect to the 112(b) rejection of claims 7-12 for lack of antecedent basis have been fully considered and are persuasive. However, as explained above, the 35 U.S.C. 112(b) rejection of claims 1-18 for the relative term "high-dimensional" is maintained.

Applicant's arguments filed 10/01/2025 regarding the 35 U.S.C.
101 rejection of claims 1-18 have been fully considered but they are not persuasive. Applicant argues that the judicial exception is implemented with a particular machine integral to the claims and thus integrates the judicial exception into a practical application, particularly citing "a digital twin, a simulation of an industry machine or an industrial plant, or a sensor, a production unit, or a manufacturing unit." See the 35 U.S.C. 112(b) rejection below for interpretation. Examiner respectfully disagrees with the argument.

MPEP 2106.03 states "A machine is a 'concrete thing, consisting of parts, or of certain devices and combination of devices.'" Thus, a digital twin and a simulation of an industry machine or an industrial plant are not machines. MPEP 2106.05(b) provides relevant factors for determining whether a machine recited in a claim provides significantly more.

Firstly, MPEP 2106.05(b)(I) states "The particularity or generality of the elements of the machine or apparatus, i.e., the degree to which the machine in the claim can be specifically identified (not any and all machines)." The machines listed, "a sensor, a production unit, or a manufacturing unit", are very generic machines, and thus are not particular machines.

Secondly, MPEP 2106.05(b)(II) states "Integral use of a machine to achieve performance of a method may integrate the recited judicial exception into a practical application or provide significantly more, in contrast to where the machine is merely an object on which the method operates, which does not integrate the exception into a practical application or provide significantly more." The full limitation in the present application states "imputing, with the cpGAN, a low-dimensional multivariate industrial time-series data in a digital twin, a simulation of an industry machine or an industrial plant, or a sensor, a production unit, or a manufacturing unit". The cpGAN is the method, which is simply operated on one of the listed machines.
Thus, the machines are not particular machines. Thirdly, MPEP 2106.05(b)(III) states "Whether its involvement is extra-solution activity or a field-of-use, i.e., the extent to which (or how) the machine or apparatus imposes meaningful limits on the claim. Use of a machine that contributes only nominally or insignificantly to the execution of the claimed method (e.g., in a data gathering step or in a field-of-use limitation) would not integrate a judicial exception or provide significantly more." As explained in the 35 U.S.C. 101 rejection below, the limitation is a field-of-use limitation and does not impose meaningful limits on the claim. Thus, the machines are not particular machines and do not integrate the judicial exceptions into a practical application nor recite significantly more than the judicial exception.

Applicant further argues that "Applicant submits that amended claims 1, 7, and 13 are patent-eligible as they integrate a judicial exception in terms of improvement in functionality of the computer. (MPEP §§ 2106.04(d)(1) and 2106.05(a)) i.e., a congeniality preserving Generative Adversarial Networks (cpGAN) leverages a supervisor neural network function to retain conditional temporal dynamics of an fully-observed data in the imputed temporal data." (See the 35 U.S.C. 112(b) rejection for interpretation.) Examiner respectfully disagrees.

MPEP 2106.05(a) states "An important consideration in determining whether a claim improves technology is the extent to which the claim covers a particular solution to a problem or a particular way to achieve a desired outcome, as opposed to merely claiming the idea of a solution or outcome. McRO, 837 F.3d at 1314-15, 120 USPQ2d at 1102-03; DDR Holdings, 773 F.3d at 1259, 113 USPQ2d at 1107.
In this respect, the improvement consideration overlaps with other considerations, specifically the particular machine consideration (see MPEP § 2106.05(b)), and the mere instructions to apply an exception consideration (see MPEP § 2106.05(f)). Thus, evaluation of those other considerations may assist examiners in making a determination of whether a claim satisfies the improvement consideration." As explained in the 35 U.S.C. 101 rejection below, the cited limitation amounts to mere instructions to apply an exception. Additionally, the claim does not explain how the supervisor neural network retains conditional temporal dynamics, instead merely claiming the outcome of retaining conditional temporal dynamics. Thus, there is no improvement to technology and the claims do not integrate the judicial exceptions into a practical application nor recite significantly more than the judicial exception.

Applicant further argues that "Applicant believes that the subject matter of amended claims 1, 7, and 13 achieves significantly more in terms of minimizing a difference between the one or more imputed high-dimensional target feature embeddings of the input training dataset and the one or more predicted imputed high-dimensional target feature embeddings." As explained in the 35 U.S.C. 101 rejection below, the cited limitation is an abstract idea/judicial exception itself, and thus cannot achieve significantly more than the judicial exception.

Applicant's arguments, see pages 17-21, filed 10/01/2025, with respect to the rejection(s) of claim(s) 1-3, 5-9, 11-15, and 17-18 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn.
However, upon further consideration, a new ground(s) of rejection is made in view of Guo ("A data imputation method for multivariate time series based on generative adversarial network", 2019), Zhu ("A Second-Order Approach to Learning with Instance-Dependent Label Noise", 2021), Dai ("Multiple Imputation via Generative Adversarial Network for High-dimensional Blockwise Missing Value Problems", 2021), Koochali ("Probabilistic Forecasting of Sensory Data With Generative Adversarial Networks – ForGAN", 2019), Weng ("Contrastive Representation Learning", 2021), Tipirneni ("Self-Supervised Transformer for Sparse and Irregularly Sampled Multivariate Clinical Time-Series", February 2022), and Hu ("Semi-Supervised Learning Based on GAN With Mean and Variance Feature Matching", 2019).

Claim Objections

Claims 1, 7, and 13 are objected to because of the following informalities: "an fully-observed data" in the seventh-to-last paragraph of each claim should be "fully-observed data". Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-3, 5-9, 11-15, and 17-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term "high-dimensional" in claims 1, 7, and 13 is a relative term which renders the claim indefinite. The term "high-dimensional" is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The number of dimensions required to meet the requirements of the term "high-dimensional" is indefinite. For purposes of examination, any number of dimensions will be interpreted as "high-dimensional".

Claims 1, 7, and 13 recite the term "a congeniality preserving Generative Adversarial Networks (cpGAN)". cpGAN is not a term of art, is not explicitly defined by the specification, and one of ordinary skill in the art would not be able to determine the metes and bounds of "a congeniality preserving Generative Adversarial Networks (cpGAN)". For purposes of examination, any generative adversarial network (GAN) will be interpreted as a cpGAN.

Claims 1, 7, and 13 recite the term "the cpGAN leverages a supervisor neural network function ($S_{cpGAN}$) to retain conditional temporal dynamics of an fully-observed data ($\tilde{D}_{train}^{n,1:T_n}$) in the imputed temporal data". Conditional temporal dynamics is not a term of art, is not explicitly defined by the specification, and one of ordinary skill in the art would not be able to determine the metes and bounds of "conditional temporal dynamics" or whether a neural network function retains conditional temporal dynamics of fully observed data. For purposes of examination, the retention of conditional temporal dynamics of the data will be interpreted as inherent to a supervisor neural network function. Thus, for prior art purposes, any GAN with a supervisor neural network function will inherently retain conditional temporal dynamics of the fully-observed/training data in the imputed/predicted data.
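For context on this interpretation (any GAN with a supervisor neural network function), a "supervisor" of the kind found in TimeGAN-style architectures is simply a network trained to predict the next-step latent embedding from the current one, so that generated or imputed sequences follow the temporal dynamics of the fully-observed data. A minimal sketch, assuming synthetic linear dynamics and a linear supervisor; all names, shapes, and the training data here are illustrative assumptions, not the application's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

T, d = 50, 8  # sequence length, embedding dimension (illustrative)

# Synthetic "fully-observed" latent sequence with linear dynamics
# H[t+1] = H[t] @ A_true + noise (A_true is an assumption for the sketch).
A_true = rng.normal(scale=0.2, size=(d, d))
H = np.zeros((T, d))
H[0] = rng.normal(size=d)
for t in range(T - 1):
    H[t + 1] = H[t] @ A_true + rng.normal(scale=0.1, size=d)

# Linear supervisor S(h) = h @ W, trained to predict the next-step embedding.
W = rng.normal(scale=0.1, size=(d, d))

def supervised_loss(W):
    """One-step-ahead prediction error: mean || S(h_t) - h_{t+1} ||^2."""
    pred = H[:-1] @ W
    return float(np.mean((pred - H[1:]) ** 2))

loss_before = supervised_loss(W)

# Plain gradient descent on the supervised (one-step-ahead) loss.
lr = 0.05
for _ in range(200):
    pred = H[:-1] @ W
    grad = 2.0 * H[:-1].T @ (pred - H[1:]) / ((T - 1) * d)
    W -= lr * grad

loss_after = supervised_loss(W)
print(loss_before, loss_after)
```

A supervisor trained this way can then score imputed sequences, penalizing those whose step-to-step transitions deviate from the learned dynamics.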
Claims 1, 7, and 13 recite the limitation "the imputed temporal data" in the seventh-to-last paragraph of each claim. There is insufficient antecedent basis for this limitation in the claim.

Claims 1, 7, and 13 recite the limitation "minimizing, via the one or more hardware processors, a difference between the one or more imputed high-dimensional target feature embeddings of the input training dataset and the one or more predicted imputed high-dimensional target feature embeddings;". It is unclear what minimizing a difference entails. One interpretation is making the one or more target feature embeddings equal to the one or more predicted imputed high-dimensional target feature embeddings. Another interpretation is minimizing a loss function. Therefore, this limitation is indefinite. For purposes of examination, this will be interpreted as minimizing a loss function involving high-dimensional target feature embeddings.

Claims 1, 7, and 13 recite the limitation "minimizing a first-order moment ($D_1 - D_2$) and a second-order moment ($|\hat{\sigma}_1^2 - \hat{\sigma}_2^2|$) differences, defined between the fully-observed data ($\tilde{D}_{train}^{n,1:T_n}$) and an imputed data ($\hat{D}^{n,1:T_n}$)". It is unclear what is minimized: (1) the first-order moment(s) and the second-order moment(s); (2) the difference between the first-order moment and the second-order moment of the fully-observed data, and the difference between the first-order moment and the second-order moment of the imputed data; (3) the difference between the first-order moment of the fully-observed data and the first-order moment of the imputed data, and the difference between the second-order moment of the fully-observed data and the second-order moment of the imputed data; and/or any other combination. If a difference is minimized, it is again unclear what minimizing a difference entails. One interpretation is making the moments equal to each other, thus minimizing the difference.
Another interpretation is minimizing a loss function involving the first-order moment and the second-order moment. Therefore, this limitation is indefinite. For purposes of examination, this limitation will be interpreted as minimizing a loss function involving a first-order moment (mean) and a second-order moment (variance).

Claims 1, 7, and 13 recite the limitation "mean for the fully-observed data ($\tilde{D}_{train}^{n,1:T_n}$) and the imputed data ($\hat{D}^{n,1:T_n}$) is computed by, $D_1 = \frac{1}{N}\sum_{j=1}^{f}\sum_{n=1}^{N}\tilde{D}_{j,train}^{n,1:T_n} \in I(f)$ and $D_2 = \frac{1}{N}\sum_{j=1}^{f}\sum_{n=1}^{N}\hat{D}_{j}^{n,1:T_n} \in I(f)$, wherein underlying probability distributions of the input temporal data, $P_{\tilde{D}_{train}^{n,1:T_n}}$ is learned by minimizing ($L_{US}$, $P_{\hat{D}^{n,1:T_n}}$)". The following variables are undefined, and thus render the limitation indefinite: $D_1$, $N$, $f$, $I$, $D_2$, $L_{US}$, $P_{\hat{D}^{n,1:T_n}}$. It is unclear if there is one mean for both the fully-observed data and the imputed data or if there is one mean for each. It is unclear which equation corresponds to the mean for the fully-observed data and the imputed data, whether there is one mean or two. For purposes of examination, this limitation will be treated as if there are two means and the variables will be treated as any quantity.

Claims 1, 7, and 13 recite the limitation "the input temporal data" in the second-to-last paragraph of each claim. There is insufficient antecedent basis for this limitation in the claim.

Claims 1, 7, and 13 recite the limitation "imputing, with the cpGAN, a low-dimensional multivariate industrial time-series data in a digital twin, a simulation of an industry machine or an industrial plant, or a sensor, a production unit, or a manufacturing unit". A digital twin is a simulation of an industry machine, sensor, production unit, or manufacturing unit.
It is unclear whether the listing of industry machine, sensor, production unit, or manufacturing unit are examples of digital twin simulations or if they are additional listings, meaning that data is imputed in an industrial plant, a sensor, a production unit, or a manufacturing unit. For purposes of examination, this will be interpreted as separate/additional listings.

Dependent claims 2-3, 5-6, 8-9, 11-12, 14-15, and 17-18 fail to resolve the issues and are rejected with the same rationales.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-3, 5-9, 11-15, and 17-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. MPEP § 2106(III) sets out steps for evaluating whether a claim is drawn to patent-eligible subject matter. The analysis of claims 1-3, 5-9, 11-15, and 17-18, in accordance with these steps, follows.

Step 1 Analysis: Step 1 is to determine whether the claim is directed to a statutory category (process, machine, manufacture, or composition of matter). Claims 1-3 and 5-6 are directed to a process, claims 7-9 and 11-12 are directed to a machine, and claims 13-15 and 17-18 are directed to a manufacture. All claims are directed to statutory categories and the analysis proceeds.

Step 2A Prong One, Step 2A Prong Two, and Step 2B Analysis: Step 2A Prong One asks if the claim recites a judicial exception (abstract idea, law of nature, or natural phenomenon). If the claim recites a judicial exception, analysis proceeds to Step 2A Prong Two, which asks if the claim recites additional elements that integrate the abstract idea into a practical application.
If the claim does not integrate the judicial exception, analysis proceeds to Step 2B, which asks if the claim amounts to significantly more than the judicial exception. If the claim does not amount to significantly more than the judicial exception, the claim is not eligible subject matter under 35 U.S.C. 101. None of the claims represent an improvement to technology.

Regarding claim 1, the following claim elements are abstract ideas:

transforming, … the cluster independent random noise by using the one or more associated cluster labels corresponding to the input training dataset to obtain a cluster dependent random noise; (One could practically transform data in the human mind given pen and paper. This is a mental process.)

generating, … an imputed synthetic noise based on (i) a mask variable, (ii) one or more feature embeddings of the input training dataset, (iii) a flipped mask variable, and (iv) the obtained cluster dependent random noise; (One could practically generate imputed synthetic noise in the human mind given pen and paper, i.e. applying a mapping. This is a mental process.)

generating, … one or more imputed high-dimensional feature embeddings using the generated imputed synthetic noise; (One could practically generate imputed high-dimensional feature embeddings in the human mind given pen and paper, i.e. applying a mapping. This is a mental process.)

predicting, … one or more imputed high-dimensional target feature embeddings of the training dataset using the one or more imputed high-dimensional feature embeddings; (One could practically predict target feature embeddings in the human mind given pen and paper, i.e. applying a mapping. This is a mental process.)
generating, … one or more single-step ahead imputed high-dimensional feature embeddings using the one or more predicted imputed target high-dimensional feature embeddings, (One could practically generate single-step ahead imputed high-dimensional feature embeddings in the human mind given pen and paper, i.e. applying a mapping. This is a mental process.)

generating, … an imputed training data using the one or more single-step ahead imputed high-dimensional feature embeddings. (One could practically generate imputed training data in the human mind given pen and paper, i.e. applying a mapping. This is a mental process.)

minimizing, via the one or more hardware processors, a difference between the one or more imputed high-dimensional target feature embeddings of the input training dataset and the one or more predicted imputed high-dimensional target feature embeddings; (See 112(b) rejection for interpretation. Minimizing a loss function is a mathematical calculation, which is a mathematical concept.)

minimizing a first-order moment ($D_1 - D_2$) and a second-order moment ($|\hat{\sigma}_1^2 - \hat{\sigma}_2^2|$) differences, defined between the fully-observed data ($\tilde{D}_{train}^{n,1:T_n}$) and an imputed data ($\hat{D}^{n,1:T_n}$), and mean for the fully-observed data ($\tilde{D}_{train}^{n,1:T_n}$) and the imputed data ($\hat{D}^{n,1:T_n}$) is computed by, (See 112(b) rejection for interpretation. Minimizing a loss function is a mathematical calculation, which is a mathematical concept.)

$D_1 = \frac{1}{N}\sum_{j=1}^{f}\sum_{n=1}^{N}\tilde{D}_{j,train}^{n,1:T_n} \in I(f)$ and $D_2 = \frac{1}{N}\sum_{j=1}^{f}\sum_{n=1}^{N}\hat{D}_{j}^{n,1:T_n} \in I(f)$, (See 112(b) rejection for interpretation. These are mathematical equations, which are mathematical concepts.)

wherein underlying probability distributions of the input temporal data, $P_{\tilde{D}_{train}^{n,1:T_n}}$ is learned by minimizing ($L_{US}$, $P_{\hat{D}^{n,1:T_n}}$) (See 112(b) rejection for interpretation.
Minimizing a loss function is a mathematical calculation, which is a mathematical concept.)

imputing, …, a low-dimensional multivariate industrial time-series data (One could practically impute data in the human mind. This is a mental process.)

The following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

A processor implemented method, comprising: (This recites a generic computer part with a generic computer function. This is mere instructions to apply an exception. See MPEP § 2106.05(f).)

obtaining, via one or more hardware processors, an input training dataset, a cluster independent random noise and one or more associated cluster labels corresponding to the input training dataset; (Obtaining data is the known process of receiving data; this is mere instructions to apply an exception.)

via one or more hardware processors, (This recites a generic computer part used to implement abstract ideas; this is mere instructions to apply an exception.)

upon invoking a supervisor module comprised in a congeniality preserving Generative Adversarial Networks (cpGAN), and the cpGAN leverages a supervisor neural network function ($S_{cpGAN}$) to retain conditional temporal dynamics of an fully-observed data ($\tilde{D}_{train}^{n,1:T_n}$) in the imputed temporal data; (See 112(b) rejection for interpretation. The BRI of a supervisor module is any software module, which amounts to mere instructions to apply an exception. A supervisor neural network function, interpreted as a self-supervised neural network, is a known process/component in machine learning. This amounts to mere instructions to apply an exception. A GAN is a known process/component in machine learning, which amounts to mere instructions to apply an exception.)
with the cpGAN (A GAN is a known process/component in machine learning, which amounts to mere instructions to apply an exception.)

in a digital twin, a simulation of an industry machine or an industrial plant, or a sensor, a production unit, or a manufacturing unit. (This generally links the use of the judicial exceptions to the particular field of industry/manufacturing. This is a field of use limitation.)

Regarding claim 2, the rejection of claim 1 is incorporated herein. Further, claim 2 recites the following abstract idea: wherein the cluster dependent random noise is obtained from the cluster independent random noise that is sampled from a Gaussian distribution and the one or more associated cluster labels corresponding to the input training dataset. (Again, one could practically transform the cluster independent random noise in the human mind given pen and paper. One could also sample a Gaussian distribution to obtain cluster independent random noise and use the corresponding cluster labels to transform the cluster independent random noise in the human mind with pen and paper. This is a mental process.) Claim 2 does not recite any additional elements.

Regarding claim 3, the rejection of claim 1 is incorporated herein. Further, claim 3 recites the following abstract idea: wherein the flipped mask variable is obtained based on a difference between a pre-defined value and the mask variable. (One could practically obtain a flipped mask based on the difference between a pre-defined value and the mask variable in the human mind with the aid of pen and paper. This is a mental process.) Claim 3 does not recite any additional elements.

Regarding claim 5, the rejection of claim 1 is incorporated herein. Further, claim 5 recites the following abstract idea: further comprising validating the imputed training data based on a comparison of the imputed training data and the input training dataset. (One could practically compare data in the human mind and validate it.
This is the mental process of evaluation.)

Regarding claim 6, the rejection of claim 1 is incorporated herein. Further, claim 6 recites the following abstract idea: further comprising classifying the one or more imputed high-dimensional feature embeddings into at least one class type. (One could practically classify embeddings into types in the human mind. This is a mental process.)

Regarding claim 7, the following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

A system, comprising: a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: (All components recited are generic computer parts, configured in a generic way, and the processor does the known process of carrying out instructions. This is mere instructions to apply an exception.)

… by using a generator module comprised in the cpGAN … (Programming modules are a known computer process. This is mere instructions to apply an exception.)

… by using a critic module comprised in the cpGAN … (Programming modules are a known computer process. This is mere instructions to apply an exception.)

… by using a supervisor module comprised in the cpGAN … (Programming modules are a known computer process. This is mere instructions to apply an exception.)

… by using a recovery module comprised in the cpGAN … (Programming modules are a known computer process. This is mere instructions to apply an exception.)

The remainder of claim 7 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.
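The four modules recited in claim 7 can be pictured as a simple data flow. The sketch below is a hypothetical wiring only: linear maps stand in for the claimed generator, critic, supervisor, and recovery neural networks, and the names, dimensions, and mask convention (1 = observed, 0 = missing) are assumptions for illustration, not the application's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

class Module:
    """A stand-in linear layer for each claimed neural-network module."""
    def __init__(self, d_in, d_out):
        self.W = rng.normal(scale=0.1, size=(d_in, d_out))
    def __call__(self, x):
        return x @ self.W

d_feat, d_embed = 4, 8
generator  = Module(3 * d_feat, d_embed)  # masked features + flipped mask + noise -> embedding
critic     = Module(d_embed, 1)           # embedding -> realness score
supervisor = Module(d_embed, d_embed)     # embedding -> single-step-ahead embedding
recovery   = Module(d_embed, d_feat)      # embedding -> data space

x = rng.normal(size=(1, d_feat))          # one observation with missing values
mask = np.array([[1.0, 0.0, 1.0, 1.0]])   # 1 = observed, 0 = missing
flipped_mask = 1.0 - mask                 # claim 3: pre-defined value minus the mask
noise = rng.normal(size=(1, d_feat))      # cluster dependent random noise (sketch)

# Generator consumes masked features, the flipped mask, and noise (claim 1).
h = generator(np.concatenate([x * mask, flipped_mask, noise], axis=1))
h_next = supervisor(h)                    # single-step-ahead embedding
score = critic(h_next)                    # critic judges real vs. imputed
x_hat = recovery(h_next)                  # map embedding back to data space

# Observed entries are kept; missing entries are filled from the recovery output.
x_imputed = mask * x + flipped_mask * x_hat
print(x_imputed.shape)
```

Untrained, this only shows the shape of the pipeline; in the claimed system each module would be a trained network and the critic's score would drive adversarial training.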
Claims 8-9 and 11-12 recite substantially similar subject matter to claims 2-3 and 5-6 respectively and are rejected with the same rationale, mutatis mutandis.

Regarding claim 13, the following claim elements are additional elements which, taken alone or in combination with the other additional elements, do not integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception:

One or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause: (This recites generic computer components and functions; this is mere instructions to apply an exception.)

The remainder of claim 13 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis. Claims 14-15 and 17-18 recite substantially similar subject matter to claims 2-3 and 5-6 respectively and are rejected with the same rationale, mutatis mutandis.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claim(s) 1-3, 5, 7-9, 11, 13-15, and 17 is/are rejected under 35 U.S.C.
103 as being unpatentable over Guo ("A data imputation method for multivariate time series based on generative adversarial network", 2019), Zhu ("A Second-Order Approach to Learning with Instance-Dependent Label Noise", 2021), Dai ("Multiple Imputation via Generative Adversarial Network for High-dimensional Blockwise Missing Value Problems", 2021), Koochali ("Probabilistic Forecasting of Sensory Data With Generative Adversarial Networks – ForGAN", 2019), Weng ("Contrastive Representation Learning", 2021), Tipirneni ("Self-Supervised Transformer for Sparse and Irregularly Sampled Multivariate Clinical Time-Series", February 2022), and Hu ("Semi-Supervised Learning Based on GAN With Mean and Variance Feature Matching", 2019).

Regarding claim 1, Guo teaches A processor implemented method, comprising: (Page 194 states "In this paper, all of the networks in the experiment are implemented based on Tensorflow [57] framework." Tensorflow is a programming framework implemented on a computer using a processor. Hereinafter, this is considered to be the explanation for "via one or more hardware processors".)

obtaining, via one or more hardware processors, an input training dataset, a cluster independent random noise and … ; (Fig. 4 shows the original sample $x$, interpreted as the input training dataset, and latent random variable $z$, interpreted as the cluster independent random noise.)

generating, one or more imputed high-dimensional feature embeddings using [the cluster independent random noise]; (See 112(b) rejection for "high-dimensional" interpretation. Page 187, section 3.2 states "It takes $d$-dimensional latent random vector $z \sim U(0,1)^d$ following the uniform distribution as the input and performs fractional-strided convolutions (deconvolutions) within each single channel separately."
Page 187, section 3.2 further states "Compared with DC-GAN which adopts 2-D convolutions, MTSGAN performs 1-D kernel based strided convolutions or 1-D kernel based fractional-strided convolutions in each single channel to extract better features from MTS." As the convolutions extract features, the output generated is a feature embedding. Section 4.1, page 192 states "For the imputation of X_inc, we still employ back-propagation to find the “closest” latent encoding of X_inc, and then use the samples generated by the generator to impute the missing values." Therefore, the embeddings are for imputation during training.) predicting, via one or more hardware processors, one or more imputed high-dimensional target feature embeddings of the training dataset using the one or more predicted imputed high-dimensional target feature embeddings, … a congeniality preserving Generative Adversarial Networks (cpGAN), and the cpGAN … (As can be seen in Fig. 4, the training dataset is input into the discriminator, which, as one of ordinary skill in the art would understand, learns to classify the embeddings as real or fake, with the loss function discouraging “fake” embeddings. This makes the feature embeddings predicted in the next iteration of the generator closer to the training dataset. Therefore, target feature embeddings of the training dataset are predicted using the feature embeddings. These are also for imputation as stated by Section 4.1, page 192 above. This step is also done while training. MTS-GAN is interpreted as the cpGAN.)
generating, via one or more hardware processors, one or more … imputed high-dimensional feature embeddings using the one or more imputed high-dimensional feature embeddings; and (Section 4.1, page 192 states that, after training the MTS-GAN, "For the imputation of X_inc, we still employ back-propagation to find the “closest” latent encoding of X_inc, and then use the samples generated by the generator to impute the missing values." This closest latent encoding, retrieved after training, is what is generated by this step. As it is based on the trained model, which used the last high-dimensional feature embeddings for training, this step uses the last high-dimensional feature embeddings.) generating, via one or more hardware processors, an imputed training data using the one or more … imputed high-dimensional feature embeddings. (Section 4.1, page 192 states that, after training the MTS-GAN, "For the imputation of X_inc, we still employ back-propagation to find the “closest” latent encoding of X_inc, and then use the samples generated by the generator to impute the missing values." The samples generated are the imputed training data using the last high-dimensional feature embeddings.) minimizing, via the one or more hardware processors, a difference between the one or more imputed high-dimensional target feature embeddings of the input training dataset and the one or more predicted imputed high-dimensional target feature embeddings; (See 112(b) rejection for interpretation. Minimizing a loss function is a mathematical calculation, which is a mathematical concept.) the one or more predicted imputed high-dimensional target feature embeddings. (As can be seen in Fig. 4, the training dataset is input into the discriminator, which, as one of ordinary skill in the art would understand, learns to classify the embeddings as real or fake, with the loss function discouraging “fake” embeddings.
This makes the feature embeddings predicted in the next iteration of the generator closer to the training dataset. Therefore, target feature embeddings of the training dataset are predicted using the feature embeddings. These are also for imputation as stated by Section 4.1, page 192 above. This step is also done while training.) wherein underlying probability distributions of the input temporal data, P(D̃_train^(n,1:T_n)), is learned by minimizing (L_US, P(D̂^(n,1:T_n))) (See 112(b) rejection for interpretation. Page 187 states "The generator G receives the latent random vector z sampled from a normal distribution or uniform distribution p and outputs a synthetic sample G(z). The discriminator D takes either a training sample x or a synthetic sample G(z) as input, and outputs a scalar indicating the probability that x or G(z) follows the original data distribution p_data. In the training stage, the generator tries to fool the discriminator as much as possible by solving the following optimization problem max_G E_{z∼p_z(z)}[log D(G(z))] (1) and in the meantime, the discriminator tries to distinguish between original samples and the synthetic samples by max_D E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))] (2) After alternating the training of G and D, GAN can capture the distribution of the training data by making the distribution of its outputs (a group of synthetic data) approximate original data distribution." One of ordinary skill in the art would realize that maximizing a function is equivalent to minimizing the negative of the function. Therefore, the optimizations minimize the negatives of the functions. As the training involves the optimization problems and results in an imitation of the probability distributions of the training data, the underlying probability distributions are learned by the minimization of the function. Page 193, Section 4.2.1 lists the datasets used, which are temporal data.)
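The maximize/minimize equivalence relied on above can be made concrete. The following is a minimal NumPy rendering of the negated versions of Guo's objectives (1) and (2); it is an illustration for the record, not code from any cited reference:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # Negative of Guo's eq. (2): the discriminator maximizes
    # E[log D(x)] + E[log(1 - D(G(z)))], which is equivalent to
    # minimizing the negated sum.
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    # Negative of Guo's eq. (1): the generator maximizes
    # E[log D(G(z))], i.e., minimizes -E[log D(G(z))].
    return -np.mean(np.log(d_fake))
```

As the discriminator's outputs on synthetic samples approach 1, the generator loss approaches 0, which is the sense in which the distributions are "learned by minimizing" the negated objectives.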
imputing, with a cpGAN, a low-dimensional multivariate industrial time-series data in a digital twin, a simulation of an industry machine or an industrial plant, or a sensor, a production unit, or a manufacturing unit. (Page 194 states "Table 1 shows the imputation results of the five existing approaches and the proposed MTS-GAN based method tested on the four datasets under the two different missing cases (i.e. random missing and continuous missing cases) when the missing rate interval [p1%, p2%] is set to [10%, 90%]. It can be seen from Table 1 that the MTS-GAN based imputation method achieves the best reconstruction accuracy compared with other approaches under both the random missing and continuous missing cases." Page 193 lists the datasets, including "3) Tennessee Eastman Process (TEP) based dataset: TEP model [56] is used to generate the dataset, which is built from an actual chemical process by Downs and Vogel in 1993, and is a famous industrial process simulation platform in the fields of process control, process monitoring and fault diagnosis. … The dataset contains 3200 training samples and 800 testing samples in total, and each sample is a 10-dimensional time series of length 50." Therefore, the low-dimensional multivariate industrial time-series data is imputed. This dataset is from a digital twin, which is a simulation of an industry machine or plant. Page 192 also lists the following dataset: "4) Point machine dataset: A point machine is a device for operating railway turnouts to determine the heading direction of a train when facing railway crossings. Since the operating currents are important features for fault detection of point machine operations, historical dataset collected from a real-world high-speed railway in China, which is composed of three-phase operating currents of point machines, is used as the dataset in this subsection.
In this dataset, there are a total of 9000 training samples and 1000 testing samples, and each of them is a 3-dimensional time series of length 140." This dataset is of a production/manufacturing unit, which, as one of ordinary skill in the art would understand, has sensors. Therefore, the data is also sensor data.) Guo does not appear to teach upon invoking a supervisor module comprised in [a neural network architecture] leverages a supervisor neural network function (S_cpGAN) to retain conditional temporal dynamics of a fully-observed data (D̃_train^(n,1:T_n)) in the imputed temporal data; [obtaining] one or more associated cluster labels corresponding to the input training dataset transforming, via one or more hardware processors, the cluster independent random noise by using the associated cluster labels corresponding to the input training dataset to obtain a cluster dependent random noise; generating, via one or more hardware processors, an imputed synthetic noise based on (i) a mask variable, (ii) … the training dataset, (iii) a flipped mask variable, and (iv) the one or more associated cluster labels corresponding to the input training dataset; [doing imputation] using the generated imputed synthetic noise single-step ahead [imputation] minimizing a difference between one or more target feature embeddings of the input training dataset and [predicted embeddings] minimizing a first-order moment (D_1 − D_2) and a second-order moment (|σ̂_1² − σ̂_2²|) differences, defined between the fully-observed data (D̃_train^(n,1:T_n)) and an imputed data (D̂^(n,1:T_n)), and mean for the fully-observed data (D̃_train^(n,1:T_n)) and the imputed data (D̂^(n,1:T_n)) is computed by, D_1 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̃_{j,train}^(n,1:T_n) ∈ I(f) and D_2 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̂_j^(n,1:T_n) ∈ I(f), However, Zhu—directed to analogous art—teaches [obtaining] one or more associated cluster labels corresponding
to the input training dataset (Page 2 states "This paper targets on a classification problem given a set of N training examples with Instance-Dependent label Noise (IDN) denoted by D̃ ≔ {(x_n, ỹ_n)}_{n∈[N]}, where [N] ≔ {1, 2, …, N} is the set of indices. The corresponding noisy data distribution is denoted by D̃. Examples (x_n, ỹ_n) are drawn according to random variables (X, Ỹ) ∼ D̃." The training example labels ỹ are interpreted as cluster labels as they describe a group of training examples.) transforming, via one or more hardware processors, the cluster independent random noise by using the associated cluster labels corresponding to the input training dataset to obtain a cluster dependent random noise; (Page 4 states "Theorem 1 shows the ratio of wrong predictions given by f̃*_peer includes two components. The former term 2(ϵ_+ + ϵ_−)/(1 − ϵ_+ − ϵ_−) is directly caused by IDN [instance-dependent noise]." The instance-dependent noise is interpreted as the cluster independent random noise. Page 4 further states "Our second observation is that if we find a way to compensate for the “imbalances” caused by the down-weighting effects shown above, the challenging instance-dependent label noise could be transformed into a class-dependent one, which existing techniques can then handle." Page 5 states "Theorem 2 effectively divides the instance-dependent label noise into two parts. As shown in Eq. (2), the first line is the same as Eq. (1) in Lemma 1, indicating the average effect of instance-dependent label noise can be treated as a class-dependent one with parameters e_+, e_−." The labels are used in Eq. (2), so the cluster dependent random noise is found using the labels.) [generating synthetic noise] by using the associated cluster labels corresponding to the input training dataset to obtain a cluster dependent random noise; (Page 5 states "Theorem 2 effectively divides the instance-dependent label noise into two parts. As shown in Eq.
(2), the first line is the same as Eq. (1) in Lemma 1, indicating the average effect of instance-dependent label noise can be treated as a class-dependent one with parameters e + ,   e - ." The labels are used in the Eq. 2, so the cluster dependent random noise is found using the labels.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Guo and Zhu because as Zhu states on page 1, "With limited budgets/efforts, the resulting dataset would be noisy, and the existence of label noise may mislead DNNs to learn or memorize wrong correlations [10, 11, 35, 38, 47]. To make it worse, the label noise embedded in human annotations is often instance-dependent, e.g., some difficult examples are more prone to be mislabeled [34]. This hidden and imbalanced distribution of noise often has a detrimental effect on the training outcome [15, 23]. It remains an important and challenging task to learn with instance-dependent label noise." 
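The claimed transformation of cluster-independent noise into cluster-dependent noise is not given in code by the cited art. A minimal sketch, assuming a hypothetical per-cluster offset (the function name and `cluster_means` are illustrative, not from any reference):

```python
import numpy as np

rng = np.random.default_rng(0)

def cluster_dependent_noise(z, labels, cluster_means):
    # z: (N, d) cluster-independent noise (e.g., Gaussian or uniform).
    # labels: (N,) integer cluster label associated with each sample.
    # cluster_means: (K, d) hypothetical per-cluster offsets; adding the
    # offset selected by each label makes the noise distribution depend
    # on the cluster label.
    return z + cluster_means[labels]

z = rng.standard_normal((4, 2))             # cluster-independent noise
labels = np.array([0, 1, 0, 1])             # associated cluster labels
means = np.array([[0.0, 0.0], [5.0, 5.0]])  # hypothetical cluster offsets
zc = cluster_dependent_noise(z, labels, means)
```

Samples labeled cluster 0 keep their original noise, while samples labeled cluster 1 are shifted, so the resulting noise is label-dependent in the sense the claim recites.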
The combination of Guo and Zhu does not appear to explicitly teach generating, via one or more hardware processors, an imputed synthetic noise based on (i) a mask variable, (ii) … the training dataset, (iii) a flipped mask variable, and upon invoking a supervisor module comprised in [a neural network architecture] leverages a supervisor neural network function (S_cpGAN) to retain conditional temporal dynamics of a fully-observed data (D̃_train^(n,1:T_n)) in the imputed temporal data; [doing imputation] using the generated imputed synthetic noise single-step ahead [imputation] minimizing a difference between one or more target feature embeddings of the input training dataset and [predicted embeddings] minimizing a first-order moment (D_1 − D_2) and a second-order moment (|σ̂_1² − σ̂_2²|) differences, defined between the fully-observed data (D̃_train^(n,1:T_n)) and an imputed data (D̂^(n,1:T_n)), and mean for the fully-observed data (D̃_train^(n,1:T_n)) and the imputed data (D̂^(n,1:T_n)) is computed by, D_1 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̃_{j,train}^(n,1:T_n) ∈ I(f) and D_2 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̂_j^(n,1:T_n) ∈ I(f), However, Dai—directed to analogous art—teaches generating, via one or more hardware processors, an imputed synthetic noise based on (i) a mask variable, (ii) … the training dataset, (iii) a flipped mask variable, and (iv) the [random noise]; (Section IIA on pages 792-793 states "Here the basic idea is to replace the values of [x] in the covariate set mis(k) with a random noise, then feed this partially-true noisy data into the generator to obtain a high-quality imputation.
Specifically, for MI-GAN1, the generator entails two steps: Ĝ_k(x, z, m_k) = Ĝ_k(x ⊙ m_k + z ⊙ (1 − m_k)) is a vector of length p, where ⊙ denotes the element-wise multiplication and Ĝ_k is the generator network that outputs a value for every covariate, even its value was observed.” Ĝ_k(x ⊙ m_k + z ⊙ (1 − m_k)) is interpreted as the imputed synthetic noise, as this is the noisy input to the generator. m_k is the mask variable and (1 − m_k) is the flipped mask variable, as m_k is represented as ones and zeroes and 1 − 1 = 0 and 1 − 0 = 1, resulting in a flipped mask when the matrix is subtracted from 1. z is the random noise. x is the training data.) [doing imputation] using the generated imputed synthetic noise (As stated above, Section IIA states that the generator obtains a high-quality imputation using the step listed above, where the synthetic noise is generated.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Guo, Zhu, and Dai because as stated by Dai on page 792, "In this paper, we propose two novel GAN-based multiple imputation methods, namely MI-GAN1 and MI-GAN2, which can work for high dimensional blockwise pattern of missing data with a moderate sample size. We highlight that MI-GAN1 is equipped with theoretical guarantees under the MAR mechanism."
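Dai's generator input x ⊙ m_k + z ⊙ (1 − m_k), combining the mask and the flipped mask, can be sketched directly; the helper name below is illustrative:

```python
import numpy as np

def noisy_generator_input(x, m, z):
    # Dai's MI-GAN1 generator input: x * m + z * (1 - m).
    # m is the mask (1 = observed, 0 = missing) and (1 - m) is the
    # flipped mask, so observed entries keep their true values and
    # missing entries are filled with random noise before the
    # generator imputes them.
    return x * m + z * (1 - m)

x = np.array([1.0, 2.0, 3.0, 4.0])   # training data
m = np.array([1.0, 0.0, 1.0, 0.0])   # entries 2 and 4 are missing
z = np.array([9.0, 9.0, 9.0, 9.0])   # random noise (fixed here for clarity)
g_in = noisy_generator_input(x, m, z)  # → [1.0, 9.0, 3.0, 9.0]
```

This "partially-true noisy data" is what the claim recitation maps to the imputed synthetic noise: true values survive under the mask, noise fills the flipped-mask positions.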
The combination of Guo, Zhu, and Dai does not appear to explicitly teach upon invoking a supervisor module comprised in [a neural network architecture] leverages a supervisor neural network function (S_cpGAN) to retain conditional temporal dynamics of a fully-observed data (D̃_train^(n,1:T_n)) in the imputed temporal data; minimizing a difference between one or more target feature embeddings of the input training dataset and [predicted embeddings] single-step ahead [forecasting] minimizing a first-order moment (D_1 − D_2) and a second-order moment (|σ̂_1² − σ̂_2²|) differences, defined between the fully-observed data (D̃_train^(n,1:T_n)) and an imputed data (D̂^(n,1:T_n)), and mean for the fully-observed data (D̃_train^(n,1:T_n)) and the imputed data (D̂^(n,1:T_n)) is computed by, D_1 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̃_{j,train}^(n,1:T_n) ∈ I(f) and D_2 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̂_j^(n,1:T_n) ∈ I(f), However, Koochali—directed to analogous art—teaches single-step ahead [forecasting] (Page 63871 states "In this paper we aim to model the probability distribution of one step ahead value x_{t+1} given the historical data c = {x_0, …, x_t}, i.e. ρ(x_{t+1} | c).") It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Guo, Zhu, and Dai with the teachings of Koochali because, as Koochali states on page 63870, "Our method can learn the full conditional probability distribution of future values even in complex situations without facing conventional problems of probabilistic forecasting methods such as quantile crossing or dependency on the chosen prior distribution."
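Koochali's single-step-ahead formulation, representing the distribution ρ(x_{t+1} | c) of the next value given the history window c, can be illustrated with a toy stand-in. The Gaussian "generator" below is purely an assumption for illustration, not ForGAN's conditional network:

```python
import numpy as np

rng = np.random.default_rng(42)

def one_step_ahead_samples(history, n_samples=1000):
    # Toy stand-in for a conditional generator: represent the
    # one-step-ahead distribution rho(x_{t+1} | c) by drawing samples
    # conditioned on the history window c = {x_0, ..., x_t}.  Here the
    # conditional distribution is a Gaussian fitted to the window
    # (an assumption for illustration only).
    mu, sigma = np.mean(history), np.std(history)
    return mu + sigma * rng.standard_normal(n_samples)

c = np.array([1.0, 1.2, 0.8, 1.1, 0.9])  # historical data
samples = one_step_ahead_samples(c)       # draws from the toy rho(x_{t+1}|c)
```

The point of the mapping is that the model outputs a distribution over the single next value, rather than a point forecast.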
The combination of Guo, Zhu, Dai, and Koochali does not appear to explicitly teach upon invoking a supervisor module comprised in [a neural network architecture] leverages a supervisor neural network function (S_cpGAN) to retain conditional temporal dynamics of a fully-observed data (D̃_train^(n,1:T_n)) in the imputed temporal data; minimizing a difference between one or more target feature embeddings of the input training dataset and [predicted embeddings] However, Weng—directed to analogous art—teaches minimizing a difference between one or more target feature embeddings of the input training dataset and [predicted embeddings] (Page 1 states “Given a list of input samples x_i, each has a corresponding label y_i ∈ {1, …, L} among L classes. We would like to learn a function f_θ(⋅): X → R^d that encodes x_i into an embedding vector such that examples from the same class have similar embeddings and samples from different classes have very different ones. Thus, contrastive loss takes a pair of inputs (x_i, x_j) and minimizes the embedding distance when they are from the same class but maximizes the distance otherwise.” Page 1 also states a loss function, which one of ordinary skill would realize is used in training the encoder. The loss function minimizes the distance between a predicted embedding (the output of the learned function f_θ(⋅): X → R^d) and the training data, x_i.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Guo, Zhu, Dai, and Koochali with the teachings of Weng because, as Weng states on page 1, "The goal of contrastive representation learning is to learn such an embedding space in which similar sample pairs stay close to each other while dissimilar ones are far apart. Contrastive learning can be applied to both supervised and unsupervised settings.
When working with unsupervised data, contrastive learning is one of the most powerful approaches in self-supervised learning." One of ordinary skill in the art would realize that, to learn an embedding, contrastive loss is used. The combination of Guo, Zhu, Dai, Koochali, and Weng does not appear to explicitly teach upon invoking a supervisor module comprised in [a neural network architecture] leverages a supervisor neural network function (S_cpGAN) to retain conditional temporal dynamics of a fully-observed data (D̃_train^(n,1:T_n)) in the imputed temporal data; minimizing a first-order moment (D_1 − D_2) and a second-order moment (|σ̂_1² − σ̂_2²|) differences, defined between the fully-observed data (D̃_train^(n,1:T_n)) and an imputed data (D̂^(n,1:T_n)), and mean for the fully-observed data (D̃_train^(n,1:T_n)) and the imputed data (D̂^(n,1:T_n)) is computed by, D_1 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̃_{j,train}^(n,1:T_n) ∈ I(f) and D_2 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̂_j^(n,1:T_n) ∈ I(f), However, Tipirneni—directed to analogous art—teaches upon invoking a supervisor module comprised in [a neural network architecture] leverages a supervisor neural network function (S_cpGAN) to retain conditional temporal dynamics of a fully-observed data (D̃_train^(n,1:T_n)) in the imputed temporal data; (See 112(b) rejection for interpretation. Page 9 states "3.2.6 Self-supervision. We experimented with both masking and forecasting as pretext tasks for providing self-supervision and found that forecasting improved the results on target tasks. The forecasting task uses the same architecture as the target task except for the prediction layer i.e., z̃ = W_sed ⋅ e_T + b_s ∈ R^F.".
This is interpreted as the supervisor module, and the architecture function is interpreted as the supervisor neural network function, which retains conditional temporal dynamics, as the input/training data is time-series (see page 5, Section 3.1). Page 5, Section 3.1 states "The forecast mask is necessary because the unobserved forecasts cannot be used in training and are hence masked out in the loss function." Thus, the data used in the supervisor neural network function is fully-observed. As the data is forecasted, it is imputed.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Guo, Zhu, Dai, Koochali, and Weng with the teachings of Tipirneni because, as stated by Tipirneni on pages 4-5, "Supervised deep learning models often rely on large amounts of labeled data to learn generalized and robust representations. Limited labeled data can make the model easily overfit to training data and make the model more sensitive to noise. Since labeled data is expensive to obtain, self-supervised learning was introduced as a technique to solve this challenge. This technique trains the model on carefully constructed proxy tasks that improve the model’s performance on target prediction tasks." 
The combination of Guo, Zhu, Dai, Koochali, Weng, and Tipirneni does not appear to explicitly teach minimizing a first-order moment (D_1 − D_2) and a second-order moment (|σ̂_1² − σ̂_2²|) differences, defined between the fully-observed data (D̃_train^(n,1:T_n)) and an imputed data (D̂^(n,1:T_n)), and mean for the fully-observed data (D̃_train^(n,1:T_n)) and the imputed data (D̂^(n,1:T_n)) is computed by, D_1 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̃_{j,train}^(n,1:T_n) ∈ I(f) and D_2 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̂_j^(n,1:T_n) ∈ I(f), However, Hu—directed to analogous art—teaches minimizing a first-order moment (D_1 − D_2) and a second-order moment (|σ̂_1² − σ̂_2²|) differences, defined between the fully-observed data (D̃_train^(n,1:T_n)) and an imputed data (D̂^(n,1:T_n)), and mean for the fully-observed data (D̃_train^(n,1:T_n)) and the imputed data (D̂^(n,1:T_n)) is computed by, D_1 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̃_{j,train}^(n,1:T_n) ∈ I(f) and D_2 = (1/N) Σ_{j=1}^{f} Σ_{n=1}^{N} D̂_j^(n,1:T_n) ∈ I(f), (Page 541 states "To improve the instability of GANs, FM aims to prevent the generator from over training on the discriminator by assigning a novel objective to the generator. Different from the objective of vanilla GAN by maximizing the output of the discriminator, this novel objective forces the generator to produce data which can match the statistics of the real data. The statistics that are worth matching will be specified as the objective. One natural choice of statistics is mean value, matching the expected value of the features of the real data and that of generated data." Page 541 states "Given two data sets X = {x_i}_{i=1}^N and Y = {y_j}_{j=1}^M, we wish to consider the question of proving whether the generating distributions are the same, i.e., P_X = P_Y." The real data (X) is interpreted as the fully-observed data and the generated data (Y) is interpreted as the imputed data.
Page 541 states "Here, we incorporate both the first- and second-order moment of every embedding dimensionality by using the linear kernel to match the mean and variance of features, named maximum mean and variance discrepancy. The mean squared difference of the mean and variance of the two sets of samples are used to train the generator. Letting φ(x) denote activations on an intermediate layer of the discriminator, our new objective for the generator is defined as L = ‖E[φ(x_i)] − E[φ(y_j)]‖_q + λ‖Var[φ(x_i)] − Var[φ(y_j)]‖_q, where ‖⋅‖_q denotes the ℓ_q-norm and E[⋅] and Var[⋅] denote the means and variances of the feature embeddings of data." One of ordinary skill in the art would realize that a loss function is minimized during training. As the mean is E[φ(⋅)], one of ordinary skill in the art, from equation 2 on page 541, would realize that the mean of the dataset X, interpreted as D_1, is calculated by (1/N) Σ_{i=1}^{N} φ(x_i), and the mean of the dataset Y, interpreted as D_2, is calculated by (1/M) Σ_{j=1}^{M} φ(y_j). N is interpreted as N in the claims, x_i is interpreted as D̃_{j,train}^(n,1:T_n), and y_j is interpreted as D̂_j^(n,1:T_n). As stated on page 541, φ(⋅) maps a sample to a feature space in which the features of the sample are embedded, meaning that, for each sample, the sum of the features in the sample (Σ_{j=1}^{f} x_i and Σ_{j=1}^{f} y_j) is simply φ(x_i) and φ(y_j), respectively. I is interpreted as all numbers, meaning that ∈ I(f) is inherent.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Guo, Zhu, Dai, Koochali, Weng, and Tipirneni with the teachings of Hu because, as Hu states on page 541, "In practice, mean statistic cannot well describe the discrepancy of different distributions. Adding second-order information would enrich the discrimination power of the feature space."
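Hu's mean-and-variance feature-matching objective can be sketched as follows; φ is taken here as the identity on already-extracted embeddings (an assumption for simplicity), and the function name is illustrative:

```python
import numpy as np

def mean_variance_fm_loss(phi_x, phi_y, lam=1.0, q=2):
    # Hu's maximum mean-and-variance discrepancy:
    #   L = ||E[phi(x)] - E[phi(y)]||_q + lam * ||Var[phi(x)] - Var[phi(y)]||_q
    # matching the first-order moment (means) and second-order moment
    # (variances) of real (fully-observed) and generated (imputed)
    # feature embeddings, per embedding dimension.
    mean_diff = np.mean(phi_x, axis=0) - np.mean(phi_y, axis=0)
    var_diff = np.var(phi_x, axis=0) - np.var(phi_y, axis=0)
    return (np.linalg.norm(mean_diff, ord=q)
            + lam * np.linalg.norm(var_diff, ord=q))
```

When the two sets of embeddings share both moments the loss is zero, which is the sense in which minimizing it drives the imputed data's statistics toward those of the fully-observed data.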
Regarding claim 2, the rejection of claim 1 is incorporated herein. Guo does not appear to explicitly teach wherein the cluster dependent random noise is obtained from the cluster independent random noise that is sampled from a Gaussian distribution and the one or more associated cluster labels corresponding to the input training dataset. However, Dai—directed to analogous art—teaches the cluster independent random noise that is sampled from a Gaussian distribution (Page 794 states "each element of z is independently drawn from Gaussian noise N(0,1).”) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Guo and Dai for the reasons given above with regard to claim 1. The combination of Guo and Dai does not appear to explicitly teach wherein the cluster dependent random noise is obtained from … and the one or more associated cluster labels corresponding to the input training dataset. However, Zhu—directed to analogous art—teaches wherein the cluster dependent random noise is obtained from … and the one or more associated cluster labels corresponding to the input training dataset. (Page 5 states "Theorem 2 effectively divides the instance-dependent label noise into two parts. As shown in Eq. (2), the first line is the same as Eq. (1) in Lemma 1, indicating the average effect of instance-dependent label noise can be treated as a class-dependent one with parameters e_+, e_−." The labels are used in Eq. (2), so the cluster dependent random noise is found using the labels.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Guo and Dai with the teachings of Zhu for the reasons given above with regard to claim 1. Regarding claim 3, the rejection of claim 1 is incorporated herein.
Guo does not appear to explicitly teach wherein the flipped mask variable is obtained based on a difference between a pre-defined value and the mask variable. However, Dai—directed to analogous art—teaches wherein the flipped mask variable is obtained based on a difference between a pre-defined value and the mask variable. (Section IIA on pages 792-793 states "Here the basic idea is to replace the values of [x] in the covariate set mis(k) with a random noise, then feed this partially-true noisy data into the generator to obtain a high-quality imputation. Specifically, for MI-GAN1, the generator entails two steps: Ĝ_k(x, z, m_k) = Ĝ_k(x ⊙ m_k + z ⊙ (1 − m_k)) is a vector of length p, where ⊙ denotes the element-wise multiplication and Ĝ_k is the generator network that outputs a value for every covariate, even its value was observed.” Ĝ_k(x ⊙ m_k + z ⊙ (1 − m_k)) is interpreted as the imputed synthetic noise, as this is the noisy input to the generator. m_k is the mask variable and (1 − m_k) is the flipped mask variable, as m_k is represented as ones and zeroes and 1 − 1 = 0 and 1 − 0 = 1, resulting in a flipped mask when the matrix is subtracted from 1. 1 is the pre-defined value.) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Guo and Dai for the reasons given above with regard to claim 1. Regarding claim 5, the rejection of claim 1 is incorporated herein. Guo teaches further comprising validating the imputed training data based on a comparison of the imputed training data and the input training dataset. (Page 193 states "In the evaluation stage, we use the error defined by (12) as the measure to evaluate the performance of imputation:". The equation uses the difference between the test dataset, taken from the training dataset, and the imputed data, meaning that they are compared. Evaluating is interpreted as validating.)
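The validation-by-comparison mapping for claim 5 can be illustrated with a sketch. Since Guo's exact error definition in eq. (12) is not reproduced here, RMSE over the artificially-missing entries is used as a representative choice (an assumption), and the function name is illustrative:

```python
import numpy as np

def imputation_error(true, imputed, mask):
    # Validation by comparison, in the spirit of Guo's evaluation stage:
    # measure the reconstruction error between imputed values and
    # ground-truth values at the artificially-missing positions
    # (mask: 1 = observed, 0 = missing).  RMSE over the missing entries
    # is a representative error measure; eq. (12) may differ.
    missing = (mask == 0)
    return np.sqrt(np.mean((true[missing] - imputed[missing]) ** 2))

true = np.array([1.0, 2.0, 3.0, 4.0])     # held-out ground truth
mask = np.array([1, 0, 1, 0])             # entries 2 and 4 were hidden
imputed = np.array([1.0, 2.5, 3.0, 3.5])  # model's imputed output
err = imputation_error(true, imputed, mask)  # → 0.5
```

Comparing imputed values against the held-out originals in this way is what the examiner reads as "validating the imputed training data based on a comparison."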
Regarding claim 7, Guo teaches A system, comprising: a memory storing instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: (Page 194 states "In this paper, all of the networks in the experiment are implemented based on Tensorflow [57] framework." Tensorflow is a programming framework implemented on a computer using a processor, which executes program instructions from a memory that it is coupled to. As this experiment was created by a person, a communication interface, such as a monitor, must have been present to enable the person to communicate with the processor and memory. One of ordinary skill in the art would realize that implementing networks in Tensorflow means writing program instructions in code.) … by using a generator module comprised in the cpGAN … (The part of the code that implements the generator part of the GAN is interpreted as the generator module.) … by using a critic module comprised in the cpGAN … (The part of the code that implements the target feature embeddings is interpreted as the critic module.) … by using a supervisor module comprised in the cpGAN … (The part of the code that implements the imputed feature embeddings is interpreted as the supervisor module.) … by using a recovery module comprised in the cpGAN … (The part of the code that implements the imputed training data is interpreted as the recovery module.) The remainder of claim 7 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis. Claims 8, 9, and 11 recite substantially similar subject matter to claims 2, 3, and 5 respectively and are rejected with the same rationale, mutatis mutandis.
Regarding claim 13, Guo teaches One or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause: (Page 194 states "In this paper, all of the networks in the experiment are implemented based on Tensorflow [57] framework." Tensorflow is a programming framework implemented on a computer using a processor, which executes program instructions from a memory that it is coupled to. As this experiment was created by a person, a communication interface, such as a monitor, must have been present to enable the person to communicate with the processor and memory. One of ordinary skill in the art would realize that implementing networks in Tensorflow means writing program instructions in code.) The remainder of claim 13 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis. Claims 14, 15, and 17 recite substantially similar subject matter to claims 2, 3, and 5 respectively and are rejected with the same rationale, mutatis mutandis. Claim(s) 6, 12, and 18 is/are rejected under 35 U.S.C. 
103 as being unpatentable over Guo ("A data imputation method for multivariate time series based on generative adversarial network", 2019), Zhu ("A Second-Order Approach to Learning with Instance-Dependent Label Noise", 2021), Dai ("Multiple Imputation via Generative Adversarial Network for High-dimensional Blockwise Missing Value Problems", 2021), Koochali ("Probabilistic Forecasting of Sensory Data With Generative Adversarial Networks – ForGAN", 2019), Weng ("Contrastive Representation Learning", 2021), Tipirneni ("Self-Supervised Transformer for Sparse and Irregularly Sampled Multivariate Clinical Time-Series", February 2022), and Hu ("Semi-Supervised Learning Based on GAN With Mean and Variance Feature Matching", 2019) as applied to claim 1 above, and further in view of Miao ("Generative Semi-Supervised Learning for Multivariate Time Series Imputation", 2021).

Regarding claim 6, the rejection of claim 1 is incorporated herein. Guo teaches the one or more imputed high-dimensional feature embeddings (As can be seen in Fig. 4, the training data set is input into the discriminator, which, as one of ordinary skill in the art would understand, learns to classify the embeddings as real or fake, with the loss function discouraging "fake" embeddings. This makes the feature embeddings predicted in the next iteration of the generator closer to the training dataset. Therefore, target feature embeddings of the training dataset are predicted using the feature embeddings. These are also used for imputation, as stated by Section 4.1, page 192 above. This step is also performed during training.)

Guo does not appear to explicitly teach further comprising classifying [the imputed data representations] into at least one class type. However, Miao, directed to analogous art, teaches further comprising classifying [the imputed data representations] into at least one class type.
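The adversarial mechanism the examiner relies on, a discriminator classifying embeddings as real or fake while the generator's updates push its outputs toward what the discriminator scores as real, can be sketched in a few lines. This is a generic logistic toy example, not code from Guo or the application; the weights, starting point, and learning rate are arbitrary assumptions.

```python
import numpy as np

def discriminator(z, w):
    # Logistic score: probability that the embedding z is "real".
    return 1.0 / (1.0 + np.exp(-np.dot(w, z)))

# Toy setup: real embeddings score highly under weights w; the
# generated ("fake") embedding starts far from that region.
w = np.array([1.0, 1.0])
fake = np.array([-1.0, -1.0])
lr = 0.5

# Generator step: ascend log D(z) so the fake embedding looks more like
# the training data. For logistic D, d/dz log D(z) = (1 - D(z)) * w.
for _ in range(20):
    d = discriminator(fake, w)
    fake = fake + lr * (1.0 - d) * w

assert discriminator(fake, w) > 0.9  # generated embedding now scores as "real"
```

The loop is the essence of the rationale quoted above: penalizing "fake" classifications makes the next iteration's generated embeddings closer to the training distribution.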
(Page 8986 states "Specifically, during training SSGAN, the classifier predicts the labels for a fraction of unlabeled time series samples with largest predicted confidences. It then trains with the labeled time series data imputed by the generator. It drives the generator to pay more attention to the time series sharing the same label Z, when it is imputing an incomplete time series with a label Z." Predicting labels is classifying the imputed data representations.)

It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Guo, Zhu, Dai, Koochali, Weng, Tipirneni, and Hu with the teachings of Miao because, as Miao states on page 8983, "Accordingly, it drives to integrate the time series imputation and subsequent analysis. It is also promising and practical to make full use of valuable annotated labels in time series data for missing value imputation in real-life applications. However, almost all existing time series imputation approaches (Fortuin et al. 2019; Liu et al. 2019; Luo et al. 2018, 2019; Ma et al. 2019) do not consider the downstream analysis when imputing the missing values in time series data with a fraction of labels."

Claims 12 and 18 recite substantially similar subject matter to claim 6 and are rejected with the same rationale, mutatis mutandis.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
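The SSGAN behavior quoted from Miao is confidence-based pseudo-labeling: the classifier assigns labels only to the unlabeled samples it is most confident about, and those pseudo-labels then guide the generator's imputation. A minimal sketch of the selection step, using made-up probabilities and an assumed 0.8 confidence cutoff (the cited paper selects a fraction of samples rather than using a fixed threshold):

```python
import numpy as np

# Classifier softmax probabilities for four unlabeled time-series samples
# (rows sum to 1; the values are illustrative, not from the cited paper).
probs = np.array([
    [0.95, 0.05],   # confident: class 0
    [0.55, 0.45],   # uncertain
    [0.10, 0.90],   # confident: class 1
    [0.60, 0.40],   # uncertain
])

confidence = probs.max(axis=1)
pseudo_labels = probs.argmax(axis=1)

# Keep only the samples with the largest predicted confidences, as in the
# quoted passage; the fixed 0.8 cutoff is an assumption for illustration.
selected = confidence >= 0.8

assert list(np.where(selected)[0]) == [0, 2]
assert list(pseudo_labels[selected]) == [0, 1]
```

The selected pseudo-labeled samples would then be fed back into training so the generator attends to same-labeled series when imputing, which is the claim-6 "classifying into at least one class type" step the rejection maps to Miao.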
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA THUY PHAM whose telephone number is (571) 272-2605. The examiner can normally be reached Monday - Friday, 9:00 A.M. - 5:00 P.M.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Li Zhen, can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.T.P./
Examiner, Art Unit 2121

/Li B. Zhen/
Supervisory Patent Examiner, Art Unit 2121

Prosecution Timeline

Jul 27, 2022
Application Filed
Jun 23, 2025
Non-Final Rejection — §101, §103, §112
Oct 01, 2025
Response Filed
Jan 07, 2026
Final Rejection — §101, §103, §112 (current)


Prosecution Projections

3-4
Expected OA Rounds
33%
Grant Probability
0%
With Interview (-33.3%)
3y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 3 resolved cases by this examiner. Grant probability derived from career allow rate.
