Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/31/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
The following claim terms invoke 35 U.S.C. § 112(f) because they are recited primarily in functional “configured to …” form (i.e., as black-box functional models), without reciting sufficient structure for performing the claimed functions:
“generator” (claim 1; further recited in claims 3, 6, 9, 13)
“encoder” (claim 1; further recited in claims 3, 4, 11-13)
“decoder” (claim 1; further recited in claims 3, 10, 13)
“discriminator” (claim 1; further recited in claims 8, 12, 13)
“classifier” (claim 1; further recited in claims 7, 12, 13)
Accordingly, each of the above limitations is treated as a separate § 112(f) element (i.e., “[module] for performing [recited function]”), and the claim scope is construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim limitations:
“a generator configured to generate a domain-independent representation of an input data sample;” and further “configured to generate” such representation such that it fools the discriminator, enables classification, and enables reconstruction;
“an encoder configured to generate a domain-dependent representation” (and dependent claims further recite constraints/penalties on the encoder/representation);
“a decoder configured to ensure that a combination” of the representations contains sufficient information to reconstruct the input;
“a discriminator configured to attempt to determine an originating domain” of the domain-independent representation; and
“a classifier configured to classify” based on the domain-independent representation;
invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The specification is unclear as to whether the “generator”, “encoder”, “decoder”, “discriminator”, and “classifier” are intended as software modules, hardware modules, or both, and does not clearly link any corresponding structure, material, or acts to the claimed functions. Therefore, the claims are indefinite and are rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Regarding Dependent Claims
Claims 2-15 depend from claim 1 and therefore incorporate the indefinite subject matter of claim 1. Accordingly, claims 2-15 are likewise rejected under 35 U.S.C. § 112(b) for the same reasons set forth above. See MPEP § 2173 (dependent claims standing or falling with an indefinite base claim).
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1
Claim 1 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 1 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
configured to generate a domain-independent representation of an input data sample; - this limitation is recited only functionally, i.e., generating a “domain-independent representation” of an input data sample. In context with the specification, this is a neural network that computes a latent vector from input x; conceptually, a mathematical function g(x) on abstract data. Thus, this limitation is part of the mathematical concept of computing abstract latent variables from input data. See MPEP § 2106.04(a)(2)(I).
configured to generate a domain-dependent representation of the input data sample; - this limitation simply maps the input data sample to a “domain-dependent representation” (another latent vector). This is an abstract transformation e(x) on data, specified only at the level of functionality. This is part of the same abstract scheme of encoding input data into multiple latent components. See MPEP § 2106.04(a)(2)(I).
configured to ensure that a combination of the domain-independent representation and the domain-dependent representation contains sufficient information to reconstruct the input data sample; - this limitation is a function f(∙) operating on the latent pair (domain-independent, domain-dependent) to produce a reconstructed sample x̂. The requirement that the combination “contains sufficient information to reconstruct the input data sample” is a statement about information content and invertibility of the mappings, an information-theoretical/mathematical relationship between x, DIRep, and DDRep. This is still within the abstract ML/representation-learning framework: define latent codes and a decoder such that x ≈ f(DIRep, DDRep). See MPEP § 2106.04(a)(2)(I).
configured to attempt to determine an originating domain of the domain-independent representation; - this limitation recites a function that takes as input the domain-independent representation and outputs a domain label prediction, i.e., a function d(DIRep) → domain. This is an abstract classifier operating purely on non-physical latent vectors, implementing mathematical decision rules and loss functions. See MPEP § 2106.04(a)(2)(I).
configured to classify the input data sample based on the domain-independent representation of the input data sample; - this limitation is another function c(DIRep) → label, producing classification outputs from the domain-independent representation. Again, this is a standard mathematical decision function over abstract vectors, with no recited particular hardware or external control. Thus, this limitation is part of the abstract mathematical model for learning and using DIRep for classification. See MPEP § 2106.04(a)(2)(I).
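For illustration only, the chain of purely functional mappings identified above (g(x), e(x), f(∙), d(∙), c(∙)) can be sketched as operations on abstract vectors; the placeholder linear maps and dimensions below are hypothetical assumptions, not drawn from the claims or the specification:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM_X, DIM_DI, DIM_DD, N_CLASSES = 8, 4, 2, 3

# Placeholder linear maps standing in for the recited neural networks.
Wg = rng.standard_normal((DIM_DI, DIM_X))            # generator g: x -> DIRep
We = rng.standard_normal((DIM_DD, DIM_X))            # encoder e: x -> DDRep
Wf = rng.standard_normal((DIM_X, DIM_DI + DIM_DD))   # decoder f: (DIRep, DDRep) -> x_hat
Wd = rng.standard_normal((2, DIM_DI))                # discriminator d: DIRep -> domain logits
Wc = rng.standard_normal((N_CLASSES, DIM_DI))        # classifier c: DIRep -> label logits

def g(x): return Wg @ x                              # domain-independent representation g(x)
def e(x): return We @ x                              # domain-dependent representation e(x)
def f(di, dd): return Wf @ np.concatenate([di, dd])  # reconstruction x_hat
def d(di): return Wd @ di                            # d(DIRep) -> domain prediction
def c(di): return Wc @ di                            # c(DIRep) -> label prediction

x = rng.standard_normal(DIM_X)
di_rep, dd_rep = g(x), e(x)
x_hat = f(di_rep, dd_rep)                            # trained so that x ≈ f(DIRep, DDRep)
```

The sketch shows only data flow: each claimed component reduces to a mathematical function over latent vectors, consistent with the characterization above.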
Claim 1 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application. The additional elements:
“a generator”, “an encoder”, “a decoder”, “a discriminator”, and “a classifier” – these additional elements merely describe a generic machine-learning model architecture for performing the abstract mathematical operations above. They are recited at a high level of generality and amount to generic computing components used to perform the judicial exception, without (i) any particular improvement to computer functionality, (ii) any particular machine implementation or specialized hardware configuration, or (iii) any meaningful limitation that applies the recited mathematics in a way that imposes a concrete technological implementation beyond the abstract model itself. Instead, the claim broadly uses generic ML components to process and optimize abstract representations (domain-independent/domain-dependent latent vectors) and to produce outputs (classification/domain label/reconstruction). This is the type of recitation that amounts to use of generic computing components to perform the abstract idea, which does not integrate the exception into a practical application (see MPEP § 2106.05(f) (“generic computer implementation” / “apply it on a computer”)).
is configured to generate the domain-independent representation of the input data sample such that it fools the discriminator, enables the classifier to classify the input data sample, and enables the decoder to reconstruct the input sample from the domain-independent representation and the domain-dependent representation and wherein the domain-dependent representation is constrained to have low information content – this limitation does not integrate the judicial exception into a practical application; rather, it merely recites optimization objectives / training criteria (adversarial “fooling”, classification-enabling, reconstruction-enabling, and an information-content constraint) that govern how the abstract mathematical model is trained or evaluated. Such recited training goals are ancillary to the core abstract idea of generating and using latent representations via mathematical operations and do not impose any meaningful technological implementation, particular machine, or application to a specific technological process. Accordingly, this limitation amounts to insignificant extra-solution activity, as it describes a desired result of model training and/or scoring rather than a concrete technological application or improvement, and does not integrate the judicial exception into a practical application. See MPEP § 2106.05(g) (insignificant extra-solution activity).
Claim 1 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception. The additional elements:
“a generator”, “an encoder”, “a decoder”, “a discriminator”, and “a classifier” – these additional elements and the recited training objectives/constraints merely (i) apply conventional ML building blocks to perform encoding/decoding/classification/domain discrimination on abstract feature vectors; and (ii) recite results-oriented functional language (e.g., “fools the discriminator”, “enables classification”, “enables reconstruction”, “low information content”) without requiring any specific technical mechanism that improves computer operation. These limitations amount to well-understood, routine, and conventional (WURC) extra-solution activity for implementing and training an abstract mathematical model on generic computing hardware, and therefore do not provide an inventive concept or integration into a practical application (see MPEP § 2106.05(d)).
is configured to generate the domain-independent representation of the input data sample such that it fools the discriminator, enables the classifier to classify the input data sample, and enables the decoder to reconstruct the input sample from the domain-independent representation and the domain-dependent representation and wherein the domain-dependent representation is constrained to have low information content – this limitation fails to provide “significantly more” because it merely adds well-understood, routine, and conventional (WURC) training objectives to the abstract mathematical operations. Specifically, configuring so as to (i) fool a discriminator, (ii) enable classification, (iii) enable reconstruction, and (iv) constrain a representation to low information content, amounts to specifying loss-driven optimization targets for the underlying mathematical model (i.e., adversarial objective, classification objective, reconstruction objective, and regularization/information constraint). These are results-oriented statements of intended model behavior and do not recite any particular technical mechanism, specialized computing structure, or technological improvement to computer functionality. See MPEP § 2106.05(d).
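The results-oriented training objectives identified above (adversarial fooling, classification-enabling, reconstruction-enabling, and a low-information constraint) correspond to a composite loss of the kind described. A hypothetical sketch follows; the weighting terms and the squared-norm proxy for “information content” are illustrative assumptions, not the applicant’s actual objective:

```python
import numpy as np

def cross_entropy(logits, target):
    """-log softmax(logits)[target], numerically stabilized."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

def composite_loss(x, x_hat, cls_logits, label, disc_logits, domain, dd_rep,
                   w_rec=1.0, w_adv=1.0, w_info=0.1):
    l_cls = cross_entropy(cls_logits, label)     # enables the classifier to classify
    l_rec = np.mean((x - x_hat) ** 2)            # enables the decoder to reconstruct
    l_adv = -cross_entropy(disc_logits, domain)  # generator rewarded for fooling the discriminator
    l_info = np.sum(dd_rep ** 2)                 # proxy penalty keeping DDRep low-information
    return l_cls + w_rec * l_rec + w_adv * l_adv + w_info * l_info
```

Each term is a loss-driven optimization target over abstract vectors, consistent with the characterization of these limitations as statements of intended model behavior rather than a technical mechanism.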
Regarding claim 2
Claim 2 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 2 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein the domain-dependent representation is constrained to have low information content relative to the domain-independent representation. – this limitation narrows the information-content relationship between those latent representations/variables (e.g., I(DDRep; X) < I(DIRep; X), where DDRep is the random variable for the domain-dependent representation; DIRep is the random variable for the domain-independent representation; ‘I’ denotes information content (mutual information with the input); and ‘X’ is the distribution of input samples x). This is an additional information-theory/mathematical constraint inside the same abstract model, and as such, falls within the mathematical concepts grouping (mathematical relationships, formulas, and calculations). See MPEP § 2106.04(a)(2)(I).
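The inequality above can be checked concretely for discrete random variables. In this hypothetical sketch (the two-symbol joint distributions are illustrative assumptions), a DIRep that copies a uniform binary input has high mutual information with X, while a DDRep independent of the input has none:

```python
import numpy as np

def mutual_information(joint):
    """I(A; X) in nats, computed from a discrete joint distribution table p(a, x)."""
    pa = joint.sum(axis=1, keepdims=True)   # marginal p(a)
    px = joint.sum(axis=0, keepdims=True)   # marginal p(x)
    mask = joint > 0                        # skip zero-probability cells
    return float((joint[mask] * np.log(joint[mask] / (pa @ px)[mask])).sum())

# DIRep copies the uniform binary input exactly: I(DIRep; X) = log 2.
i_di = mutual_information(np.diag([0.5, 0.5]))
# DDRep is independent of the input: I(DDRep; X) = 0.
i_dd = mutual_information(np.full((2, 2), 0.25))
assert i_dd < i_di   # the recited constraint I(DDRep; X) < I(DIRep; X)
```

This illustrates why the limitation is an information-theoretic relationship between abstract random variables rather than a technological implementation.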
Claim 2 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 2 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 3
Claim 3 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 3 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea. Claim 3 depends from claim 1 which was found to recite an abstract idea (see rejection of claim 1).
Claim 3 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application. The additional elements:
wherein the generator receives generator input information related to a first domain and a second target domain and transforms the generator input information into the domain-independent representation of common elements of the first domain and the second target domain; - this limitation merely specifies the source of input data (information from a first domain and a second target domain) that the generator transforms into latent representations (DIRep). This amounts to insignificant extra-solution activity under MPEP § 2106.05(g). There is no recitation of any technological implementation beyond this training/data handling and thus, no meaningful limit on the judicial exception or integration into a practical application.
wherein the encoder receives the generator input information related to the first domain and the second target domain and transforms the generator input information into the domain-dependent representation, wherein the domain-dependent representation is a representation of elements to be reproduced; - this limitation merely specifies that the encoder transforms the input into latent representations (DDRep). This amounts to insignificant extra-solution activity under MPEP § 2106.05(g). There is no recitation of any technological implementation beyond this training/data handling and thus, no meaningful limit on the judicial exception or integration into a practical application.
and wherein the decoder receives as inputs the domain independent representation and the domain dependent representation, the output of the decoder being used to train the encoder and the generator so the domain-independent representation is able to reproduce predictions that match an original first domain. – this limitation merely specifies that the decoder output is used in training to encourage DIRep to reproduce predictions matching the original source domain. This is insignificant extra-solution activity under MPEP § 2106.05(g) because it amounts to using the model’s output to adjust/learn parameters (training), which is ancillary to the underlying abstract idea of mathematical operations on representations. There is no recitation of any technological implementation beyond this training/data handling and thus, no meaningful limit on the judicial exception or integration into a practical application.
Claim 3 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception. The additional elements:
wherein the generator receives generator input information related to a first domain and a second target domain and transforms the generator input information into the domain-independent representation of common elements of the first domain and the second target domain; - this limitation merely adds well-understood, routine, and conventional (WURC) training/data-flow details to the abstract mathematical model of claim 1 (e.g., providing training inputs from two domains). There is no recited element(s) that amount to “significantly more” than the judicial exception. See MPEP § 2106.05(d).
wherein the encoder receives the generator input information related to the first domain and the second target domain and transforms the generator input information into the domain-dependent representation, wherein the domain-dependent representation is a representation of elements to be reproduced; - this limitation merely adds well-understood, routine, and conventional (WURC) training/data-flow details to the abstract mathematical model of claim 1 (e.g., providing training inputs from two domains). This is a conventional and routine aspect of training machine-learning models (data selection, representation learning, reconstruction). There is no recited element(s) that amount to “significantly more” than the judicial exception. See MPEP § 2106.05(d).
and wherein the decoder receives as inputs the domain independent representation and the domain dependent representation, the output of the decoder being used to train the encoder and the generator so the domain-independent representation is able to reproduce predictions that match an original first domain. – this limitation merely adds well-understood, routine, and conventional (WURC) training/data-flow details to the abstract mathematical model of claim 1, namely: using decoder output as a training signal to fit the model to match source-domain predictions. This is a conventional and routine aspect of training machine-learning models (data selection, representation learning, reconstruction). There is no recited element(s) that amount to “significantly more” than the judicial exception. See MPEP § 2106.05(d).
Regarding claim 4
Claim 4 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 4 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea. Claim 4 depends from claim 1 which was found to recite an abstract idea (see rejection of claim 1).
Claim 4 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application. The additional elements:
wherein the encoder is configured to be penalized during training based on information content of the domain-dependent representation, such that an amount of information is increased in the domain-independent representation and an amount of information is decreased in the domain-dependent representation. – this limitation is directed to a training objective/regularization condition, i.e., applying an information-content penalty to the encoder during training to shift information between latent representations. It merely instructs that, during training, the encoder be penalized based on “information content” of the domain-dependent representation, with the intended effect of redistributing information between the two latent representations. This is a mere instruction to apply the underlying abstract mathematical concept in the training process, rather than a technical implementation that meaningfully limits the exception. See MPEP § 2106.05(f).
Claim 4 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception. The additional elements:
wherein the encoder is configured to be penalized during training based on information content of the domain-dependent representation, such that an amount of information is increased in the domain-independent representation and an amount of information is decreased in the domain-dependent representation. – this limitation simply adds a direction to apply an information-content penalty during training to encourage a desired allocation of information between latent variables. That is an abstract objective/regularizer layered onto the same abstract model, and it does not recite a nonconventional technical mechanism or an unconventional implementation. Instead, the limitation recites well-understood, routine, and conventional (WURC) activity. See MPEP § 2106.05(d).
Regarding claim 5
Claim 5 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 5 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein the content of the domain dependent representation is constrained to be dependent only on an identifier of an originating domain of the input data sample. – the recited “identifier of an originating domain” is itself just abstract symbolic data (a domain label or bit). Constraining the DDRep to depend only on the identifier is specifying an internal information partitioning rule, i.e., DDRep carries domain identity, while DIRep must carry everything else. This is a design of how to encode information in latent variables, an information-theoretic/mathematical relationship, implemented via architecture and/or loss functions. No concrete hardware, physical transformation, or external technical system is introduced. Accordingly, this limitation is part of the same abstract mathematical concept as claim 1. See MPEP § 2106.04(a)(2)(I).
Claim 5 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 5 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 6
Claim 6 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 6 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
Claim 6 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application. The additional elements:
wherein the generator comprises a first generator for an input data sample of a source domain and a second generator for an input data sample a target domain. –this limitation merely designates that the abstract mapping is implemented in two separate generator components (source vs. target), which is an internal implementation choice within the same abstract model and does not impose a meaningful technological constraint or a particular machine implementation. Rather it amounts to a generic computing component / mere instruction to apply the abstract idea using conventional functional modules. See MPEP § 2106.05(f).
Claim 6 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception. The additional elements:
wherein the generator comprises a first generator for an input data sample of a source domain and a second generator for an input data sample a target domain. – this limitation merely provides that the “generator” is implemented as two generator components associated with source-domain and target-domain inputs. This is well-understood, routine, and conventional (WURC) activity in the art and it does not recite a nonconventional technical mechanism or an unconventional computer implementation; it remains an abstract organization of the same underlying model and training concept. See MPEP § 2106.05(d).
Regarding claim 7
Claim 7 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 7 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein a loss function for the classifier is: Lc = Lc(l̂, l) = Lc(C(G(x)), l); - in the specification, the classification loss is described as a function that measures the difference between the classifier’s prediction and the true label, with a more explicit source-domain version given as a cross-entropy sum over labeled samples. This limitation therefore defines an explicit mathematical formula (the classifier loss) over (i) the predicted label l̂ = C(G(x)) and (ii) the true label l, and is used as part of the optimization objective for training the classifier neural network. It is, by its nature, a mathematical relationship / calculation on abstract data (labels, predictions, and latent representations). It specifies how the model is trained, not any particular hardware structure, physical transformation, or concrete external technical use. See MPEP § 2106.04(a)(2)(I).
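The cross-entropy form described above can be made concrete. A hypothetical sketch of Lc as the negative log of the softmax probability assigned to the true label (the logit values are illustrative assumptions, not taken from the specification):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # numerical stabilization
    p = np.exp(z)
    return p / p.sum()

def classifier_loss(logits, true_label):
    """Lc(l_hat, l): cross-entropy between the prediction C(G(x)) and the true label l."""
    return -np.log(softmax(logits)[true_label])

logits = np.array([2.0, 0.5, -1.0])        # hypothetical classifier output C(G(x))
loss_correct = classifier_loss(logits, 0)  # true label matches the largest logit
loss_wrong = classifier_loss(logits, 2)    # true label gets the smallest logit
assert loss_correct < loss_wrong           # the loss penalizes misprediction
```

As the sketch shows, the formula is a pure calculation on predictions and labels, supporting its treatment as a mathematical concept.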
Claim 7 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 7 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 8
Claim 8 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 8 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein a loss function for the discriminator is: L_d = L_d(d̂, d) = L_d(D(G(x)), d). – in the specification, the discriminator loss is explicitly defined as a function that measures how well the discriminator predicts whether the domain-independent representation comes from the source or target domain, for example via a cross-entropy expression over the true domain labels and discriminator outputs. The limitation therefore specifies an explicit mathematical loss function for the discriminator, operating on abstract data: (i) the predicted domain label d̂ = D(G(x)), and (ii) the true domain label d; and it is used as part of the optimization objective for training the discriminator neural network. It is, by its nature, a mathematical relationship/calculation on abstract symbols and real-valued outputs. It defines how the model is trained, and does not recite any particular hardware structure, any physical article/transformation, or any concrete external technical use. See MPEP § 2106.04(a)(2)(I).
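For context, a binary cross-entropy discriminator loss of the kind described can be sketched as follows; the probability value and domain label are hypothetical:

```python
import numpy as np

def discriminator_loss(d_hat, d):
    """Binary cross-entropy L_d(d_hat, d) between the predicted domain
    probability d_hat = D(G(x)) and the true domain label d in {0, 1}."""
    return -(d * np.log(d_hat) + (1 - d) * np.log(1 - d_hat))

d_hat = 0.8  # discriminator's predicted probability that the sample is target-domain
d = 1        # true domain label (target)
L_d = discriminator_loss(d_hat, d)
```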
Claim 8 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 8 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 9
Claim 9 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 9 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein the generator has a smaller loss when the discriminator makes a wrong prediction and a loss function for the generator is: L_g = L_g(d̂, 1 − d) = L_d(D(G(x)), 1 − d). – in the specification, the generator loss is described as a GAN-style loss with inverted domain labels: the generator’s loss is smaller when the discriminator predicts the wrong domain (i.e., when the domain-independent representation “fools” the discriminator). This is formalized as a loss function L_g defined over the discriminator output D(G(x)) and inverted domain labels, implemented, e.g., via cross-entropy. This limitation therefore (i) defines an explicit mathematical loss function for the generator, and (ii) specifies its behavior (“smaller loss when the discriminator makes a wrong prediction”). It operates on abstract data, e.g., the discriminator output d̂ = D(G(x)) and (inverted) domain labels, and is used as part of the optimization objective for training the generator neural network. By its nature, this is a mathematical relationship/calculation, i.e., a function used in gradient-based optimization of network parameters. It defines how the abstract ML model is trained, not any specific hardware structure, physical transformation, or concrete external technical use. See MPEP § 2106.04(a)(2)(I).
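The inverted-label behavior can be checked numerically with a minimal sketch (hypothetical values, not the applicant’s code): evaluating the same cross-entropy form against 1 − d yields a smaller generator loss when the discriminator is wrong:

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy between probability p and label y."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def generator_loss(d_hat, d):
    """L_g(d_hat, 1 - d): the discriminator's loss form with inverted labels."""
    return bce(d_hat, 1 - d)

d = 1                                # sample truly from the target domain
L_g_fooled = generator_loss(0.1, d)  # discriminator wrongly predicts source
L_g_caught = generator_loss(0.9, d)  # discriminator correctly predicts target
assert L_g_fooled < L_g_caught       # smaller loss when the discriminator is fooled
```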
Claim 9 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 9 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 10
Claim 10 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 10 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein a reconstruction loss of the decoder is: L_r = L_r(x̂, x) = L_r(F(G(x), E(x)), x). – in the specification, the reconstruction loss is described as a mean-squared-error (L2-norm) loss between the reconstructed output x̂ = F(G(x), E(x)) and the original input x, i.e., a function of the difference between predicted and true values for the input data. The limitation therefore (i) defines an explicit mathematical loss function for the decoder, (ii) applies it to abstract data (input samples x, their reconstructions x̂, and the outputs of the generator and encoder), and (iii) uses it as part of the optimization objective in training the neural network. By its nature, this is a mathematical relationship/calculation on real-valued vectors, specifying how the abstract ML model is trained, not any particular hardware structure, physical transformation, or concrete external technical use. See MPEP § 2106.04(a)(2)(I).
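A minimal sketch of the L2-norm (mean-squared-error) reconstruction loss, with hypothetical vectors standing in for x and its reconstruction:

```python
import numpy as np

def reconstruction_loss(x_hat, x):
    """Mean squared error L_r(x_hat, x) between the reconstruction
    x_hat = F(G(x), E(x)) and the original input x."""
    return np.mean((x_hat - x) ** 2)

x = np.array([1.0, 2.0, 3.0])      # hypothetical input sample
x_hat = np.array([1.1, 1.9, 3.0])  # hypothetical decoder output
L_r = reconstruction_loss(x_hat, x)
```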
Claim 10 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 10 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 11
Claim 11 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 11 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein a Kullback-Leibler Divergence loss for the encoder and the data-dependent representation is: L_kl = D_KL(Pr(E(x)) ∥ N(0, 1)). – the specification explains that a KL-divergence loss is introduced for the encoder E and DDRep to “create a minimal DDRep so most of the input information can be forced into the DIRep” (spec, p. 5, ¶[0049]), and gives the explicit form of the KL loss as a divergence between the distribution of the DDRep and a standard normal, including a closed-form expression. This limitation therefore (i) defines an explicit mathematical loss function (Kullback-Leibler divergence), (ii) applies it to abstract data, i.e., the distribution of the encoder output E(x) (the DDRep) and the reference distribution N(0, 1), and (iii) uses it as part of the optimization objective for training the encoder neural network. By its nature, this is a mathematical relationship/calculation, i.e., a probability-divergence measure used to regularize a latent distribution. It specifies how the abstract ML model is trained and constructed, not any particular hardware structure, physical transformation, or concrete external technical use. See MPEP § 2106.04(a)(2)(I).
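For a diagonal-Gaussian latent representation, the KL divergence against N(0, 1) has a standard closed form (the familiar VAE regularizer); the sketch below uses hypothetical mean and log-variance values and is offered only to make the mathematics concrete:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ),
    summed over latent dimensions; drives the DDRep toward N(0, 1)."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

mu = np.array([0.0, 0.5])        # hypothetical encoder means for E(x)
log_var = np.array([0.0, 0.0])   # unit variance in each latent dimension
L_kl = kl_to_standard_normal(mu, log_var)  # zero iff mu = 0 and sigma = 1
```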
Claim 11 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 11 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 12
Claim 12 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 12 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein the encoder is configured to use a L2-norm loss for a reconstruction loss and the discriminator is configured with a discriminator loss, the generator is configured with a generator loss, and the classifier is configured with a classifier loss, wherein the discriminator loss, the generator loss, and the classifier loss are based on cross entropy. – the specification explains that:
the reconstruction loss uses an L2-norm (mean squared error) between x and x̂ = F(G(x), E(x)); and
the discriminator, generator, and classifier losses are implemented as cross-entropy losses on their respective outputs and labels.
Thus, claim 12’s added limitation specifies particular mathematical forms of loss functions (L2 norm and cross-entropy) used in training the neural networks, and concerns only how those abstract loss values are computed from predictions and labels. This is, by nature, a collection of mathematical relationships/calculations on abstract data (vectors x and x̂, labels l and d, and network outputs). It specifies how the abstract ML model is trained and constructed, not any particular hardware structure, physical transformation, or concrete external technical use. See MPEP § 2106.04(a)(2)(I).
Claim 12 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 12 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 13
Claim 13 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 13 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein a gradient-descent based learning dynamic for the generator is based on: ∆G = −α_g (λ ∂L_g/∂G + β ∂L_c/∂G + γ ∂L_r/∂G); a gradient-descent based learning dynamic for the classifier is based on: ∆C = −α_C ∂L_c/∂C; a gradient-descent based learning dynamic for the discriminator is based on: ∆D = −α_D ∂L_d/∂D; a gradient-descent based learning dynamic for the encoder is based on: ∆E = −α_E (∂L_kl/∂E + μ ∂L_r/∂E); and a gradient-descent based learning dynamic for the decoder is based on: ∆F = −α_F ∂L_r/∂F; wherein α_C, α_D, α_E, α_F, α_G are learning rates, L_d is a discriminator loss, L_g is a generator loss, and L_c is a classifier loss. – the specification explains these same equations as the gradient-descent based learning dynamics for updating the neural networks, with α, β, γ, μ as hyperparameters controlling the relative weights of the different loss terms. Thus, the added limitations explicitly define gradient-descent update equations (∆G, ∆C, ∆D, ∆E, ∆F) in terms of partial derivatives of the loss functions, and specify how these are combined with scalar learning rates and loss weights to update network parameters. These are, by their nature, mathematical relationships and calculations, i.e., concrete formulas for gradient-based optimization of the neural network parameters based on the previously defined loss functions. They operate entirely on (i) abstract objects (network weights G, C, D, E, F) and (ii) scalar loss values and gradients, and specify how the abstract ML model is trained, not any particular hardware structure, physical transformation, or concrete external technical use. See MPEP § 2106.04(a)(2)(I).
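To make the update form concrete, the generator update ∆G can be sketched with hypothetical scalar stand-ins for the backpropagated gradients ∂L_g/∂G, ∂L_c/∂G, and ∂L_r/∂G (in practice these are tensors computed by automatic differentiation):

```python
# Hypothetical scalar stand-ins for gradients with respect to G.
grad_Lg = 0.2
grad_Lc = -0.1
grad_Lr = 0.05

# Hypothetical learning rate and loss weights (hyperparameters).
alpha_g, lam, beta, gamma = 0.01, 1.0, 0.5, 0.5

# Mirrors  dG = -alpha_g * (lambda * dL_g/dG + beta * dL_c/dG + gamma * dL_r/dG)
delta_G = -alpha_g * (lam * grad_Lg + beta * grad_Lc + gamma * grad_Lr)
```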
Claim 13 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 13 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 14
Claim 14 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 14 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein the apparatus is configured to classify data that evolves over time. – this limitation characterizes the type of data and use case (data that “evolves over time” or exhibits drift) for the same underlying classifier system. The act of classifying data over time, or in the presence of distributional shift, is still a form of data analysis/classification (a mental process / mathematical concept) implemented with the same neural network architecture already recited in claim 1. It does not change (i) the nature of the data (still abstract input samples and labels) or (ii) the underlying mathematical operations (representation learning, loss functions, gradient-based training). Thus, this limitation merely applies the same abstract classification model to evolving data, an analysis that could reasonably be performed in the human mind or with pen and paper; it does not recite any specific hardware, sensor, or physical process. It therefore remains part of the same abstract mathematical concept as claim 1 under MPEP §§ 2106.04(a)(2)(I), 2106.04(a)(2)(III).
Claim 14 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 14 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 15
Claim 15 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 15 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein the domain-dependent representation is a label indicating an originating domain of the input data sample. – the specification describes this “explicit DDRep” variant, where the domain-dependent representation (DDRep) is minimal and contains only the originating domain label, i.e., a domain bit or label indicating whether the sample is from the source or target domain. This limitation therefore (i) specifies that the DDRep is just a symbolic domain label (e.g., a bit or one-hot code) for the input’s originating domain and (ii) is an internal coding choice for a latent variable in the model (how domain information is represented). Thus the limitation recites a mathematical/informational representation of domain identity (abstract data about the sample’s originating domain), not specific hardware, a physical transformation, or a concrete external technical use. It refines how the abstract latent representation is defined inside the same neural network architecture of claim 1. See MPEP § 2106.04(a)(2)(I).
Claim 15 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 15 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 16
Claim 16 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a manufacture.
Claim 16 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
generating a domain-independent representation of an input data sample;
generating a domain-dependent representation of the input data sample;
configuring a decoder to ensure that a combination of the domain-independent representation and the domain-dependent representation contains sufficient information to reconstruct the input data sample;
configuring a discriminator to attempt to determine an originating domain of the domain- independent representation;
configuring a classifier to classify the input data sample based on the domain-independent representation of the input data sample;
These are all operations on abstract data using mathematical relationships and calculations (loss functions, representation mappings, constraints), which fall into the “mathematical concepts” grouping under MPEP § 2106.04(a)(2)(I). Accordingly, claim 16, through its recited program instructions, is directed to the same abstract mathematical concept as claim 1: an ML/NN-based latent representation and training framework for domain adaptation with constrained domain-dependent representations. See rejection of claim 1; MPEP § 2106.04(a).
Claim 16 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application. The additional elements:
“one or more tangible computer-readable storage media”, “program instructions stored on at least one of the one or more tangible computer-readable storage media”, and “program instructions executable by a processor” – these additional elements recite a generic computer implementation of a standard, non-transitory computer-readable medium and generic program instructions executed by a processor. Merely implementing an abstract idea on a generic computer or as software stored on a generic storage medium does not integrate the exceptions into a practical application. These CPP elements are insignificant extra-solution activity / generic computer components, not a particular machine in a meaningful sense, not a transformation of an article, and not a specific application in a concrete technological process. See MPEP § 2106.04(d); 2106.05(a)-(h).
and configuring a generator to generate the domain-independent representation of the input data sample such that it fools the discriminator, enables the classifier to classify the input data sample, and enables reconstruction of the input sample from the domain-independent representation and the domain-dependent representation and wherein the domain-dependent representation is constrained to have low information content. – this limitation does not integrate the judicial exception into a practical application; rather, it merely recites optimization objectives / training criteria (adversarial “fooling”, classification-enabling, reconstruction-enabling, and an information-content constraint) that govern how the abstract mathematical model is trained or evaluated. Such recited training goals are ancillary to the core abstract idea of generating and using latent representations via mathematical operations and do not impose any meaningful technological implementation, particular machine, or application to a specific technological process. Accordingly, this limitation amounts to insignificant extra-solution activity, as it describes a desired result of model training and/or scoring rather than a concrete technological application or improvement (see MPEP § 2106.05(g) (insignificant extra-solution activity)), and therefore does not integrate the judicial exception into a practical application.
Claim 16 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception. The additional elements:
“one or more tangible computer-readable storage media”, “program instructions stored on at least one of the one or more tangible computer-readable storage media”, and “program instructions executable by a processor” – these additional elements (generic storage media and generic program instructions executed by a processor) are well-understood, routine, and conventional (WURC) components of standard computer systems. They simply state “do the abstract mathematical ML process on a conventional computer / storage medium”, which does not amount to “significantly more” than the abstract idea. Individually and in combination, the claim does not provide an inventive concept. See MPEP § 2106.05(d).
and configuring a generator to generate the domain-independent representation of the input data sample such that it fools the discriminator, enables the classifier to classify the input data sample, and enables reconstruction of the input sample from the domain-independent representation and the domain-dependent representation and wherein the domain-dependent representation is constrained to have low information content. – this limitation fails to provide “significantly more” because it merely adds well-understood, routine, and conventional (WURC) training objectives to the abstract mathematical operations. Specifically, configuring so as to (i) fool a discriminator, (ii) enable classification, (iii) enable reconstruction, and (iv) constrain a representation to low information content, amounts to specifying loss-driven optimization targets for the underlying mathematical model (i.e., adversarial objective, classification objective, reconstruction objective, and regularization/information constraint). These are results-oriented statements of intended model behavior and do not recite any particular technical mechanism, specialized computing structure, or technological improvement to computer functionality. See MPEP § 2106.05(d).
Regarding claim 17
Claim 17 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a process.
Claim 17 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
generating a domain-independent representation of an input data sample;
generating a domain-dependent representation of the input data sample;
configuring a decoder to ensure that a combination of the domain-independent representation and the domain-dependent representation contains sufficient information to reconstruct the input data sample;
configuring a discriminator to attempt to determine an originating domain of the domain- independent representation;
configuring a classifier to classify the input data sample based on the domain-independent representation of the input data sample;
and configuring a generator to generate the domain-independent representation of the input data sample such that it fools the discriminator, enables the classifier to classify the input data sample, and enables a reconstruction of the input sample from the domain-independent representation and the domain-dependent representation and wherein the domain-dependent representation is constrained to have low information content.
These method steps are the method counterpart of the apparatus/CPP functionality already analyzed in claims 1 and 16; they describe:
neural network mappings (generator, encoder, decoder, discriminator, classifier),
the creation of domain-independent and domain-dependent latent representations, and
constraints and behaviors (fooling the discriminator, enabling classification, enabling reconstruction, enforcing low information content in the domain-dependent representation).
All of these are operations on abstract data (input samples, labels, domain identifiers, and latent vectors) implemented by mathematical relationships and calculations (loss functions, training dynamics, representation mappings). These are mathematical concepts, relationships, formulas, and calculations and as such, are directed to a judicial exception under MPEP § 2106.04(a)(2)(I). See rejection for claim 1; MPEP § 2106.04(a).
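For orientation only, the five mappings and the data flow recited in these steps can be sketched with linear stand-ins for the networks (all weights and dimensions below are hypothetical and not taken from the specification):

```python
import numpy as np

rng = np.random.default_rng(0)
dim_x, dim_z = 4, 3

# Linear stand-ins for generator G, encoder E, decoder F,
# discriminator D, and classifier C.
W_G = rng.normal(size=(dim_z, dim_x))
W_E = rng.normal(size=(dim_z, dim_x))
W_F = rng.normal(size=(dim_x, 2 * dim_z))
w_D = rng.normal(size=dim_z)
W_C = rng.normal(size=(2, dim_z))

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

x = rng.normal(size=dim_x)                    # input data sample
direp = W_G @ x                               # domain-independent representation
ddrep = W_E @ x                               # domain-dependent representation
x_hat = W_F @ np.concatenate([direp, ddrep])  # reconstruction from both representations
d_hat = sigmoid(w_D @ direp)                  # discriminator's domain prediction
l_hat = int(np.argmax(W_C @ direp))           # classifier's label prediction
```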
Claim 17 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application. The additional elements:
and configuring a generator to generate the domain-independent representation of the input data sample such that it fools the discriminator, enables the classifier to classify the input data sample, and enables reconstruction of the input sample from the domain-independent representation and the domain-dependent representation and wherein the domain-dependent representation is constrained to have low information content. – this limitation does not integrate the judicial exception into a practical application, rather, it merely recites optimization objectives / training criteria (adversarial “fooling”, classification-enabling, reconstruction-enabling, and an information-context constraint) that govern how the abstract mathematical model is trained or evaluated. Such recited training goals are ancillary to the core abstract idea of generating and using latent representations via mathematical operations and do not impose any meaningful technological implementation, particular machine, or application to a specific technological process. Accordingly, this limitation amounts to insignificant extra-solution activity as it describes a desired result of model training and/or scoring, rather than a concrete technological application or improvement (see MPEP § 2106.05(g) (insignificant extra-solution activity)); does not integrate the judicial exception into a practical application.
Claim 17 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception. The additional elements:
and configuring a generator to generate the domain-independent representation of the input data sample such that it fools the discriminator, enables the classifier to classify the input data sample, and enables reconstruction of the input sample from the domain-independent representation and the domain-dependent representation and wherein the domain-dependent representation is constrained to have low information content. – this limitation fails to provide “significantly more” because it merely adds well-understood, routine, and conventional (WURC) training objectives to the abstract mathematical operations. Specifically, configuring so as to (i) fool a discriminator, (ii) enable classification, (iii) enable reconstruction, and (iv) constrain a representation to low information content, amounts to specifying loss-driven optimization targets for the underlying mathematical model (i.e., adversarial objective, classification objective, reconstruction objective, and regularization/information constraint). These are results-oriented statements of intended model behavior and do not recite any particular technical mechanism, specialized computing structure, or technological improvement to computer functionality. See MPEP § 2106.05(d).
Regarding claim 18
Claim 18 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 18 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein the domain-dependent representation is constrained to have low information content relative to the domain-independent representation. – this limitation narrows the information-content relationship between those latent representations/variables (e.g., I(DDRep; X) < I(DIRep; X), where DDRep is the random variable for the domain-dependent representation; DIRep is the random variable for the domain-independent representation; ‘I’ denotes the mutual information; and ‘X’ is the distribution of input samples x). This is an additional information-theory/mathematical constraint inside the same abstract model and, as such, falls within the mathematical concepts grouping (mathematical relationships, formulas, and calculations). See MPEP § 2106.04(a)(2)(I).
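The mutual-information comparison can be illustrated with small discrete distributions (hypothetical toy values): a representation that encodes only a 2-way domain bit carries less information about the input than one that identifies each of four equally likely samples:

```python
import numpy as np

def mutual_information(joint):
    """I(A; B) in nats, computed from a joint probability table p(a, b)."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])))

# Four equally likely samples; the DDRep deterministically encodes a 2-way
# domain bit, while the DIRep deterministically identifies the sample.
joint_ddrep = np.zeros((4, 2))
joint_ddrep[[0, 1], 0] = 0.25
joint_ddrep[[2, 3], 1] = 0.25
joint_direp = np.eye(4) / 4

I_ddrep = mutual_information(joint_ddrep)  # = log 2
I_direp = mutual_information(joint_direp)  # = log 4
assert I_ddrep < I_direp                   # DDRep carries less information about X
```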
Claim 18 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 18 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Regarding claim 19
Claim 19 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 19 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea. Claim 19 depends from claim 18 which was found to recite an abstract idea (see rejection of claim 18).
Claim 19 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application. The additional elements:
configuring an encoder to be penalized during training based on information content of the domain-dependent representation, such that an amount of information is increased in the domain-independent representation and an amount of information is decreased in the domain-dependent representation. – this limitation is directed to a training objective/regularization condition, i.e., applying an information-content penalty to the encoder during training to shift information between latent representations. It merely instructs that, during training, the encoder be penalized based on “information content” of the domain-dependent representation, with the intended effect of redistributing information between the two latent representations. This is a mere instruction to apply the underlying abstract mathematical concept in the training process, rather than a technical implementation that meaningfully limits the exception. See MPEP § 2106.05(f).
Claim 19 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception. The additional elements:
configuring an encoder to be penalized during training based on information content of the domain-dependent representation, such that an amount of information is increased in the domain-independent representation and an amount of information is decreased in the domain-dependent representation. – this limitation simply adds a direction to apply an information-content penalty during training to encourage a desired allocation of information between latent variables. That is an abstract objective/regularizer layered onto the same abstract model, and it does not recite a nonconventional technical mechanism or an unconventional implementation. Instead, the limitation recites well-understood, routine, and conventional (WURC) activity. See MPEP § 2106.05(d).
Regarding claim 20
Claim 20 – Step 1 – Is the claim to a process, machine, manufacture or composition of matter?
Yes, the claim is to a machine.
Claim 20 – Step 2A – Prong 1 – Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, the claim recites an abstract idea.
wherein the content of the domain dependent representation is constrained to be dependent only on an identifier of an originating domain of the input data sample. – the recited “identifier of an originating domain” is itself just abstract symbolic data (a domain label or bit). Constraining the DDRep to depend only on the identifier is specifying an internal information partitioning rule, i.e., DDRep carries domain identity, while DIRep must carry everything else. This is a design of how to encode information in latent variables, an information-theoretic/mathematical relationship, implemented via architecture and/or loss functions. No concrete hardware, physical transformation, or external technical system is introduced. Accordingly, this limitation is part of the same abstract mathematical concept as claim 17. See MPEP § 2106.04(a)(2)(I).
Claim 20 – Step 2A – Prong 2 – Does the claim recite additional elements that integrate the judicial exception into a practical application?
No. There are no additional elements that integrate the judicial exception into a practical application.
Claim 20 – Step 2B – Does the claim recite additional elements that amount to significantly more than the judicial exception?
No. There are no additional elements that amount to significantly more than the judicial exception.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-10, and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bousmalis et al. (Domain Separation Networks) in view of Matthey-de-l’Endroit et al. (US10373055B1).
Regarding claim 1, Bousmalis in view of Matthey-de-l’Endroit teach an apparatus comprising:
a generator configured to generate a domain-independent representation of an input data sample; - Bousmalis teaches this limitation. Bousmalis shows a shared encoder operating on an input sample, where:
“shared-weight encoder E_c(x) learns to capture representation components for a given input sample” (Bousmalis, p. 3, Figure 1: Training of our Domain Separation Networks)
Bousmalis further explains that these representation components are partitioned into private and shared parts, where the shared part is shared across domains:
“we explicitly learn to extract image representations that are partitioned into two subspaces: one component which is private to each domain and one which is shared across domains.” (Bousmalis, p. 1, Abstract)
And states that DSN is designed to learn domain-invariant representations and to transfer knowledge from labeled source data to an unlabeled target domain:
“We propose a novel method, the Domain Separation Networks (DSN), for learning domain–invariant representations.” (Bousmalis, p. 2, Introduction)
“In this setting, the source data is labeled for a particular task and we would like to transfer knowledge from the source to the target domain for which we have no ground truth labels.” (Bousmalis, p. 1, Introduction)
And the loss on the shared encoder E_c(x) makes clear it is the representation driving the task classifier:
“Inference in a DSN model is given by x̂ = D(E_c(x) + E_p(x)) and ŷ = G(E_c(x)), where x̂ is the reconstruction of the input x and ŷ is the task-specific prediction… The classification loss L_task trains the model to predict the output labels we are ultimately interested in. Because we assume the target domain is unlabeled, the loss is applied only to the source domain.” (Bousmalis, p. 3-4, 3.1 Learning)
A POSITA would therefore understand that the shared-weight encoder E_c(x) produces a representation that is shared across domains and intended to be domain-invariant, used by the classifier G(E_c(x)) to transfer knowledge from the labeled source domain to the unlabeled target domain. This corresponds to a generator configured to generate a domain-independent representation of an input data sample, satisfying this limitation.
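For orientation only, the quoted inference equations x̂ = D(E_c(x) + E_p(x)) and ŷ = G(E_c(x)) can be sketched with toy stand-ins. E_c, E_p, D, and G below are hypothetical linear placeholders chosen so the arithmetic works out, not Bousmalis's networks:

```python
# Illustrative sketch of the DSN data flow described above, with toy
# linear maps standing in for the trained networks (assumed placeholders).

def E_c(x):                      # shared-weight encoder: produces the DIRep
    return [0.5 * v for v in x]

def E_p(x):                      # private encoder: produces the DDRep
    return [0.1 * v for v in x]

def D(h):                        # shared decoder: reconstructs from DIRep + DDRep
    return [v / 0.6 for v in h]

def G(h):                        # task classifier driven only by the shared code
    return sum(h)

x = [1.0, 2.0, 3.0]
h_sum = [c + p for c, p in zip(E_c(x), E_p(x))]   # E_c(x) + E_p(x)
x_hat = D(h_sum)                                   # x̂ = D(E_c(x) + E_p(x))
y_hat = G(E_c(x))                                  # ŷ = G(E_c(x))
```

The point of the sketch is structural: the classifier sees only E_c(x), while the decoder sees the combination of both codes.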
an encoder configured to generate a domain-dependent representation of the input data sample; - Bousmalis teaches this limitation. Bousmalis introduces a private encoder that learns domain-specific components of the representation (the “private subspace”):
“A private encoder E_p(x) (one for each domain) learns to capture domain–specific components of the representation.” (Bousmalis, p. 3, Figure 1: Training of our Domain Separation Networks)
These private encoders map an input sample x to a representation capturing properties “unique to each domain” (e.g., background, low-level statistics), i.e., a domain-dependent representation (DDRep).
a decoder configured to ensure that a combination of the domain-independent representation and the domain-dependent representation contains sufficient information to reconstruct the input data sample; - Bousmalis teaches this limitation. Bousmalis’s shared decoder reconstructs the input from the sum/combination of shared and private codes:
“A shared decoder learns to reconstruct the input sample by using both the private
and source representations.” (Bousmalis, p. 3, Figure 1: Training of our Domain Separation Networks)
And further explains that the reconstruction loss forces the combination of shared and private subspaces to be sufficient to reconstruct the input and avoid “trivial solutions” where one subspace collapses:
“DSNs explicitly and jointly model both private and shared components of the domain representations… Finally, to ensure that the private representations are still useful (avoiding trivial solutions) and to add generalizability, we also add a reconstruction loss. The combination of these objectives is a model that produces a shared representation that is similar for both domains and a private representation that is different.” (Bousmalis, p. 3, Method)
Thus, Bousmalis teaches a decoder that ensures the combination DIRep+DDRep contains enough information to reconstruct the input.
a discriminator configured to attempt to determine an originating domain of the domain- independent representation; - Bousmalis teaches this limitation. Bousmalis describes using adversarial training to make shared features domain-invariant:
“The first is trained to correctly predict task-specific class labels… the second is trained
to predict the domain of each input. DANN minimizes the domain classification loss… while maximizing it with respect to the parameters that are common to both classifiers.” (Bousmalis, p. 2, Related Work)
This Domain-Adversarial Neural Network (DANN) domain classifier + gradient reversal layer (GRL) is applied to the shared representation to produce a domain classification loss L_similarity that regularizes the shared encoder:
“For DANN regularization, we applied the GRL and the domain classifier” (Bousmalis, p. 4, Similarity Losses)
[Image: equation for the DANN similarity loss] (Bousmalis, p. 5, Similarity Losses)
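As a hedged sketch of the gradient-reversal idea quoted above (an illustrative scalar version; the function names and `lam` parameter are assumptions, not code from either reference): the GRL is the identity on the forward pass and flips the sign of the gradient on the backward pass, so minimizing the domain classifier's loss simultaneously maximizes it with respect to the shared encoder's parameters.

```python
def grl_forward(h):
    # Forward pass: identity; the shared representation passes through unchanged.
    return h

def grl_backward(grad, lam=1.0):
    # Backward pass: the gradient flowing back to the shared encoder is
    # negated (and scaled by lam), implementing the adversarial min/max.
    return -lam * grad
```

This is how a single loss term can train the domain classifier to predict the domain while training the shared encoder to defeat that prediction.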
and a classifier configured to classify the input data sample based on the domain-independent representation of the input data sample; - Bousmalis teaches this limitation. Bousmalis explains that the goal of the DSN is to:
“train a classifier on data from the source domain that generalizes to the target domain.” (Bousmalis, p. 3, Method)
And that this is achieved by using representations that are invariant to domain:
“Like previous efforts [7, 8], our model is trained such that the representations of images from the source domain are similar to those from the target domain. This allows a classifier trained on images from the source domain to generalize as the inputs to the classifier are in theory invariant to the domain of origin.” (Bousmalis, p. 3, Method)
Bousmalis further specifies that inference in a DSN uses the shared encoder E_c(x) to feed the classifier:
“Inference in a DSN model is given by x̂ = D(E_c(x) + E_p(x)) and ŷ = G(E_c(x)), where x̂ is the reconstruction of the input x and ŷ is the task-specific prediction… The classification loss L_task trains the model to predict the output labels we are ultimately interested in. Because we assume the target domain is unlabeled, the loss is applied only to the source domain.” (Bousmalis, p. 3-4, 3.1 Learning)
Figure 1 of Bousmalis shows that the classifier G(∙) is driven by the shared encoder E_c(x), while the private encoders E_p(x) feed only the decoder for reconstruction and are not used for classification:
[Image: Figure 1, DSN training architecture] (Bousmalis, p. 3, Figure 1: Training of our Domain Separation Networks)
A POSITA would therefore understand that Bousmalis’s classifier operates on the shared/domain-invariant representation produced by the shared encoder, rather than on the domain-specific/private representation. This corresponds to a classifier configured to classify the input data sample based on the domain-independent representation of the input data sample, as recited in the limitation.
wherein the generator is configured to generate the domain-independent representation of the input data sample such that it fools the discriminator, enables the classifier to classify the input data sample, and enables the decoder to reconstruct the input sample from the domain-independent representation and the domain-dependent representation . – Bousmalis teaches this limitation in part. Bousmalis already partitions information into shared and private subspaces and explicitly uses losses to encourage independence between them and avoid contamination:
“we explicitly learn to extract image representations that are partitioned into two subspaces: one component which is private to each domain and one which is shared across domains.” (Bousmalis, § Abstract)
“… we add a loss function that encourages independence of these parts… By partitioning the space in such a manner, the classifier trained on the shared representation is better able to generalize across domains as its inputs are uncontaminated with aspects of the representation that are unique to each domain.” (Bousmalis, p. 3, Method)
Bousmalis, however, does not itself constrain the capacity of the private representation.
Bousmalis does not teach:
“… and wherein the domain-dependent representation is constrained to have low information content.”
Matthey-de-l’Endroit, however, teaches this limitation:
“… and wherein the domain-dependent representation is constrained to have low information content.” - Matthey-de-l’Endroit teaches constraining a latent representation to be low-information and discloses training a VAE so that the latent bottleneck has limited information content through a loss that depends both on reconstruction quality and on a measure of independence / capacity of the latent factors, explaining that the VAE is trained by:
“… adjusting current values of the parameters of the VAE by optimizing a loss function that depends on a quality of the reconstruction and also on a degree of independence between the latent factors in the latent representation of the unlabeled training image.” (Matthey-de-l’Endroit, col. 1, lines 53-57)
And further specifies that the loss can be written as:
“In some implementations, the loss function is of the form L = Q − B(KL), where Q is a term that depends on the quality of the reconstruction, KL is a term that measures the degree of independence between the latent factors in the latent representation and the effective capacity of the latent bottleneck, and B is a tunable parameter.” (Matthey-de-l’Endroit, col. 1, lines 59-65)
The training engine is explicitly configured:
“… to reduce the effective capacity of the latent bottleneck and to increase statistical independence between the latent factors 121. When latent factors 121 are statistically independent from each other, the latent factors 121 contain less redundant information and more disentangled information that can be used to perform useful generalizations…” (Matthey-de-l’Endroit, col. 3, lines 46-52)
Thus, Matthey-de-l’Endroit teaches a standard, widely-used technique for constraining a latent code to have low information content (a low-capacity, KL-regularized bottleneck) while still supporting reconstruction and useful generalization.
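One common concrete form of such a capacity penalty (assumed here purely for illustration; the patent quotes only the abstract shape L = Q − B(KL)) is the closed-form KL divergence between a diagonal-Gaussian posterior and a unit-Gaussian prior:

```python
import math

def kl_to_unit_gaussian(mu, sigma):
    # KL( N(mu, diag(sigma^2)) || N(0, I) ): grows as the latent code carries
    # more information, so penalizing it bounds the effective bottleneck capacity.
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))

def vae_loss(Q, mu, sigma, B=1.0):
    # Mirrors the quoted shape L = Q - B*(KL); the tunable B trades
    # reconstruction quality against latent capacity.
    return Q - B * kl_to_unit_gaussian(mu, sigma)
```

At mu = 0, sigma = 1 the posterior equals the prior and the penalty is zero; any informative latent pays a positive KL cost.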
Starting with Bousmalis’s DSN architecture, where the private representation encodes domain-specific factors and the shared representation encodes common factors, a POSITA seeking to (i) further disentangle shared vs private information, (ii) avoid over-capacity in the private/domain-dependent channel, and (iii) force more common information into the shared representation (thereby improving domain invariant classification), would have found it obvious to apply Matthey-de-l’Endroit’s latent-bottleneck constraint specifically to the private/domain-dependent representation, i.e., to add a VAE-style KL/capacity penalty on DDRep.
Bousmalis’s stated objective is to learn domain-invariant representations by explicitly “introduc[ing] the notion of a private subspace for each domain” and a “shared subspace, enforced through the use of autoencoders and explicit loss functions”, to “separate the information that is unique to each domain” and produce better task representations.
Matthey-de-l’Endroit’s contribution is precisely to train a VAE with a controlled-capacity latent bottleneck whose loss depends on “quality of the reconstruction” and “degree of independence between the latent factors”, so the latent becomes a minimal, low-information code that still supports reconstruction and generalization.
Given that Bousmalis’s DSN already includes an autoencoder branch and a designated domain-dependent representation, a POSITA would have had a clear and logical reason to apply this known VAE bottleneck / KL regularization of Matthey-de-l’Endroit to the DDRep branch in order:
to minimize the information in the domain-dependent code (reducing overfitting and leakage of shared information into DDRep);
to push shared/common information into DIRep, improving the domain-invariant classifier; and
to strengthen the disentangling of common vs. domain-specific factors.
This is a straightforward, predictable combination of DSN’s shared/private domain-adaptation architecture with a standard VAE latent-capacity regularizer, yielding a domain-dependent representation constrained to have low information content.
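The proposed combination can be summarized as adding a capacity term to DSN's composite objective. The weight names and the exact placement of the KL term on the DDRep branch below are assumptions for illustration, not quotations from either reference:

```python
def dsn_loss(l_task, l_recon, l_diff, l_sim, alpha=1.0, beta=1.0, gamma=1.0):
    # Bousmalis: L = L_task + alpha*L_recon + beta*L_difference + gamma*L_similarity
    return l_task + alpha * l_recon + beta * l_diff + gamma * l_sim

def combined_loss(l_task, l_recon, l_diff, l_sim, kl_ddrep,
                  alpha=1.0, beta=1.0, gamma=1.0, B=1.0):
    # DSN objective plus a KL/capacity penalty on the private (DDRep) code,
    # in the style of Matthey-de-l'Endroit's bottleneck regularizer.
    return dsn_loss(l_task, l_recon, l_diff, l_sim, alpha, beta, gamma) + B * kl_ddrep
```

The extra B·KL term penalizes only the domain-dependent code, which is the asserted mechanism for shifting information toward the DIRep.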
Regarding claim 2, Bousmalis in view of Matthey-de-l’Endroit teach the apparatus of Claim 1, wherein
the domain-dependent representation is constrained to have… – Bousmalis teaches this limitation in part. As set forth for claim 1, Bousmalis teaches an apparatus with a shared encoder E_c(x) that produces a shared/domain-invariant representation and private encoders E_p(x) that produce domain-specific/private representations, with a decoder reconstruction from the combination and a difference loss that encourages the shared and private encoders to encode different aspects of the input. Bousmalis thus partitions information into a shared subspace “captur[ing] representations shared by the domains” and a private subspace for “domain specific properties”, but does not explicitly state that the private representation has lower information content than the shared representation.
Bousmalis does not teach:
“… low information content relative to the domain-independent representation”
Matthey-de-l’Endroit, however, teaches this limitation:
“… low information content relative to the domain-independent representation” - Matthey-de-l’Endroit teaches constraining a latent representation to be low-information and trains a VAE by:
“… optimizing a loss function that depends on a quality of the reconstruction and also on a degree of independence between the latent factors in the latent representation of the unlabeled training image.” (Matthey-de-l’Endroit, col. 1, lines 54-57)
And specifies a loss L = Q − B(KL), where Q depends on reconstruction quality and KL measures:
“the degree of independence between the latent factors… and an effective capacity of the latent bottleneck.” (Matthey-de-l’Endroit, col. 10, lines 1-3)
The training engine configured:
“to reduce the effective capacity of the latent bottleneck and to increase statistical independence between the latent factors 121… the latent factors contain less redundant information and more disentangled information…” (Matthey-de-l’Endroit, col. 3, lines 46-51)
In view of Matthey-de-l’Endroit, a POSITA starting from Bousmalis’s DSN (where the shared representation carries task-relevant, cross-domain information and the private representation is reserved for domain-specific effects) would have found it obvious to apply this known KL / capacity bottleneck specifically to the private/domain-dependent branch. Doing so would (i) limit the effective capacity of the private representation so it contains minimal, domain-specific information (low information content), while (ii) leaving the shared representation with greater capacity to carry the bulk of task and shared structure (since it must support classification, domain-invariance, and reconstruction). Under this combination, the domain-dependent representation is constrained to have lower information content relative to the domain-independent representation as a predictable result of allocating less bottleneck capacity to DDRep and more to DIRep, to improve disentangling and domain-invariant classification.
Regarding claim 3, Bousmalis in view of Matthey-de-l’Endroit teach the apparatus of claim 1, wherein
the generator receives generator input information related to a first domain and a second target domain and transforms the generator input information into the domain-independent representation of common elements of the first domain and the second target domain; - Bousmalis teaches this limitation. Bousmalis considers a source domain and a target domain:
“Given a labeled dataset in a source domain and an unlabeled dataset in a target domain, our goal is to train a classifier on data from the source domain that generalizes to the target domain.” (Bousmalis, p. 2-3, § Method)
Figure 1 and its caption show that the shared encoder E_c(x) (generator) and the private encoders E_p(x) (encoder) are applied to input samples from both domains:
“shared-weight encoder E_c(x) learns to capture representation components for a given input sample that are shared among domains. A private encoder E_p(x) (one for each domain) learns to capture domain–specific components of the representation.” (Bousmalis, p. 3, Figure 1: Training of our Domain Separation Networks)
The introduction explains that the shared subspace captures what is common across domains, while the private subspaces capture domain-specific properties:
“Our model… introduces the notion of a private subspace for each domain, which captures domain specific properties, such as background and low level image statistics. A shared subspace… captures representations shared by the domains. By finding a shared subspace that is orthogonal to the subspaces that are private, our model is able to separate the information that is unique to each domain…” (Bousmalis, p. 2, Introduction)
Thus, Bousmalis’s shared-weight encoder E_c(x) (the generator) receives inputs from both the labeled source domain and the unlabeled target domain and produces a shared representation “shared across domains”, which a POSITA would view as a domain-independent representation of common elements of the two domains.
wherein the encoder receives the generator input information related to the first domain and the second target domain and transforms the generator input information into the domain-dependent representation, wherein the domain-dependent representation is a representation of elements to be reproduced; - Bousmalis teaches this limitation. Bousmalis discloses that the decoder uses shared + private, plus the loss equation:
“A shared decoder learns to reconstruct the input sample by using both the private and source representations. The private and shared representation components are pushed apart with soft subspace orthogonality constraints L_difference…” (Bousmalis, p. 3, Figure 1: Training of our Domain Separation Networks)
Bousmalis inference equations:
“Inference in a DSN model is given by x̂ = D(E_c(x) + E_p(x)) and ŷ = G(E_c(x)), where x̂ is the reconstruction of the input x and ŷ is the task-specific prediction. The goal of training is to minimize the following loss with respect to parameters Θ = {θ_c, θ_p, θ_d, θ_g}: L = L_task + α L_recon + β L_difference + γ L_similarity … The classification loss L_task trains the model to predict the output labels we are ultimately interested in… We use a scale–invariant mean squared error term [6] for the reconstruction loss L_recon, which is applied to both domains…” (Bousmalis, p. 3-4, Learning)
Because L_recon is minimized with respect to the parameters Θ = {θ_c, θ_p, θ_d, θ_g}, a POSITA would understand that the decoder’s output, x̂ = D(E_c(x) + E_p(x)), provides gradients that train both the shared encoder (generator) and the private encoders, along with the decoder.
and wherein the decoder receives as inputs the domain independent representation and the domain dependent representation, the output of the decoder being used to train the encoder and the generator so the domain-independent representation is able to reproduce predictions that match an original first domain. – Bousmalis teaches this limitation. Bousmalis empirically reports that including the reconstruction loss improves classification accuracy compared to removing it or using a weaker reconstruction loss, demonstrating that this decoder-based training helps the shared representation support correct predictions on the source (first) domain. In Bousmalis’s DSN:
the reconstruction loss on the decoder’s output is back-propagated to train the encoders, shaping both DIRep and DDRep, as discussed above; and
together with the classification loss on source labels, a POSITA would understand that this training of both encoders causes the domain-independent representation E_c(x) to be capable of supporting predictions ŷ = G(E_c(x)) that match the original labels in the source (first) domain, while still generalizing to the target.
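The scale-invariant reconstruction term Bousmalis cites for L_recon can be sketched as follows. The closed form is the commonly stated Eigen-style si-MSE, reproduced here as an illustrative reconstruction of the cited term rather than verbatim from the reference:

```python
def si_mse(x, x_hat):
    # L_si-mse = (1/k)||x - x_hat||^2 - (1/k^2) ([x - x_hat] . 1_k)^2 :
    # ordinary per-element MSE minus a term that forgives a uniform offset,
    # so the loss rewards reproducing the overall shape of the input.
    k = len(x)
    diff = [a - b for a, b in zip(x, x_hat)]
    mse = sum(d * d for d in diff) / k
    offset = (sum(diff) ** 2) / (k * k)
    return mse - offset
```

A reconstruction that differs from the input only by a constant shift incurs zero loss, which is the scale/offset invariance the quoted passage refers to.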
For the same reasons set forth in the rejection of claim 1 over Bousmalis in view of Matthey-de-l’Endroit, and because Bousmalis teaches or renders obvious all additional limitations recited in claim 3, claim 3 is unpatentable under 35 U.S.C. § 103.
Regarding claim 4, Bousmalis in view of Matthey-de-l’Endroit teach the apparatus of claim 1, wherein
the encoder is configured to be penalized during training… – Bousmalis teaches this limitation in part. Bousmalis discloses shared and private encoders trained by minimizing a composite loss with respect to all encoder parameters:
“Inference in a DSN model is given by x̂ = D(E_c(x) + E_p(x)) and ŷ = G(E_c(x)), where x̂ is the reconstruction of the input x and ŷ is the task-specific prediction. The goal of training is to minimize the following loss with respect to parameters Θ = {θ_c, θ_p, θ_d, θ_g}: L = L_task + α L_recon + β L_difference + γ L_similarity” (Bousmalis, p. 3-4, Learning)
And explains that:
L_recon is a reconstruction loss applied to both domains;
L_difference is a “soft subspace orthogonality constraint between the private and shared representation of each domain”, encouraging them to encode different aspects of the input.
Thus, Bousmalis teaches that the private encoder (which produces the domain-dependent representation) is penalized during training via reconstruction and difference losses.
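A minimal sketch of the kind of soft subspace orthogonality penalty L_difference describes, assuming the squared Frobenius norm of H_c^T H_p over row-major activation matrices (the shapes and this exact form are assumptions for illustration):

```python
def difference_loss(H_c, H_p):
    # ||H_c^T H_p||_F^2 for H_c, H_p given as lists of row vectors
    # (rows = samples, columns = latent features); zero exactly when the
    # shared and private feature columns are mutually orthogonal.
    n_c, n_p, n_s = len(H_c[0]), len(H_p[0]), len(H_c)
    total = 0.0
    for i in range(n_c):
        for j in range(n_p):
            # (H_c^T H_p)[i][j] = sum over samples of H_c[s][i] * H_p[s][j]
            entry = sum(H_c[s][i] * H_p[s][j] for s in range(n_s))
            total += entry * entry
    return total
```

The penalty is zero for orthogonal shared/private activations and grows as the two subspaces overlap, which is the "pushed apart" behavior the quoted caption describes.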
… – Bousmalis teaches this limitation in part. Bousmalis teaches the separation of shared vs. private subspaces and the use of difference loss:
“Our model… introduces the notion of a private subspace for each domain, which captures domain specific properties, such as background and low level image statistics. A shared subspace… captures representations shared by the domains. By finding a shared subspace that is orthogonal to the subspaces that are private, our model is able to separate the information that is unique to each domain…” (Bousmalis, p. 2, Introduction)
In Bousmalis, the shared representation E_c(x):
is used by the classifier G(E_c(x)) for task predictions; and
participates in reconstruction D(E_c(x) + E_p(x)).
In Bousmalis’s DSN, the DIRep (shared) is the main carrier of common / task-relevant information, while DDRep (private) should only carry domain-specific residuals, and the difference loss already pushes information out of the private subspace when it overlaps with shared.
Bousmalis does not teach this limitation:
”… based on information content of the domain-dependent representation”
“such that an amount of information is increased in… amount of information is decreased…”
Matthey-de-l’Endroit, however, teaches this limitation:
”… based on information content of the domain-dependent representation” - Matthey-de-l’Endroit discloses training a VAE with a loss that explicitly measures and penalizes the information content / capacity of the latent bottleneck:
“… adjusting current values of the parameters of the VAE by optimizing a loss function that depends on a quality of the reconstruction and also on a degree of independence between the latent factors in the latent representation of the unlabeled training image.” (Matthey-de-l’Endroit, col. 1, lines 53-57)
In particular, the loss may be:
“L = Q − B(KL), where Q is a term that depends on the quality of the reconstruction, KL is a term that measures the degree of independence between the latent factors in the latent representation and the effective capacity of the latent bottleneck, and B is a tunable parameter.” (Matthey-de-l’Endroit, col. 1, lines 60-65)
The training engine is configured:
“… to reduce the effective capacity of the latent bottleneck and to increase statistical independence between the latent factors 121. When latent factors 121 are statistically independent… the latent factors 121 contain less redundant information and more disentangled information that can be used to perform useful generalizations…” (Matthey-de-l’Endroit, col. 3, lines 46-52)
Thus, Matthey-de-l’Endroit teaches penalizing the encoder during training with a loss term that depends directly on the information content / capacity of its latent representation.
“such that an amount of information is increased in… amount of information is decreased…” – As above, Matthey-de-l’Endroit’s KL/capacity term reduces the effective capacity of the latent bottleneck and ensures the latent factors contain “less redundant information”. When that capacity-control is applied specifically to the domain-dependent latent (private code) in DSN:
the amount of information that can be stored in DDRep is decreased (due to the bottleneck penalty), while
any information still needed for reconstruction and classification that can no longer fit in the private code must be represented in the shared encoder’s output E_c(x), which is not subject to the same tight capacity constraint but is already encouraged (by classification and similarity losses) to encode shared, domain-invariant structure.
A POSITA would recognize this information shift as a predictable result of:
restricting the capacity of the private/domain-dependent code, and
DSN’s design that uses the shared representation as the primary vehicle for task and cross-domain information.
For the reasons above, Bousmalis teaches the shared/private structure and loss framework, while Matthey-de-l’Endroit supplies the explicit information-content penalty on the latent; a POSITA would have combined them in a straightforward way that results in penalizing the encoder based on the information content of the domain-dependent representation and shifting information toward the domain-independent representation.
Regarding claim 5, Bousmalis in view of Matthey-de-l’Endroit teach the apparatus of claim 1, wherein
the content of the domain dependent representation… – Bousmalis teaches this limitation in part. Bousmalis’s DSN (discussing DANN-style training) teaches a classifier trained to predict the domain of each input, i.e., using a domain identifier (source vs. target):
“… the second is trained to predict the domain of each input.” (Bousmalis, p. 2, § Related Work)
So Bousmalis teaches the use of a domain identifier associated with each sample (its originating domain).
Bousmalis does not teach that the content of the domain dependent representation:
“… is constrained to be dependent only on… ”
Matthey-de-l’Endroit, however, teaches this limitation:
“… is constrained to be dependent only on… ” - Matthey-de-l’Endroit teaches a mechanism to constrain content and explicitly teaches reducing the information content/capacity of a latent bottleneck via a KL/capacity term:
“train the VAE 101 to reduce the effective capacity of the latent bottleneck … [so] the latent factors 121 contain less redundant information …” (Matthey-de-l’Endroit, col. 3, lines 45-51)
Starting from DSN’s private/DDRep branch, and in view of Matthey-de-l’Endroit’s teaching to make a latent minimal/low-capacity, a POSITA would have found it obvious to constrain the content of the domain-dependent representation so that it carries only what is strictly necessary for “domain dependence”. DSN already represents domain-of-origin as a domain identifier used for domain prediction.
Regarding claim 6, Bousmalis in view of Matthey-de-l’Endroit teach the apparatus of claim 1, wherein
the generator comprises a first generator for an input data sample of a source domain and a second generator for an input data sample a target domain. – Bousmalis teaches this limitation. Bousmalis explicitly operates with source and target domains:
“Given a labeled dataset in a source domain and an unlabeled dataset in a target domain, our goal is to train a classifier on data from the source domain that generalizes to the target domain.” (Bousmalis, p. 3, § Method)
Thus, Bousmalis teaches input data samples from a source domain and a target domain.
Bousmalis further teaches providing the per-domain duplication pattern:
“A private encoder E_p(x) (one for each domain) learns to capture domain–specific components of the representation.” (Bousmalis, p. 3, Figure 1: Training of our Domain Separation Networks)
The DSN architecture diagrams label these as a private source encoder and a private target encoder, showing distinct modules for the two domains.
Bousmalis does not expressly teach first and second generators: Bousmalis’s shared encoder E_c(x) is a single shared-weight encoder, not two separate generators for source and target.
Given that Bousmalis already teaches (i) explicit handling of source and target domains, and (ii) the design pattern of using one network per domain (separate parameter sets) for a representation-producing component (the private encoder), a POSITA would have found it obvious to apply the same per-domain duplication pattern to the “generator” (the representation-producing module for DIRep) when implementing claim 1’s architecture, i.e., provide a first generator for source-domain inputs and a second generator for target-domain inputs (e.g., same architecture but different weights) to accommodate differing low-level domain statistics while still producing a domain-independent representation under the same training objectives (classification, reconstruction, and domain-invariance losses). This is a routine architectural variation with predictable results.
Regarding claim 7, Bousmalis in view of Matthey-de-l’Endroit, teach the apparatus of claim 1, wherein a loss function for the classifier is:
L_c = L_c(l̂, l) = L_c(C(G(x)), l);
- Bousmalis teaches this limitation. Bousmalis’s DSN expressly defines the prediction as a task-specific function applied to the shared representation:
“ŷ = G(E_c(x)) … ŷ is the task-specific prediction.” (Bousmalis, p. 3, § Learning)
This corresponds to applicant’s structure C(G(x)): a classifier applied to a learned representation produced from the input.
Bousmalis further defines the classification loss as negative log-likelihood (cross-entropy) using the ground-truth label and the model’s softmax prediction:
“We want to minimize the negative log-likelihood of the ground truth class for each source domain sample: L_task = −Σ_{i=0}^{N_s} y_i^s · log ŷ_i^s, where y_i^s is the one-hot encoding of the class label for source input i and ŷ_i^s are the softmax predictions of the model: ŷ_i^s = G(E_c(x_i^s)).” (Bousmalis, p. 5, § Learning)
This is substantively the same as the claimed L_c(C(G(x)), l): a loss function taking (prediction, label) arguments and implemented via cross-entropy / negative log-likelihood.
Because Bousmalis teaches the additional limitation of claim 7 (classifier prediction from a learned representation and a cross-entropy/negative-log-likelihood classifier loss defined on predicted output vs. label), claim 7 is unpatentable under 35 U.S.C. § 103 over Bousmalis in view of Matthey-de-l’Endroit, for the reasons set forth for claim 1 and further in view of the teachings above.
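For illustration only (an examiner-provided sketch; function and variable names are hypothetical, not drawn from the cited references), a cross-entropy / negative-log-likelihood classifier loss of the claimed form L_c(C(G(x)), l) can be computed as:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classifier_loss(logits, one_hot_label):
    """Negative log-likelihood of the ground-truth class, i.e.
    L_c(C(G(x)), l) = -sum_k l_k * log(y_hat_k)."""
    y_hat = softmax(logits)
    return -float(np.sum(one_hot_label * np.log(y_hat)))

# A confident correct prediction incurs a smaller loss than a wrong one.
label = np.array([0.0, 1.0, 0.0])
loss_good = classifier_loss(np.array([0.0, 5.0, 0.0]), label)
loss_bad = classifier_loss(np.array([5.0, 0.0, 0.0]), label)
assert loss_good < loss_bad
```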
Regarding claim 8, Bousmalis in view of Matthey-de-l’Endroit, teach the apparatus of claim 1, wherein
a loss function for the discriminator is:
L_d = L_d(d̂, d) = L_d(D(G(x)), d). – Bousmalis teaches this limitation. Bousmalis teaches:
(i) discriminator output d̂ computed from the “generator” output G(x) (domain-independent representation). Bousmalis discloses a domain classifier (i.e., discriminator) that maps the shared representation h_c = E_c(x) to a predicted domain label d̂:
“Z(Q(h_c; θ_z)) → d̂ parameterized by θ_z maps a shared representation vector h_c = E_c(x; θ_c) to a prediction of the label d̂ ∈ {0, 1} of the input sample x.” (Bousmalis, p. 4, § 3.2 Similarity Losses)
A POSITA would read this as teaching the claimed structure d̂ = D(G(x)) because: the claimed G(x) corresponds to producing the learned representation (here, E_c(x)), and the claimed discriminator D(·) corresponds to the domain classifier Z(·) that outputs d̂.
(ii) loss function L_d(d̂, d) between predicted domain d̂ and ground-truth domain d. Bousmalis expressly defines the domain-adversarial (domain classification) loss as a binomial cross-entropy function of d̂_i and d_i:
“L_similarity^DANN = Σ_{i=0}^{N_s+N_t} [d_i log d̂_i + (1 − d_i) log(1 − d̂_i)].” (Bousmalis, p. 5, § Similarity Losses)
This is exactly a loss of the form L_d(d̂, d), applied to discriminator prediction d̂ versus true domain labels d, as required by claim 8.
Because Bousmalis teaches the discriminator/domain-classifier output d̂ computed from the shared representation E_c(x) and teaches an explicit discriminator loss function L_d(d̂, d) (binomial cross-entropy between predicted and true domain labels), claim 8 is obvious under 35 U.S.C. § 103 over the applied combination, with the claim 8 additional limitation taught by Bousmalis.
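For illustration only (an examiner-provided sketch; names are hypothetical), a binomial cross-entropy discriminator loss L_d(d̂, d) of the kind quoted above can be computed as:

```python
import numpy as np

def discriminator_loss(d_hat, d):
    """Binomial cross-entropy L_d(d_hat, d) between predicted and true
    domain labels (negated so that minimizing it improves the
    discriminator's domain predictions)."""
    d_hat = np.clip(d_hat, 1e-7, 1 - 1e-7)  # avoid log(0)
    return -float(np.mean(d * np.log(d_hat) + (1 - d) * np.log(1 - d_hat)))

d_true = np.array([0.0, 1.0, 1.0, 0.0])          # ground-truth domains
good = discriminator_loss(np.array([0.1, 0.9, 0.8, 0.2]), d_true)
bad = discriminator_loss(np.array([0.9, 0.1, 0.2, 0.8]), d_true)
assert good < bad  # accurate domain prediction -> smaller loss
```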
Regarding claim 9, Bousmalis in view of Matthey-de-l’Endroit, teach the apparatus of claim 1, wherein
the generator has a smaller loss when the discriminator makes a wrong prediction and a loss function for the generator is:
L_g = L_g(d̂, 1 − d) = L_g(D(G(x)), 1 − d). – Bousmalis teaches this limitation. Bousmalis’s DSN defines a domain classifier operating on the shared representation:
“h_c = E_c(x; θ_c)” and “Z(Q(h_c; θ_z)) → d̂” with “d̂ ∈ {0, 1}”
This corresponds to the discriminator taking the generator output and producing d̂: d̂ = D(G(x)).
In addition, Bousmalis explicitly characterizes GRL training as adversarial and aimed at reducing domain classification accuracy from the perspective of the shared encoder (“generator”):
“Learning with a GRL is adversarial … the reversal of the gradient results in … θ_c learning representations from which domain classification accuracy is reduced.” (Bousmalis, p. 4, § Similarity Losses)
DSN also explains the DANN minimax objective (domain classifier vs. shared feature extractor): the domain loss is minimized for the domain classifier while being effectively maximized (so that the discriminator is confused) for the shared feature extractor through the GRL.
This is exactly the substance of claim 9’s statement that the generator has a smaller loss when the discriminator is wrong: the generator-side objective is to make the discriminator unable to correctly predict the true domain label.
Because Bousmalis teaches (i) a discriminator/domain classifier producing d̂ from the shared representation and (ii) adversarial (GRL) training of the shared encoder so that domain prediction is wrong/indistinguishable, claim 9 is unpatentable under 35 U.S.C. § 103 over Bousmalis in view of Matthey-de-l’Endroit, as applied to claim 1.
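For illustration only (an examiner-provided sketch; names are hypothetical): Bousmalis implements the adversarial objective via gradient reversal, while claim 9 recites the equivalent flipped-label form L_g(D(G(x)), 1 − d), under which the generator's loss is smaller when the discriminator is wrong. That flipped-label form can be sketched as:

```python
import numpy as np

def bce(p, t):
    """Binomial cross-entropy between predictions p and targets t."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -float(np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

def generator_loss(d_hat, d):
    """L_g(D(G(x)), 1 - d): score the discriminator output against the
    *flipped* domain label, so fooling the discriminator lowers the loss."""
    return bce(d_hat, 1 - d)

d = np.array([1.0, 0.0])
fooled = generator_loss(np.array([0.1, 0.9]), d)      # discriminator wrong
not_fooled = generator_loss(np.array([0.9, 0.1]), d)  # discriminator right
assert fooled < not_fooled
```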
Regarding claim 10, Bousmalis in view of Matthey-de-l’Endroit, teach the apparatus of claim 1, wherein
a reconstruction loss of the decoder is:
L_r = L_r(x̂, x) = L_r(F(G(x), E(x)), x). – Bousmalis expressly discloses a decoder producing a reconstruction x̂ from a shared representation and a private representation, and training with a reconstruction loss comparing x and x̂:
“Inference in a DSN model is given by x̂ = D(E_c(x) + E_p(x)) … where x̂ is the reconstruction of the input x” (Bousmalis, p. 3, § 3.1 Learning)
“We use a scale-invariant mean squared error term [6] for the reconstruction loss L_recon … L_recon = Σ_{i=1}^{N_s} L_si_mse(x_i^s, x̂_i^s) …” (Bousmalis, p. 4, § 3.1 Learning)
[Equation image: Bousmalis’s definition of L_si_mse] “where … is the squared L_2-norm.” (Bousmalis, p. 4, § 3.1 Learning)
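For illustration only (an examiner-provided sketch of a scale-invariant MSE as the examiner reads the cited term, not verbatim from Bousmalis; names are hypothetical): the plain squared-L2 term is corrected by the squared mean of the element-wise difference, so a uniform brightness offset does not change the loss.

```python
import numpy as np

def si_mse(x, x_hat):
    """Scale-invariant MSE sketch: (1/k)||x - x_hat||_2^2 minus the
    squared mean of the difference, with k the number of elements."""
    k = x.size
    diff = x - x_hat
    return float(diff @ diff) / k - (float(diff.sum()) ** 2) / (k ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
# Adding a constant offset to every element leaves the loss unchanged,
# which is what makes this term invariant to overall shifts.
assert abs(si_mse(x, x + 0.5)) < 1e-12
```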
A POSITA would have been motivated to combine Bousmalis (DSN) with Matthey-de-l’Endroit because DSN already uses an autoencoder-style reconstruction objective to learn a shared (domain-invariant) representation and a private (domain-specific) representation for cross-domain generalization, while Matthey-de-l’Endroit teaches a well-known VAE technique for constraining latent information/capacity using a KL/capacity regularizer without sacrificing reconstruction quality. Applying Matthey-de-l’Endroit’s capacity/bottleneck regularization to DSN’s domain-dependent (private) latent is a straightforward, predictable way to (i) reduce over-capacity/overfitting in the private channel, (ii) improve disentanglement between shared vs. private factors, and (iii) thereby improve domain-invariant generalization, while keeping DSN’s existing reconstruction training intact.
Regarding claim 12, Bousmalis in view of Matthey-de-l’Endroit teach the apparatus of claim 1,
wherein the encoder is configured to use a L2-norm loss for a reconstruction loss and the discriminator is configured with a discriminator loss, - Bousmalis teaches this limitation. Bousmalis’s DSN discloses using mean-squared-error style reconstruction losses that explicitly use the squared L2-norm, and further discloses a variant using a direct L2 reconstruction loss:
Bousmalis’s DSN defines its scale-invariant reconstruction loss using the squared L2-norm term:
“We use a scale-invariant mean squared error term [6] for the reconstruction loss L_recon … ‖x − x̂‖₂² …” (Bousmalis, p. 4, § 3.1 Learning)
In addition, Bousmalis’s DSN expressly evaluates a reconstruction-loss variant (an L2-norm/MSE reconstruction loss):
“… with L_recon^L2 = (1/k)‖x − x̂‖₂².” (Bousmalis, p. 8, § 4.3 Discussion)
Accordingly, Bousmalis teaches the recited “L2-norm loss for a reconstruction loss”.
the generator is configured with a generator loss, and the classifier is configured with a classifier loss, wherein the discriminator loss, the generator loss, and the classifier loss are based on cross entropy. – Bousmalis teaches this limitation. Bousmalis’s DSN explicitly defines the task/classification loss as minimizing negative log-likelihood of the ground-truth class:
“We want to minimize the negative log-likelihood of the ground truth class for each source domain sample: L_task = −Σ_{i=0}^{N_s} y_i^s · log ŷ_i^s” (Bousmalis, p. 4, § 3.1 Learning)
And Bousmalis’s DSN makes clear ŷ is produced by a softmax classifier on the shared representation:
“… and ŷ_i^s are the softmax predictions of the model: ŷ_i^s = G(E_c(x_i^s)).” (Bousmalis, p. 4, § 3.1 Learning)
Thus, Bousmalis discloses standard cross-entropy/negative log-likelihood classification.
Bousmalis’s DSN further discloses that training with Gradient Reversal Layer (GRL) is adversarial, where:
“Learning with a GRL is adversarial in that θ_z is optimized to increase Z’s ability to discriminate between encodings of images from the source or target domains, while the reversal of the gradient results in the model parameters θ_c learning representations from which domain classification accuracy is reduced.” (Bousmalis, p. 4, § 3.2 Similarity Losses)
Bousmalis’s DSN explicitly states the same binomial cross-entropy term is optimized in opposite directions for the domain classifier vs. the shared encoder:
“Essentially, we maximize the binomial cross-entropy for the domain prediction task with respect to θ_z, while minimizing it with respect to θ_c” (Bousmalis, p. 4, § 3.2 Similarity Losses)
And Bousmalis’s DSN reiterates the optimization direction in its implementation discussion of DANN regularization:
“For DANN regularization, … We optimized L = L_class + γ L_similarity^DANN by minimizing it with respect to θ_c, θ_g and maximizing it with respect to the domain classifier parameters θ_z.” (Bousmalis, p. 8, § 4.2 Implementation Details)
Taken together, Bousmalis’s DSN provides explicit support that the generator-side training signal (i.e., the shared encoder/generator that produces the domain-invariant representation) is driven by the same cross-entropy domain loss, used adversarially.
For the same reasons set forth for claim 1, a POSITA would have been motivated to use Bousmalis (DSN) as the base architecture (shared/private encoders with classifiers, domain discriminator, and reconstruction training). Because Bousmalis teaches the additional limitations of claim 12 (L2-norm reconstruction loss; cross-entropy-based discriminator/classifier losses; and generator-side adversarial loss derived from the discriminator cross-entropy), and because claim 12 depends from claim 1, claim 12 is unpatentable under 35 U.S.C. § 103.
Regarding claim 13, Bousmalis in view of Matthey-de-l’Endroit, teach the apparatus of claim 1, wherein
a gradient-descent based learning dynamic for the generator is based on:
ΔG = −α_G (λ ∂L_g/∂G + β ∂L_c/∂G + γ ∂L_r/∂G);
- Bousmalis teaches this limitation. Bousmalis’s DSN expressly trains by minimizing a weighted sum of losses (task/classification + reconstruction + similarity, etc.) “with respect to parameters”:
“The goal of training is to minimize the following loss with respect to parameters Θ = {θ_c, θ_p, θ_d, θ_g}” (Bousmalis, p. 4, § 3.1 Learning)
“L = L_task + α L_recon + β L_difference + γ L_similarity, where α, β, γ are weights that control the interaction of the loss terms.” (Bousmalis, p. 4, § 3.1 Learning)
Bousmalis’s DSN also teaches that, for the domain-adversarial similarity term, the shared-representation parameters are trained adversarially (i.e., generator-side behavior is driven by the domain loss term):
“… we maximize the binomial cross-entropy for the domain prediction task with respect to θ_z, while minimizing it with respect to θ_c” (Bousmalis, p. 4, § 3.2 Similarity Losses)
These passages collectively support claim 13’s idea that the generator-side update is driven by a weighted combination of gradients from multiple losses (generator/adversarial term + classifier term + reconstruction term), with explicit weight parameters controlling interaction.
a gradient-descent based learning dynamic for the classifier is based on:
ΔC = −α_C ∂L_c/∂C;
- Bousmalis teaches this limitation. Bousmalis’s DSN discloses explicit task/classification loss used in the minimized objective:
“The classification loss L_task trains the model to predict the output labels … We want to minimize the negative log-likelihood of the ground truth class …” (Bousmalis, p. 4, § 3.1 Learning)
“… L_task = … y_i^s · log ŷ_i^s” (Bousmalis, p. 4, § 3.1 Learning)
Since DSN states the overall objective (including L_task) is minimized “with respect to parameters”, this supports classifier-side gradient-based training driven by the classifier loss.
a gradient-descent based learning dynamic for the discriminator is based on:
ΔD = −α_D ∂L_d/∂D;
- Bousmalis teaches this limitation. Bousmalis defines a domain-discriminator cross-entropy objective and states the discriminator parameters are optimized to improve discrimination:
“Z(Q(h_c; θ_z)) … maps … to a prediction … Learning with a GRL is adversarial in that θ_z is optimized to increase Z’s ability to discriminate …” (Bousmalis, p. 4, § 3.2 Similarity Losses)
“Essentially, we maximize the binomial cross-entropy … with respect to θ_z, while minimizing it with respect to θ_c … L_similarity^DANN = Σ_{i=0}^{N_s+N_t} [d_i log d̂_i + (1 − d_i) log(1 − d̂_i)].” (Bousmalis, p. 4, § 3.2 Similarity Losses)
Maximizing that (non-negated) cross-entropy term is the same training signal as minimizing a discriminator loss L_d(d̂, d), which is what claim 13 is expressing.
and a gradient-descent based learning dynamic for the decoder is based on:
ΔF = −α_F ∂L_r/∂F;
- Bousmalis teaches this limitation. Bousmalis’s DSN teaches explicit decoder and reconstruction loss, where reconstruction is minimized as part of training:
“Inference in a DSN model is given by x̂ = D(E_c(x) + E_p(x))” (Bousmalis, p. 3, § 3.1 Learning)
“We use a scale-invariant mean squared error term [6] for the reconstruction loss L_recon” (Bousmalis, p. 4, § 3.1 Learning)
“The goal of training is to minimize the following loss with respect to parameters Θ = {θ_c, θ_p, θ_d, θ_g}: L = … + α L_recon + …” (Bousmalis, p. 4, § 3.1 Learning)
This supports that the decoder-side parameters (θ_d) are trained by minimizing a reconstruction-loss term, matching claim 13’s decoder update based on ∂L_r/∂F.
wherein α_C, α_D, α_E, α_F, and α_G are learning rates, L_d is a discriminator loss, L_g is a generator loss, and L_c is a classifier loss. – Bousmalis teaches this limitation. Bousmalis’s DSN discloses an explicit learning rate and schedule in SGD training:
“All the models were implemented using TensorFlow [1] and were trained with Stochastic Gradient Descent plus momentum [28]. Our initial learning rate was multiplied by 0.9 every 20,000 steps (mini-batches).” (Bousmalis, p. 7, § 4.2 Implementation Details)
This supports that DSN uses learning rates in gradient-descent training, corresponding to claim 13’s learning-rate parameters for the component updates.
Bousmalis does not teach:
a gradient-descent based learning dynamic for the encoder is based on:
ΔE = −α_E (∂L_kl/∂E + μ ∂L_r/∂E);
Matthey-de-l’Endroit, however, teaches this limitation:
a gradient-descent based learning dynamic for the encoder is based on:
ΔE = −α_E (∂L_kl/∂E + μ ∂L_r/∂E);
- Matthey-de-l’Endroit expressly discloses adjusting VAE parameters by determining a gradient of a loss function, where the loss function includes both (i) a term depending on reconstruction quality and (ii) a term measuring independence (KL), such that the parameter update is driven by the combined gradient of those terms:
“adjusting current values of the parameters of the VAE by determining a gradient of a loss function with respect to the parameters of the VAE, wherein the loss function depends on a quality of the reconstruction … and also on a degree of independence between the latent factors …” (Matthey-de-l’Endroit, col. 9, lines 55-61)
Matthey-de-l’Endroit further specifies that the loss is explicitly composed of a reconstruction-quality term and a KL term with a weighting parameter:
“the loss function is of the form L=Q-B(KL), where Q is a term that depends on the quality of the reconstruction, KL is a term that measures the degree of independence …” (Matthey-de-l’Endroit, col. 1, lines 59-62)
And Matthey-de-l’Endroit explains that gradients of the loss components are combined to form the loss gradient used for updating parameters:
“determines a gradient of each component of the components of the loss function and aggregates each component gradient to generate the gradient of the loss function.” (Matthey-de-l’Endroit, col. 5, lines 53-56)
Accordingly, Matthey-de-l’Endroit teaches an encoder learning dynamic based on the combined gradient of a reconstruction-quality term (corresponding to L_r) and a KL/independence term (corresponding to L_kl), with an explicit weighting factor B for the KL component, i.e., the substance of claim 13’s ∂L_kl/∂E + μ ∂L_r/∂E.
Bousmalis’s DSN already trains a multi-branch encoder/decoder/classifier/discriminator architecture by minimizing a weighted sum of loss terms using Stochastic Gradient Descent (SGD) and adversarial Gradient Reversal Layer (GRL) training. Matthey-de-l’Endroit teaches adding a KL/capacity regularization term to a latent representation while still optimizing reconstruction quality, and teaches updating encoder/decoder parameters via gradient descent/backpropagation. A POSITA would have found it obvious to implement the combined DSN + Matthey-de-l’Endroit’s system using routine gradient-descent learning-rate updates of the respective network parameters based on the corresponding loss components.
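For illustration only (an examiner-provided sketch of the per-component updates recited in claim 13; all names are hypothetical, and each parameter block and gradient is treated as a scalar for readability):

```python
# Each module's parameters move along the negative gradient of its own
# (weighted) loss combination, with a per-module learning rate.

def update_generator(G, grads, alpha_G, lam, beta, gamma):
    # G <- G + dG, where dG = -alpha_G (lam dLg/dG + beta dLc/dG + gamma dLr/dG)
    return G - alpha_G * (lam * grads["Lg"] + beta * grads["Lc"] + gamma * grads["Lr"])

def update_classifier(C, dLc_dC, alpha_C):
    # C <- C + dC, where dC = -alpha_C dLc/dC
    return C - alpha_C * dLc_dC

def update_discriminator(D, dLd_dD, alpha_D):
    # D <- D + dD, where dD = -alpha_D dLd/dD
    return D - alpha_D * dLd_dD

def update_decoder(F, dLr_dF, alpha_F):
    # F <- F + dF, where dF = -alpha_F dLr/dF
    return F - alpha_F * dLr_dF

def update_encoder(E, dLkl_dE, dLr_dE, alpha_E, mu):
    # E <- E + dE, where dE = -alpha_E (dLkl/dE + mu dLr/dE)
    return E - alpha_E * (dLkl_dE + mu * dLr_dE)

# With a gradient of 2.0 and learning rate 0.1, a parameter at 1.0 moves to 0.8.
assert abs(update_classifier(1.0, 2.0, 0.1) - 0.8) < 1e-12
```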
Regarding claim 14, Bousmalis in view of Matthey-de-l’Endroit, teach the apparatus of claim 1, wherein
the apparatus is configured to classify data that evolves over time. – Bousmalis teaches this limitation. Bousmalis expressly frames the task as training a classifier on a source domain that generalizes to a different target domain, where the target domain is an unlabeled dataset with different characteristics than the source, i.e., classification under a distribution shift:
“Given a labeled dataset in a source domain and an unlabeled dataset in a target domain, our goal is to train a classifier on data from the source domain that generalizes to the target domain.” (Bousmalis, p. 2-3, § 3 Method)
Bousmalis’s DSN also states its goal as learning domain-invariant representations for cross-domain generalization:
“We propose a novel method, the Domain Separation Networks (DSN), for learning domain-invariant representations.” (Bousmalis, p. 2, § 1 Introduction)
A POSITA would recognize that “data that evolves over time” is a well-known scenario of dataset shift / non-stationarity (e.g., data distributions changing as time passes). DSN’s explicit teaching of training on one distribution (source) and generalizing to another distribution (target) directly addresses classification where the data distribution changes, including changes occurring over time.
Because Bousmalis’s DSN is expressly directed to maintaining classification performance when the input distribution differs between training and deployment (source vs. target), a POSITA would have found it obvious to apply DSN’s classifier to time-evolving data (a common practice and predictable cause of distribution change), and to describe the apparatus as “configured to classify data that evolves over time”.
Regarding claim 15, Bousmalis in view of Matthey-de-l’Endroit, teach the apparatus of claim 1, wherein
the domain-dependent representation is a label indicating an originating domain of the input data sample. – Bousmalis teaches this limitation. Bousmalis’s DSN expressly uses a domain label as the representation of which domain a sample originates from:
DSN defines the domain-adversarial loss using a ground-truth domain label d_i and explains it is the “domain prediction task”:
“L_similarity^DANN = Σ_{i=0}^{N_s+N_t} [d_i log d̂_i + (1 − d_i) log(1 − d̂_i)].” (Bousmalis, p. 4, § 3.2 Similarity Losses)
DSN also makes clear that the domain classifier produces a predicted domain label d̂ and is optimized to discriminate domain labels, i.e., the domain identity is represented as a label variable:
“… θ_z is optimized to increase Z’s ability to discriminate …” (Bousmalis, p. 4, § 3.2 Similarity Losses)
A POSITA would understand DSN’s d_i as exactly “a label indicating an originating domain of the input data sample” (e.g., source vs. target domain).
Even if claim 1’s “domain-dependent representation” is otherwise implemented as a learned private code in DSN, claim 15 merely specifies a particular (and simpler) form: use the domain label itself as the domain-dependent representation. DSN already uses that domain label d_i explicitly throughout training as the domain-identity signal; it would have been an obvious design choice to output or supply that same domain label as the “domain-dependent representation”, since it is the most direct representation of domain identity and is already present in DSN’s training signal.
Regarding claim 16, Bousmalis in view of Matthey-de-l’Endroit, teach a computer program product, comprising:
generating a domain-independent representation of an input data sample; - Bousmalis teaches this limitation. Bousmalis’s DSN discloses a shared (domain-invariant) representation extracted by a shared-weight encoder:
“shared-weight encoder E_c(x) learns to capture representation components for a given input sample that are shared among domains.” (Bousmalis, p. 3, § 3 Method)
generating a domain-dependent representation of the input data sample; - Bousmalis teaches this limitation. Bousmalis’s DSN discloses private/domain-specific representations extracted by private encoders:
“A private encoder E_p(x) (one for each domain) learns to capture domain-specific components of the representation.” (Bousmalis, p. 3, § 3 Method)
configuring a decoder to ensure that a combination of the domain-independent representation and the domain-dependent representation contains sufficient information to reconstruct the input data sample; - Bousmalis teaches this limitation. Bousmalis’s DSN discloses a shared decoder that reconstructs the input using both shared and private representations:
“A shared decoder learns to reconstruct the input sample by using both the private and source representations.” (Bousmalis, p. 3, § 3 Method)
DSN further expresses reconstruction as:
“… x̂ = D(E_c(x) + E_p(x)) …” (Bousmalis, p. 3, § 3.1 Learning)
configuring a discriminator to attempt to determine an originating domain of the domain-independent representation; - Bousmalis teaches this limitation. Bousmalis’s DSN includes a domain classifier/discriminator operating on the shared representation via the DANN-style similarity term, i.e., a domain-prediction objective on the shared features:
“Z(Q(h_c; θ_z)) → d̂ parameterized by θ_z maps a shared representation vector h_c = E_c(x; θ_c) to a prediction of the label d̂ ∈ {0, 1} of the input sample x.” (Bousmalis, p. 4, § 3.2 Similarity Losses)
DSN further ties this to originating domain (source vs. target) by explaining what the discriminator is trained to do:
“… θ_z is optimized to increase Z’s ability to discriminate between encodings of images from the source or target domains, …” (Bousmalis, p. 4, § 3.2 Similarity Losses)
And DSN provides the discriminator loss using the ground-truth domain label (i.e., the originating domain label):
“L_similarity^DANN = Σ_{i=0}^{N_s+N_t} [d_i log d̂_i + (1 − d_i) log(1 − d̂_i)] … where d_i ∈ {0, 1} is the ground truth domain label for sample i.” (Bousmalis, p. 5, § 3.2 Similarity Losses)
Bousmalis’s DSN teaches configuring a discriminator/domain classifier to determine an originating domain of the domain-independent (shared) representation, because DSN discloses that Z(Q(h_c; θ_z)) “maps a shared representation vector h_c = E_c(x; θ_c) to a prediction” of the domain label d̂ ∈ {0, 1}, and optimizes θ_z to discriminate source vs. target domains.
configuring a classifier to classify the input data sample based on the domain-independent representation of the input data sample; - Bousmalis teaches this limitation. Bousmalis’s DSN explicitly uses the shared representation for task classification:
“… ŷ = G(E_c(x)) …” (Bousmalis, p. 3, § 3.1 Learning)
And DSN states the goal:
“… our goal is to train a classifier on data from the source domain that generalizes to the target domain.” (Bousmalis, p. 2-3, § 3 Method)
and configuring a generator to generate the domain-independent representation of the input data sample such that it fools the discriminator, enables the classifier to classify the input data sample, and enables reconstruction of the input sample from the domain-independent representation and the domain-dependent representation – Bousmalis teaches this limitation. Bousmalis’s DSN trains the shared encoder output (the claimed “domain-independent representation”) under (i) a task/classifier loss, (ii) a reconstruction loss through the decoder from shared + private, and (iii) an adversarial domain/discriminator objective:
Enables the classifier to classify (task/classifier loss on shared representation): DSN defines task prediction from the shared representation and trains via a task loss:
“ŷ = G(E_c(x)) … ŷ is the task-specific prediction.” (Bousmalis, p. 3, § 3.1 Learning)
“We want to minimize the negative log-likelihood of the ground truth class …” (Bousmalis, p. 3, § 3.1 Learning)
And defines L_task with:
“ŷ_i^s = G(E_c(x_i^s)).” (Bousmalis, p. 4, § 3.1 Learning)
Enables reconstruction from DIRep + DDRep (reconstruction through decoder + reconstruction loss):
DSN reconstructs from the combination of shared and private representations:
“Inference in a DSN model is given by x̂ = D(E_c(x) + E_p(x)) … where x̂ is the reconstruction of the input x” (Bousmalis, p. 3, § 3.1 Learning)
And DSN defines the reconstruction loss L_recon:
“… for the reconstruction loss L_recon” (Bousmalis, p. 4, § 3.1 Learning)
Fools the discriminator (adversarial domain objective on shared features):
DSN discloses a domain classifier Z that maps the shared representation h_c = E_c(x) to a domain prediction d̂, and explains adversarial training such that the shared encoder is trained to reduce domain discrimination performance while Z is trained to improve it:
“Z(Q(h_c; θ_z)) → d̂ parameterized by θ_z maps a shared representation vector h_c = E_c(x; θ_c) to a prediction of the label d̂ ∈ {0, 1} of the input sample x.” (Bousmalis, p. 4, § 3.2 Similarity Losses)
Consistent with the above, Bousmalis’s DSN explicitly trains by minimizing a combined objective that includes both the task loss and reconstruction loss (and a similarity/domain term):
“The goal of training is to minimize … L = L_task + α L_recon + … + γ L_similarity …” (Bousmalis, p. 3-4, § 3.1 Learning)
Bousmalis does not teach these limitations:
one or more tangible computer-readable storage media and program instructions stored on at least one of the one or more tangible computer-readable storage media, the program instructions executable by a processor, the program instructions comprising:
and wherein the domain-dependent representation is constrained to have low information content.
Matthey-de-l’Endroit, however, teaches these limitations:
one or more tangible computer-readable storage media and program instructions stored on at least one of the one or more tangible computer-readable storage media, the program instructions executable by a processor, the program instructions comprising: - Matthey-de-l’Endroit teaches this limitation. Matthey-de-l’Endroit expressly describes implementation as computer program instructions on a tangible program carrier (i.e., non-transitory storage media):
“can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of, data processing apparatus.” (Matthey-de-l’Endroit, col. 6-7, lines 64-1)
and wherein the domain-dependent representation is constrained to have low information content. – Matthey-de-l’Endroit teaches this limitation. Matthey-de-l’Endroit teaches constraining a latent/code to reduce capacity / low-information content by adding a KL/independence regularizer in the loss, including:
loss depends on reconstruction quality and independence/KL term:
“… determining a gradient of a loss function … wherein the loss function depends on a quality of the reconstruction … and also on a degree of independence between the latent factors …” (Matthey-de-l’Endroit, col. 9, lines 57-63)
training “to reduce the effective capacity of the latent bottleneck …” (Matthey-de-l’Endroit, col. 3, lines 46-47)
A POSITA starting from Bousmalis’s DSN (which already separates shared/domain-invariant and private/domain-specific representations and trains with reconstruction + classification + domain-adversarial objectives) would have been motivated to incorporate Matthey-de-l’Endroit’s known latent bottleneck / KL-capacity regularization to explicitly constrain the domain-dependent/private code to be low-information, because that is a predictable way to (i) reduce over-capacity/overfitting in the private channel and (ii) push common/predictive information into the shared channel, improving domain-invariant classification; consistent with DSN’s stated goal of learning representations that generalize across domains.
Regarding claims 17-20
Claims 17-20 are the method-claim analogs of the previously analyzed apparatus dependent claims:
17 ↔ 1 (base architecture/functional system: DIRep + DDRep, decoder reconstruction, discriminator prediction on DIRep, classifier on DIRep, generator trained to fool discriminator + enable classification + enable reconstruction, with DDRep constrained low-information via Matthey-de-l’Endroit)
18 ↔ 2 (DDRep constrained to have low information content relative to DIRep)
19 ↔ 4 (encoder penalized based on information content of DDRep such that information is shifted into DIRep and reduced in DDRep)
20 ↔ 5 (DDRep content constrained to depend only on an identifier of the originating domain)
In each of claims 17-20, the claim language merely recasts the same subject matter previously addressed for the apparatus claim set, now in method form (e.g., “generating… configuring…”, rather than “an apparatus configured to…”).
Because claims 17-20 merely present, in method form, the same subject matter already found in the corresponding apparatus claims 1/2/4/5, and because Bousmalis in view of Matthey-de-l’Endroit teaches or renders obvious the recited method steps and added constraints for the same reasons previously set forth, claims 17-20 are unpatentable under 35 U.S.C. § 103 over Bousmalis in view of Matthey-de-l’Endroit.
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Bousmalis in view of Matthey-de-l’Endroit and in further view of Diederik P. Kingma et al. (Auto-Encoding Variational Bayes (arXiv:1312.6114v11)).
Regarding claim 11, Bousmalis in view of Matthey-de-l’Endroit and in further view of Kingma teach the apparatus of claim 1, wherein
a Kullback-Leibler Divergence loss for the encoder and the data-dependent representation is:
L_kl = D_KL(Pr(E(x)) ∥ N(0, I)). – Bousmalis does not teach this limitation. Kingma, however, teaches this limitation. Kingma explicitly teaches an objective including a KL-divergence term of the approximate posterior produced by an encoder/recognition model relative to a prior:
“Often, the KL-divergence D_KL(q_ϕ(z|x^(i)) ∥ p_θ(z)) …” (Kingma, p. 4, § 2.3 The SGVB estimator and AEVB algorithm)
And teaches the estimator including:
“… the generic estimator: … −D_KL(q_ϕ(z|x^(i)) ∥ p_θ(z)) + …” (Kingma, p. 4, § 2.3 The SGVB estimator and AEVB algorithm)
Thus, Kingma’s AEVB algorithm teaches using a KL-divergence term as a loss/regularizer component.
Additionally, Kingma’s AEVB algorithm expressly specifies the prior as a centered isotropic Gaussian:
“Let the prior over the latent variables be the centered isotropic multivariate Gaussian p_θ(z) = N(z; 0, I).” (Kingma, p. 5, § 3 Example: Variational Auto-Encoder)
Kingma’s AEVB teaches the encoder/recognition model outputting a distribution q_ϕ(z|x) (Gaussian with mean and diagonal variance produced by the encoder network):
“we use a neural network for the probabilistic encoder q_ϕ(z|x) … assume … log q_ϕ(z|x^(i)) = log N(z; μ^(i), σ^(2)(i) I)” (Kingma, p. 5, § 3 Example: Variational Auto-Encoder)
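For illustration only, the Gaussian encoder parameterization quoted above can be sketched numerically. This is a minimal NumPy fragment using Kingma’s reparameterization trick (§ 2.4); the function name `sample_latent` and the `log_var` parameterization are the editor’s illustrative choices, not language from either reference:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var):
    # Reparameterization trick (Kingma, § 2.4): z = mu + sigma * eps,
    # with eps ~ N(0, I), so sampling is differentiable in (mu, sigma).
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps
```

As the variance term vanishes, the sampled latent collapses to the encoder mean, matching the deterministic-encoder limit.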
Kingma’s AEVB provides the closed-form D_KL(q_ϕ(z|x^(i)) ∥ N(0, I)) term for the Gaussian case in its estimator:
“L(θ, ϕ; x^(i)) ≃ (1/2) Σ_{j=1}^{J} (1 + log((σ_j^(i))²) − (μ_j^(i))² − (σ_j^(i))²) + (1/L) Σ_{l=1}^{L} log p_θ(x^(i) | z^(i,l))” (Kingma, p. 5, § 3 Example: Variational Auto-Encoder, Eq. (10))
Matthey-de-l’Endroit’s VAE-style bottleneck/regularization motivation (already used in the claim 1 combination) makes it natural to implement the “low-information” constraint using the standard, well-established VAE KL regularizer taught by Kingma, namely, a KL divergence from the encoder-produced latent distribution to a unit Gaussian prior, because Kingma provides the canonical, routine formulation and training objective for that exact constraint. It would have been obvious to apply that standard VAE KL regularizer to the DSN domain-dependent (private) representation in the combined system.
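The combined objective contemplated above might be sketched as follows; this is purely an editor’s hypothetical (all names, the weighting factor `beta`, and the additive composition are illustrative assumptions, not disclosures of either reference), showing a DSN-style loss augmented with a VAE KL regularizer on the domain-dependent code:

```python
import numpy as np

def kl_penalty(mu, log_var):
    # D_KL(N(mu, diag(sigma^2)) || N(0, I)): the claim-11-style regularizer
    # constraining the DDRep latent distribution toward the unit-Gaussian prior.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def combined_loss(class_loss, recon_loss, adv_loss, mu_ddrep, log_var_ddrep, beta=1.0):
    # DSN-style terms (classification on DIRep, reconstruction, adversarial)
    # plus the KL bottleneck on the domain-dependent representation (DDRep).
    return class_loss + recon_loss + adv_loss + beta * kl_penalty(mu_ddrep, log_var_ddrep)
```

Increasing `beta` tightens the bottleneck on the private channel, pushing predictive information into the shared (DIRep) channel, consistent with the motivation stated above.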
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Paul Coleman whose telephone number is (571)272-4687. The examiner can normally be reached Mon-Fri.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Yi can be reached at (571) 270-7519. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PAUL COLEMAN/ Examiner, Art Unit 2126
/DAVID YI/ Supervisory Patent Examiner, Art Unit 2126