Prosecution Insights
Last updated: April 19, 2026
Application No. 17/405,859

PRIOR ADJUSTED VARIATIONAL AUTOENCODER

Non-Final OA (§§ 101, 103)
Filed: Aug 18, 2021
Examiner: KIM, SEHWAN
Art Unit: 2129
Tech Center: 2100 — Computer Architecture & Software
Assignee: SAP SE
OA Round: 5 (Non-Final)

Grant Probability: 60% (Moderate)
Projected OA Rounds: 5-6
Projected Time to Grant: 4y 1m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 60% of resolved cases (86 granted / 144 resolved; +4.7% vs TC avg)
Interview Lift: +65.6% (resolved cases with interview)
Avg Prosecution: 4y 1m (typical timeline); 35 applications currently pending
Total Applications: 179, across all art units

Statute-Specific Performance

§101: 20.8% (-19.2% vs TC avg)
§103: 46.2% (+6.2% vs TC avg)
§102: 6.3% (-33.7% vs TC avg)
§112: 23.3% (-16.7% vs TC avg)

Deltas are relative to Tech Center average estimates; based on career data from 144 resolved cases.

Office Action

§101 §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/26/2026 has been entered.

Examiner's Note

The Examiner encourages Applicant to schedule an interview to discuss issues related to, for example, the rejections noted below under 35 U.S.C. §§ 101 and 103, for moving toward allowance. Applicant is strongly requested to provide supporting paragraph(s) for each limitation of amended/new claim(s) in the Remarks, so that the Examiner can form a clear and definite claim interpretation.

Priority

Acknowledgment is made of Applicant's priority claim for the present application filed on 08/18/2021.

Response to Arguments

Applicant's arguments filed on 02/26/2026 have been fully considered but are not persuasive. In Remarks, pp. 8-12, Applicant contends:

The improvement comprises an increase in accuracy of the reconstructed data generated by the decoder of the VAE (i.e., lower loss in the reconstructed data compared to the input data). … If the normal Gaussian distribution N(0,1) is enforced, the VAE might neglect the reconstruction of this representation towards the original input, leading to "blurry reconstructions and to representations that are difficult to interpret, as they follow the normal Gaussian distribution N(0,1) more than the underlying data itself."
… "thus allowing the data to be represented in a more meaningful way and to have a lower loss in the representation of the data due to the individual elements being closer to a target distribution N(mu_g, sigma_g)." … Pursuant to MPEP §2106.04(a)(2)(III)(A), the human mind cannot practically, for instance, generate a latent space, sample a latent variable from the latent space, or generate reconstructed data based on the sampled variable, nor is the human mind equipped to do so. … the above limitations do not recite such specific calculations, but rather recite mathematical concepts at a high level such that the claims merely involve but do not recite mathematical concepts. … Importantly, these additional limitations reflect the improvement to the technical field of VAEs. As noted above, the specification explains that the claimed invention "allow[s] the data to be represented in a more meaningful way and to have a lower loss in the representation of the data due to the individual elements being closer to a target distribution N(mu_g, sigma_g)."

Examiner's response: The Examiner understands Applicant's assertion. However, it appears that each processing step merely applies the abstract idea to a general field of endeavor with additional elements. In addition, improvements to technology or a technical field are not necessarily reflected in the claims. Thus, the claim does not integrate the judicial exception into a practical application, and the claim does not amount to significantly more than the judicial exception.
The Examiner understands Applicant's assertions that enforcing N(0,1) leads to "blurry reconstructions and to representations that are difficult to interpret, as they follow the normal Gaussian distribution N(0,1) more than the underlying data itself"; that "[t]he improvement comprises an increase in accuracy of the reconstructed data generated by the decoder of the VAE (i.e., lower loss in the reconstructed data compared to the input data)"; and that the claimed invention "allow[s] the data to be represented in a more meaningful way and to have a lower loss in the representation of the data due to the individual elements being closer to a target distribution N(mu_g, sigma_g)". However, increasing accuracy and providing a more meaningful data representation do not necessarily improve the functioning of a computer or improve another technology or technical field. Rather, they appear only to improve the abstract ideas of the independent claims themselves. Even assuming, arguendo, that they are improvements, it remains unclear how the claims reflect the alleged improvements.

The Examiner also understands Applicant's assertions that "[p]ursuant to MPEP §2106.04(a)(2)(III)(A), the human mind cannot practically, for instance, generate a latent space, sample a latent variable from the latent space, or generate reconstructed data based on the sampled variable, nor is the human mind equipped to do so" and that "the above limitations do not recite such specific calculations, but rather recite mathematical concepts at a high level such that the claims merely involve but do not recite mathematical concepts." However, the "training" step only describes what the encoder does; it does not describe how the encoder is trained.
In other words, the encoder is trained to find representations for data elements of an input dataset such that the encoder learns to separate attributes of the data elements into group-related attributes and group-unrelated attributes based on respective Gaussian distributions, but it is not clear how the encoder is trained in detail toward the alleged improvements. Moreover, the "training" step does not train the whole variational autoencoder: the decoder is never trained, yet it is used in the last limitation.

In addition, as set forth in the rejections under 35 U.S.C. § 101, "generating … representations for the data elements", "calculating … a first loss function … a second loss function", "generating … a latent space based on a minimization of the first loss function and the second loss function", "sampling a latent variable", and "generating … reconstructed data", including the last limitation, have been considered mathematical calculations in light of the specification. Essentially, based on mathematical calculations, the encoder converts data elements into a latent space (e.g., based on Gaussian distributions with different attributes and a loss-function minimization), and the decoder generates reconstructed data from a latent variable in the latent space by combining the group-related attributes and the group-unrelated attributes. Note that paragraphs 34-60, along with figures 3-4 and equations (1)-(8), support the mathematical calculations for the operations of the encoder and the decoder.

Furthermore, as set forth under 35 U.S.C. § 101, the limitations do not clearly show, e.g., improvements in computer technology or improvements to other technical fields. It does not appear that the specification and/or the independent claims clearly show how the inventive concept of the claims enables improvements, or how the two are tied together.
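For orientation, the two loss functions at issue can be rendered as closed-form KL divergences: one pulling group-related latents toward a group-specific prior N(mu_g, sigma_g), the other pulling group-unrelated latents toward the standard normal N(0,1). The sketch below is purely illustrative; the function names, shapes, and the diagonal-Gaussian assumption are this editor's, not the claimed implementation or equations (1)-(8) of the specification, which are not reproduced in this Office action.

```python
import numpy as np

def kl_to_gaussian(mu, log_var, mu_t, var_t):
    """Closed-form KL( N(mu, exp(log_var)) || N(mu_t, var_t) ) for diagonal
    Gaussians, summed over latent dimensions."""
    var = np.exp(log_var)
    return 0.5 * np.sum(np.log(var_t / var) + (var + (mu - mu_t) ** 2) / var_t - 1.0)

def prior_adjusted_loss(recon_err, mu_c, lv_c, mu_s, lv_s, mu_g, var_g):
    """Toy total loss: reconstruction error plus the two KL terms.
    - group-related latents (mu_c, lv_c) are pulled toward N(mu_g, var_g)
    - group-unrelated latents (mu_s, lv_s) are pulled toward N(0, 1)."""
    kl_group = kl_to_gaussian(mu_c, lv_c, mu_g, var_g)   # "first loss function"
    kl_style = kl_to_gaussian(mu_s, lv_s, 0.0, 1.0)      # "second loss function"
    return recon_err + kl_group + kl_style
```

When the group prior is itself N(0,1) the first term reduces to the standard VAE KL penalty; the claimed adjustment is simply that group-related dimensions are regularized toward N(mu_g, sigma_g) instead.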
Applicant may need to state a specific improvement from the specification, and/or may need to amend the claims to show how the claim language and the improvements are tied together. To establish a valid improvement to a technology, MPEP 2106.04(d)(1) requires that the specification explain the improvement and that the claim reflect the disclosed improvement. Furthermore, the improvement should not be merely a consequence of the abstract idea; an improvement in the abstract idea itself is not an improvement to technology. See MPEP 2106.05(a). For at least these reasons, Applicant's arguments are not convincing. The Examiner encourages Applicant to schedule an interview to discuss issues related to 35 U.S.C. § 101, for moving prosecution forward.

Applicant's arguments regarding 35 U.S.C. § 103 with respect to the independent claims have been considered but are moot because the arguments are directed to amended limitation(s) that has/have not been previously examined.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 3-4, 7-9, 11-12, and 15-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1

The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: The claim recites a system; therefore, it falls into the statutory category of a machine.
Step 2A Prong 1: The limitations of "…, comprising: … comprising: …; …; …; generating, … and based on the training, representations for the data elements such that the data elements are separated into first data elements and second data elements, the first data elements comprising the group-related attributes characterized by a group-specific Gaussian distribution that is defined by a group mean and a group variance, and the second data elements comprising the group-unrelated attributes characterized by a normal Gaussian distribution that is defined by a zero mean and a variance of one; calculating, …, a first loss function for the group-related attributes based on a difference between the first data elements and the group-specific Gaussian distribution and a second loss function for the group-unrelated attributes based on a difference between the second data elements and the normal Gaussian distributions; generating, …, a latent space based on a minimization of the first loss function and the second loss function, the latent space including a first latent space in which the first data elements are encoded and a second latent space in which the second data elements are encoded; sampling a latent variable from the latent space generated …; and generating, …, reconstructed data based on the latent variable, the reconstructed data characterizing a reconstruction of the dataset, wherein, in the reconstructed data, the group-related attributes are combined with the group-unrelated attributes such that the group-related attributes are represented together as one or more groups according to the group-specific Gaussian distribution, while the group-unrelated attributes are represented as group independent according to the normal Gaussian distribution", as drafted, cover, under their broadest reasonable interpretation, performance of the limitations based on mathematical relationships, mathematical formulas or equations, and/or mathematical calculations. That is, nothing in the claim elements precludes the steps from practically being performed based on such mathematical concepts.

The limitation of "creating a training dataset comprising text, images, audio, video, or any combination thereof", as drafted, covers, under its broadest reasonable interpretation, performance of the limitation in the mind. That is, nothing in the claim element precludes the step from practically being performed in the mind; for example, the limitation in the context of this claim encompasses the user mentally thinking with a physical aid (e.g., pencil and paper).

If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships, formulas or equations, or calculations, but for the recitation of generic computer components, then it falls within the "Mathematical concepts" grouping of abstract ideas. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f).
In particular, the claim recites additional elements ("at least one processor; and at least one memory storing instructions which when executed by the at least one processor causes operations", "the encoder", "by the encoder", "by a decoder of the variational autoencoder") – using a device and a model to process data. The device and the model in each step are recited at a high level of generality (i.e., as generic computer components performing the generic computer function of processing data), such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

The claim also recites the additional element "training, using the training dataset, an encoder of a variational autoencoder comprising a neural network, wherein the encoder is trained to find representations for input data elements of an input dataset such that the encoder learns to separate attributes of the input data elements into group-related attributes and group-unrelated attributes based on respective Gaussian distributions". This additional element is recited at such a high level, without any details as to how the model is trained, that it amounts to only the idea of a solution or outcome; it fails to recite details of how a solution to a problem is accomplished and therefore represents no more than mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim further recites the additional element "receiving, as input to the encoder, a data batch of a dataset" – the act of receiving/obtaining data. This adds insignificant extra-solution activity to the judicial exception; see MPEP 2106.05(g). The act of receiving/obtaining data is recited at a high level of generality, such that it amounts to no more than a generic act of receiving data. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.

The claim also recites the additional element "the data batch including data elements comprising text, images, audio". This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible. See MPEP 2106.05(f).
The additional elements regarding training are recited at such a high level, without any details as to how the model is trained, that they amount to only the idea of a solution or outcome, and therefore represent no more than mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)). Accordingly, they do not amount to significantly more than the abstract idea.

As discussed above, the claim recites the additional element of receiving data at a high level of generality, adding insignificant extra-solution activity; see MPEP 2106.05(g). The addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) ("Receiving or transmitting data over a network"; "Storing and retrieving information in memory"). Accordingly, this additional element does not provide an inventive concept or significantly more than the abstract idea.

Finally, the recitation of a particular type or source of model/data to be used in performing the abstract idea is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h). Thus, the claim is not patent eligible.

Regarding claim 3

The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: The claim recites a system; therefore, it falls into the statutory category of a machine.

Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 1.

Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites the additional element "wherein the first latent space comprises the group-related attributes, the group-related attributes relating to content attributes of the dataset and sharing one or more group characteristics, and wherein the second latent space comprises the group-unrelated attributes, the group-unrelated attributes relating to style attributes of the dataset". This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit it to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. For the same reason, the field-of-use limitation does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).

Regarding claim 4

The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: The claim recites a system; therefore, it falls into the statutory category of a machine.

Step 2A Prong 1: The limitation of "further comprising encoding, …, the first data elements in the first latent space", as drafted, covers, under its broadest reasonable interpretation, performance of the limitation based on mathematical relationships, mathematical formulas or equations, and/or mathematical calculations.
That is, nothing in the claim element precludes the step from practically being performed based on such mathematical concepts. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships, formulas or equations, or calculations, but for the recitation of generic computer components, then it falls within the "Mathematical concepts" grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. The claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites the additional element "by the encoder" – using a model to process data. The model is recited at a high level of generality (i.e., as a generic computer component performing the generic computer function of processing data), such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component.
Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible. See MPEP 2106.05(f).

Regarding claim 7

The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1: The claim recites a system; therefore, it falls into the statutory category of a machine.

Step 2A Prong 1: The limitation of "wherein the group specific Gaussian distribution is determined based on group-level supervision", as drafted, covers, under its broadest reasonable interpretation, performance of the limitation based on mathematical relationships, mathematical formulas or equations, and/or mathematical calculations. That is, nothing in the claim element precludes the step from practically being performed based on such mathematical concepts. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical concepts but for the recitation of generic computer components, then it falls within the "Mathematical concepts" grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim does not recite additional elements. Thus, the claim is directed to an abstract idea.

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, the claim is not patent eligible.

Regarding claim 8

The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a system; therefore, it falls into the statutory category of a machine.

Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 1.

Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional element "wherein the neural network of the encoder is a first neural network, wherein the decoder comprises a second neural network". This is a recitation of a particular type or source of model/data to be used in performing the abstract idea, which is an attempt to limit the abstract idea to a particular field of use or technological environment and does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).

Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. For the same reason, the field-of-use limitation does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).

Regarding claim 9

The claim is rejected for the reasons set forth in the rejection of claim 1 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.

Regarding claim 11

The claim is rejected for the reasons set forth in the rejection of claim 3 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 12

The claim is rejected for the reasons set forth in the rejection of claim 4 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.

Regarding claim 15

The claim is rejected for the reasons set forth in the rejection of claim 7 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.

Regarding claim 16

The claim is rejected for the reasons set forth in the rejection of claim 8 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.

Regarding claim 17

The claim recites "A non-transitory computer-readable storage medium including program code, which when executed by at least one data processor, causes operations comprising:" to perform precisely the method of claim 1. As performance of an abstract idea on generic computer components (see MPEP 2106.05(f)) cannot integrate the abstract idea into a practical application or provide significantly more than the abstract idea itself, the claim is rejected for the reasons set forth in the rejection of claim 1.

Regarding claim 18

The claim is rejected for the reasons set forth in the rejection of claim 3 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.

Regarding claim 19

The claim is rejected for the reasons set forth in the rejection of claim 4 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

This application currently names joint inventors. In considering patentability of the claims, the Examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention, in order for the Examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-4, 7-9, 11-12, and 15-19 are rejected under 35 U.S.C.
103 as being unpatentable over Bouchacourt et al. ("Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations") in view of Jha et al. (US 2021/0224610 A1).

Regarding claim 1

Bouchacourt teaches "A system, comprising:" (Bouchacourt [sec(s) 1] "• We propose the ML-VAE model to learn disentangled representations from group level supervision; • we extend amortized inference to the case of non-iid observations; • we demonstrate experimentally that the ML-VAE model learns a semantically meaningful disentanglement of grouped data; • we demonstrate manipulation of the latent representation and generalises to unseen groups.").

Bouchacourt further teaches "creating a training dataset comprising text, images, audio, video, or any combination thereof" (Bouchacourt [sec(s) 4] "We evaluate the ML-VAE on MNIST Lecun et al. [1998]. We consider the data grouped by digit label, i.e. the content latent code C should encode the digit label. We randomly separate the 60,000 training examples into 50,000 training samples and 10,000 validation samples, and use the standard MNIST testing set. For both the encoder and decoder, we use a simple architecture of 2 linear layers (detailed in the supplementary material). … The dataset was constructed by retrieving approximately 100 images per celebrity from popular search engines, and noise has not been removed from the dataset. For each query, we consider the top ten results (note there was multiple queries per celebrity, therefore some identities have more than 10 images). This creates a dataset of 98,880 entities for a total of 811,792 images, and we group the data by identity.
Importantly, we randomly separate the dataset in disjoints sets of identities as the training, validation and testing datasets.”;) training, using the training dataset, an encoder of a variational autoencoder comprising a neural network, wherein the encoder is trained to find representations for input data elements of an input dataset such that the encoder learns to separate attributes of the input data elements into group-related attributes and group-unrelated attributes based on respective Gaussian distributions; (Bouchacourt [fig(s) 1] “In (b) and (c) upper part of the latent code is color, lower part is shape.” [algorithm(s) 1: ML-VAE training algorithm] [sec(s) 1] “In the VAE model, a network (the encoder) encodes an observation into its latent representation (or latent code) and a generative network (the decoder) decodes an observation from a latent code. The VAE model performs amortised inference, that is, the observations parametrise the posterior distribution of the latent code, and all observations share a single set of parameters to learn. … To use group observations the ML-VAE uses a grouping operation that separates the latent representation into two parts, style and content, and samples in the same group have the same content. This in turns makes the encoder learn a semantically meaningful disentanglement.” [sec(s) 3.3] “Our idea is to build the variational approximation of the single group content variable, q(CG|XG; φc), from the encoding of the grouped observations XG. While any distribution could be employed, we focus on using a product of Normal density functions. Other possibilities, such as a mixture of density functions, are discussed in the supplementary material.
We construct the probability density function of the latent variable CG taking the value c by multiplying |G| normal density functions, each of them evaluating the probability of CG = c given Xi = xi, i ∈ G [equation (6) image], where we assume q(CG|Xi = xi;φc) to be a Normal distribution N(µi, Σi). … We also assume a Normal distribution for q(Si |Xi; φs), i ∈ G.” [sec(s) 4] “MNIST dataset. We evaluate the ML-VAE on MNIST Lecun et al. [1998]. We consider the data grouped by digit label, i.e. the content latent code C should encode the digit label. We randomly separate the 60,000 training examples into 50,000 training samples and 10,000 validation samples, and use the standard MNIST testing set. … We qualitatively assess the relevance of the learned representation by performing operations on the latent space. First we perform swapping: we encode test images, draw a sample per image from its style and content latent representations, and swap the style between images.” [Supplementary Material, sec(s) 2.1] “As explained in step 9 of Algorithm 1 in the main paper, for each input xi we draw a sample cG,i ∼ q(CG|XG = xG; φc) for the content of the group G, and a sample si ∼ q(Si |Xi = xi ; φs) of the style latent representation. We concatenate (cG,i, si) into a 2 × d-dimensional vector that is fed to the decoder. … Similarly as the MNIST experiment we construct the latent representation and sample it as explained in Algorithm 1 in the main paper.”; e.g., “separates the latent representation into two parts, style and content, and samples in the same group have the same content” read(s) on “separate attributes of the input data elements into group-related attributes and group-unrelated attributes based on respective Gaussian distributions”.
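In the quoted eq. (6), multiplying |G| Normal densities again yields a Normal: the precisions add, and the group mean µG is the precision-weighted average of the per-observation means µi. A minimal numeric sketch of that grouping operation (diagonal covariances and all variable names are assumed for illustration; this is not code from either reference):

```python
import numpy as np

def group_content_posterior(mus, variances):
    """Combine per-observation factors q(C_G | X_i = x_i) = N(mu_i, var_i)
    into the group posterior N(mu_G, var_G) via a product of Gaussians:
    precisions add, and the mean is the precision-weighted average."""
    mus = np.asarray(mus, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    var_g = 1.0 / precisions.sum(axis=0)           # Sigma_G
    mu_g = var_g * (precisions * mus).sum(axis=0)  # mu_G
    return mu_g, var_g

# Two identical factors N(0, 1) combine to N(0, 0.5): evidence accumulates.
mu_g, var_g = group_content_posterior([[0.0], [0.0]], [[1.0], [1.0]])
```

This is the "benefits from accumulating evidence" behavior cited from Figure 6: more observations in a group shrink the variance of the shared content code.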
Examiner notes that paragraph 6 of the Instant Specification describes “The first part of the latent space may include content attributes of the dataset, the content attributes sharing one or more group characteristics, where the second part of the latent space includes style attributes of the dataset.”) receiving, as input to the encoder, a data batch of a dataset, the data batch including data elements comprising text, images, audio, video, or any combination thereof; (Bouchacourt [fig(s) 1] “In (b) and (c) upper part of the latent code is color, lower part is shape.” [algorithm(s) 1: ML-VAE training algorithm] [sec(s) 1] “In the VAE model, a network (the encoder) encodes an observation into its latent representation (or latent code) and a generative network (the decoder) decodes an observation from a latent code.” See also [sec(s) 3.3] [sec(s) 4] “We consider the data grouped by digit label, i.e. the content latent code C should encode the digit label. We randomly separate the 60,000 training examples into 50,000 training samples and 10,000 validation samples, and use the standard MNIST testing set. … We qualitatively assess the relevance of the learned representation by performing operations on the latent space.
First we perform swapping: we encode test images, draw a sample per image from its style and content latent representations, and swap the style between images.”;) generating, by the encoder and based on the training, representations for the data elements such that the data elements are separated into first data elements and second data elements, the first data elements comprising the group-related attributes characterized by a group-specific Gaussian distribution that is defined by a group mean and a group variance, and the second data elements comprising the group-unrelated attributes characterized by a normal Gaussian distribution that is defined by [a zero mean and a variance of one]; (Bouchacourt [fig(s) 1] “In (b) and (c) upper part of the latent code is color, lower part is shape.” [algorithm(s) 1: ML-VAE training algorithm] [sec(s) 1] (quoted above) [sec(s) 3.2] “We now assume that the observations are organised in a set G of distinct groups, with a factor of variation that is shared among all observations within a group” [sec(s) 3.3] (quoted above, including eq. (6)) [sec(s) 4] (quoted above) [Supplementary Material, sec(s) 2.1] (quoted above); e.g., “observations are organised in a set G of distinct groups” along with “separates the latent representation into two parts, style and content, and samples in the same group have the same content” read(s) on “separated into first data elements and second data elements”.
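Step 9 of Algorithm 1, per the Supplementary Material quoted above, draws one content sample for the group and one style sample per observation, then concatenates them into the 2 × d-dimensional decoder input. A hedged sketch using the standard reparameterisation trick (the dimensionality and the posterior parameters below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, var):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1): the reparameterisation trick."""
    return mu + np.sqrt(var) * rng.standard_normal(np.shape(mu))

d = 8                                            # latent dimensionality per part (assumed)
mu_g, var_g = np.zeros(d), np.full(d, 0.25)      # group content posterior N(mu_G, Sigma_G)
mu_s, var_s = np.zeros(d), np.ones(d)            # per-observation style posterior

c_gi = reparameterize(mu_g, var_g)               # one content sample, shared within the group
s_i = reparameterize(mu_s, var_s)                # one style sample per observation
decoder_input = np.concatenate([c_gi, s_i])      # the 2 x d-dimensional decoder input
```

Sampling through the reparameterisation keeps the draw differentiable, which is what lets the loss terms below be minimized by gradient descent.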
Examiner notes that paragraph 6 of the Instant Specification describes “The first part of the latent space may include content attributes of the dataset, the content attributes sharing one or more group characteristics, where the second part of the latent space includes style attributes of the dataset.”) calculating, by the encoder, a first loss function for the group-related attributes based on a difference between the first data elements and the group-specific Gaussian distribution and a second loss function for the group-unrelated attributes based on a difference between the second data elements and the normal Gaussian distributions; (Bouchacourt [fig(s) 1] [algorithm(s) 1: ML-VAE training algorithm] [sec(s) 1] (quoted above) [sec(s) 3.3] “We construct the probability density function of the latent variable CG taking the value c by multiplying |G| normal density functions, each of them evaluating the probability of CG = c given Xi = xi , i ∈ G [equation (6) image], where we assume q(CG|Xi = xi ; φc) to be a Normal distribution N(µi , Σi). Murphy [2007] shows that the product of two Gaussians is a Gaussian. Similarly, in the supplementary material we show that q(CG = c|XG = xG; φc) is the density function of a Normal distribution of mean µG and variance ΣG. … Since the resulting distribution is a Normal distribution, the term KL(q(CG|XG; φc)||p(CG)) can be evaluated in closed-form. We also assume a Normal distribution for q(Si |Xi; φs), i ∈ G.” [sec(s) 4] (quoted above) [Supplementary Material, sec(s) 2.1] (quoted above) [sec(s) 3.2] “We denote by XG the observations corresponding to the group G. We explicitly model each Xi in XG to have its independent latent representation for the style Si, and SG = (Si, i ∈ G). CG is a unique latent variable shared among the group for the content.
… [equation image]”; e.g., eq (3) read(s) on “a first loss function for the group-related attributes based on a difference between the first data elements and the group-specific Gaussian distribution and a second loss function for the group-unrelated attributes based on a difference between the second data elements and the normal Gaussian distributions” since KL (Kullback-Leibler) divergence represents a difference between its two elements.) generating, by the encoder, a latent space based on a minimization of the first loss function and the second loss function, the latent space including a first latent space in which the first data elements are encoded and a second latent space in which the second data elements are encoded; (Bouchacourt [fig(s) 1] [algorithm(s) 1: ML-VAE training algorithm] [sec(s) 1] (quoted above) [sec(s) 3.3] (quoted above, including eq. (6) and the closed-form KL term) [sec(s) 3.2] (quoted above) “… We define the average group ELBO over the dataset [equation image], and we maximise it [equation image].”; e.g., the optimization of eq (3) read(s) on “minimization of the first loss function and the second loss function”.) sampling a latent variable from the latent space generated by the encoder; and (Bouchacourt [fig(s) 1] “In (b) and (c) upper part of the latent code is color, lower part is shape.” [algorithm(s) 1: ML-VAE training algorithm] [sec(s) 1] “In the VAE model, a network (the encoder) encodes an observation into its latent representation (or latent code) and a generative network (the decoder) decodes an observation from a latent code.
The VAE model performs amortised inference, that is, the observations parametrise the posterior distribution of the latent code, and all observations share a single set of parameters to learn. … To use group observations the ML-VAE uses a grouping operation that separates the latent representation into two parts, style and content, and samples in the same group have the same content. This in turns makes the encoder learn a semantically meaningful disentanglement.” [sec(s) 3.2] “we separate the latent representation in two latent variables Z = (C, S) with style S and content C” [sec(s) 3.3] (quoted above) [sec(s) 4] (quoted above) [Supplementary Material, sec(s) 2.1] “As explained in step 9 of Algorithm 1 in the main paper, for each input xi we draw a sample cG,i ∼ q(CG|XG = xG; φc) for the content of the group G, and a sample si ∼ q(Si |Xi = xi ; φs) of the style latent representation. We concatenate (cG,i, si) into a 2 × d-dimensional vector that is fed to the decoder.
… Similarly as the MNIST experiment we construct the latent representation and sample it as explained in Algorithm 1 in the main paper.”;) generating, by a decoder of the variational autoencoder, reconstructed data based on the latent variable, the reconstructed data characterizing a reconstruction of the dataset. (Bouchacourt [fig(s) 4] “Swapping, first row and first column are test data samples (green boxes), second row and column are reconstructed samples (blue boxes) and the rest are swapped reconstructed samples (red boxes).” [fig(s) 6] [algorithm(s) 1: ML-VAE training algorithm] [sec(s) 1] “In the VAE model, a network (the encoder) encodes an observation into its latent representation (or latent code) and a generative network (the decoder) decodes an observation from a latent code.” [sec(s) 4] “… Figure 4 shows the swapping procedure, where the first row and the first column show the test data sample input to ML-VAE, second row and column are reconstructed samples. Each row is a fixed style and each column is a fixed content. We see that the ML-VAE disentangles the factors of variation of the data in a relevant manner. … In Figures 6a and 6b, we reconstruct images of the same group with and without taking into account the grouping information.
We see that the ML-VAE handles cases where there is no group information at test-time, and benefits from accumulating evidence if available.” [Supplementary Material, sec(s) 2.1] (quoted above);) wherein, in the reconstructed data, the group-related attributes are combined with the group-unrelated attributes such that the group-related attributes are represented together as one or more groups according to the group-specific Gaussian distribution, while the group-unrelated attributes are represented as group independent according to the normal Gaussian distribution. (Bouchacourt [fig(s) 4] [fig(s) 6] [algorithm(s) 1: ML-VAE training algorithm] [sec(s) 1] [sec(s) 4] [Supplementary Material, sec(s) 2.1] [sec(s) 3.3], each quoted above;)

However, Bouchacourt does not appear to explicitly teach: at least one processor; and at least one memory storing instructions which when executed by the at least one processor causes operations comprising: the second data elements comprising the group-unrelated attributes characterized by a normal Gaussian distribution that is defined by [a zero mean and a variance of one]; Jha teaches at least one processor; and (Jha [fig(s) 15] [par(s) 58] “FIG.
15 is a diagram illustrating hardware and software components of a computer system on which the system of the present disclosure could be implemented. The system includes a processing server 102 which could include a storage device 104, a network interface 118, a communications bus 110, a central processing unit (CPU) (microprocessor) 112, a random access memory (RAM) 114, and one or more input devices 116, such as a keyboard, mouse, etc. The server 102 could also include a display (e.g., liquid crystal display (LCD), cathode ray tube (CRT), etc.). The storage device 104 could comprise any suitable, computer-readable storage medium such as disk, non-volatile memory (e.g., read-only memory (ROM), erasable programmable ROM (EPROM), electrically-erasable programmable ROM (EEPROM), flash memory, field-programmable gate array (FPGA), etc.).”;) at least one memory storing instructions which when executed by the at least one processor causes operations comprising: (Jha [fig(s) 15] [par(s) 58], quoted above;) the second data elements comprising the group-unrelated attributes characterized by a normal Gaussian distribution that is defined by a zero mean and a variance of one; (Jha [fig(s) 10, 13] [par(s) 37-47] “The systems and methods of the present disclosure use a conditional variational auto-encoder based model, where the latent space is partitioned into two complementary subspaces. The first subspace is “s,” which controls specified factors of variation associated with the available supervision in the dataset. The second subspace is “z,” which models the remaining unspecified factors of variation. … The present systems and methods sample a point zi from the Gaussian prior p(z) = N(0,1) over the unspecified latent space. Specified latent variables s1 = fs(x1) and s2 = fs(x2) are also sampled. … The MNIST dataset includes of hand-written digits distributed among 10 classes. The specified factors in case of MNIST is the digit identity, while the unspecified factors control digit slant, stroke width etc.” [par(s) 54] “FIGS. 10(a)-(f) are image grids generated by combining specified factors of variation in one image and unspecified factors of variation in another image. In particular, the image grids are generated by swapping z and s variables. The top row and the first column are randomly selected from the test set. The remaining grid is generated by taking z from the digit in first column and s from the digit in first row. This keeps the unspecified factors constant in rows and the specified factors constant in columns.” [par(s) 56] “FIGS.
14a-14c shows the result of conditional image generation by sampling directly from the prior p(z), as well as image generation by conditioning on the s variable, taken from test images, and sampling the variable from N(0,1).”; e.g., “unspecified factors control digit slant, stroke width etc” along with “second subspace is “z,” which models the remaining unspecified factors of variation” read(s) on “group-unrelated attributes”.) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Bouchacourt with the normal Gaussian distribution for style attributes of Jha. One of ordinary skill in the art would have been motivated to combine in order to achieve compelling quantitative results on three different datasets and show good image generation capabilities as a generative model. In addition, a training strategy with a combination of adversarial and reverse cycle-consistency loss can improve the sharpness of the generated images while maintaining the disentangling capability of the encoder. (Jha [par(s) 57] “Through the experimental evaluations, it has been shown that the present systems and methods achieve compelling quantitative results on three different datasets and show good image generation capabilities as a generative model. It should also be noted that the cycle-consistent VAE could be trained as the first step, followed by training the decoder with a combination of adversarial and reverse cycle-consistency loss. This training strategy can improve the sharpness of the generated images while maintaining the disentangling capability of the encoder.”)

Regarding claim 3, the combination of Bouchacourt and Jha teaches claim 1.
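Both KL terms mapped to the claimed loss functions above, the group-content term KL(q(CG|XG; φc)||p(CG)) and the style term against the N(0,1) prior, can be evaluated in closed form for Gaussians, which is what the quoted "evaluated in closed-form" passage refers to. A sketch of the two closed forms under assumed diagonal Gaussians (illustrative only, not either reference's implementation):

```python
import numpy as np

def kl_to_standard_normal(mu, var):
    """KL( N(mu, var) || N(0, 1) ): the term for the group-unrelated (style) part,
    penalizing departure from the zero-mean, unit-variance prior."""
    return 0.5 * np.sum(var + mu**2 - 1.0 - np.log(var))

def kl_between_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ): usable against a group-specific
    Gaussian prior with its own mean and variance."""
    return 0.5 * np.sum(np.log(var_p / var_q) + (var_q + (mu_q - mu_p)**2) / var_p - 1.0)

# A posterior that already matches the N(0, 1) prior incurs zero penalty.
zero_penalty = kl_to_standard_normal(np.array([0.0]), np.array([1.0]))  # exactly 0.0
```

With the group prior set to N(0, 1), the general form reduces to the standard-normal form, so the two "loss functions" differ only in which Gaussian they are measured against.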
Bouchacourt further teaches wherein the first latent space comprises the group-related attributes, the group-related attributes relating to content attributes of the dataset and sharing one or more group characteristics, and wherein the second latent space comprises the group-unrelated attributes, the group-unrelated attributes relating to style attributes of the dataset. (Bouchacourt [fig(s) 1] “In (b) and (c) upper part of the latent code is color, lower part is shape.” [algorithm(s) 1] [sec(s) 1] (quoted above) [sec(s) 4] (quoted above); e.g., “samples in the same group have the same content” read(s) on “group-related attributes relating to content attributes of the dataset and sharing one or more group characteristics”.
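The swapping experiments quoted above (Bouchacourt Figure 4; Jha paragraph 54) both pair the content code of one image with the style code of another. A toy sketch of building such a swap grid (the codes and names here are invented for illustration; this is not either reference's encoder output):

```python
import numpy as np

def swap_grid(content_codes, style_codes):
    """Build a grid of decoder inputs: row i fixes the style code of image i,
    column j fixes the content code of image j (cf. the quoted Figure 4:
    each row is a fixed style and each column is a fixed content)."""
    n = len(content_codes)
    return [[np.concatenate([content_codes[j], style_codes[i]]) for j in range(n)]
            for i in range(n)]

content = [np.full(4, j, dtype=float) for j in range(3)]      # toy content codes
style = [np.full(4, 10 + i, dtype=float) for i in range(3)]   # toy style codes
grid = swap_grid(content, style)
# Diagonal cells pair an image's own content and style (plain reconstruction).
```

Decoding each cell of such a grid is what produces the image grids described in both references.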
Examiner notes that paragraph 6 of the Instant Specification describes “The first part of the latent space may include content attributes of the dataset, the content attributes sharing one or more group characteristics, where the second part of the latent space includes style attributes of the dataset.”) Regarding claim 4 The combination of Bouchacourt, Jha teaches claim 3. Bouchacourt further teaches further comprising encoding, by the encoder, the first data elements in the first latent space. (Bouchacourt [fig(s) 1] “In (b) and (c) upper part of the latent code is color, lower part is shape.” [algorithm(s) 1: ML-VAE training algorithm] “ PNG media_image1.png 372 1509 media_image1.png Greyscale ” [sec(s) 1] “In the VAE model, a network (the encoder) encodes an observation into its latent representation (or latent code) and a generative network (the decoder) decodes an observation from a latent code. The VAE model performs amortised inference, that is, the observations parametrise the posterior distribution of the latent code, and all observations share a single set of parameters to learn. … To use group observations the ML-VAE uses a grouping operation that separates the latent representation into two parts, style and content, and samples in the same group have the same content. This in turns makes the encoder learn a semantically meaningful disentanglement.” [sec(s) 3.3] “Our idea is to build the variational approximation of the single group content variable, q(CG|XG; φc), from the encoding of the grouped observations XG. While any distribution could be employed, we focus on using a product of Normal density functions. Other possibilities, such as a mixture of density functions, are discussed in the supplementary material. We construct the probability density function of the latent variable CG taking the value c by multiplying |G| normal density functions, each of them evaluating the probability of CG = c given Xi = xi , i ∈ G. 
… We also assume a Normal distribution for q(Si |Xi; φs), i ∈ G.” [sec(s) 4] “MNIST dataset. We evaluate the ML-VAE on MNIST Lecun et al. [1998]. We consider the data grouped by digit label, i.e. the content latent code C should encode the digit label. We randomly separate the 60,000 training examples into 50,000 training samples and 10,000 validation samples, and use the standard MNIST testing set. … We qualitatively assess the relevance of the learned representation by performing operations on the latent space. First we perform swapping: we encode test images, draw a sample per image from its style and content latent representations, and swap the style between images.” [Supplementary Material, sec(s) 2.1] “As explained in step 9 of Algorithm 1 in the main paper, for each input xi we draw a sample cG,i ∼ q(CG|XG = xG; φc) for the content of the group G, and a sample si ∼ q(Si |Xi = xi ; φs) of the style latent representation. We concatenate (cG,i, si) into a 2 × d-dimensional vector that is fed to the decoder. … Similarly as the MNIST experiment we construct the latent representation and sample it as explained in Algorithm 1 in the main paper.”;)

Regarding claim 7
The combination of Bouchacourt, Jha teaches claim 1. Bouchacourt further teaches wherein the group specific Gaussian distribution is determined based on group-level supervision. (Bouchacourt [sec(s) 1] “We propose group-level supervision: observations are organised in groups, where within a group the observations share a common but unknown value for one of the factors of variation. … Group observations are a form of weak supervision that is inexpensive to collect. In the above shape example, we do not need to know the factor of variation that defines the grouping.
… We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model that learns a disentangled representation of a set of grouped observations” [sec(s) 3.2] “the VAE model performs amortised variational inference, that is, the observations parametrise the posterior distribution of the latent code, and all observations share a single set of parameters φ.” [sec(s) 3.3] “Our idea is to build the variational approximation of the single group content variable, q(CG|XG; φc), from the encoding of the grouped observations XG. While any distribution could be employed, we focus on using a product of Normal density functions. Other possibilities, such as a mixture of density functions, are discussed in the supplementary material. We construct the probability density function of the latent variable CG taking the value c by multiplying |G| normal density functions, each of them evaluating the probability of CG = c given Xi = xi , i ∈ G.”;)

Regarding claim 8
The combination of Bouchacourt, Jha teaches claim 1. Bouchacourt further teaches wherein the neural network of the encoder is a first neural network, wherein the decoder comprises a second neural network. (Bouchacourt [sec(s) 4] “The encoder and the decoder network architectures, composed of either convolutional or deconvolutional and linear layers, are detailed in the supplementary material. We resize the images to 64 × 64 pixels to fit the network architecture. … We learn to classify the test images with a neural network classifier composed of two linear layers of 256 hidden units each, once using S and once using C as input features” [sec(s) Supplementary Material. 2] “MNIST Lecun et al. [1998]. We use an encoder network composed of a first linear layer e0 that takes as input a 1 × 784-dimensional MNIST image xi, xi is a realisation of Xi. … The decoder network is composed of a first linear layer d0 that takes as input the 2 × d-dimensional vector (cG,i, si).
Layer d0 has 500 hidden units and the hyperbolic tangent activation function. … MS-Celeb-1M Guo et al. [2016]. We use an encoder network composed of four convolutional layers e1, e2, e3, e4, all of stride 2 and kernel size 4. They are composed of respectively 64, 128, 256 and 512 filters. All four layers are followed by Batch Normalisation and Rectified Linear Unit (ReLU) activation functions. … The decoder network is composed of 3 deconvolutional layers d1, d2, d3, all of stride 2 and kernel size 4. They are composed of respectively 256, 128, 64 filters. All four layers are followed by Batch Normalisation and Rectified Linear Unit (ReLU) activation functions.”;)

Regarding claim 9
The claim is a method claim corresponding to the system claim 1, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the corresponding system claim.

Regarding claim 11
The claim is a method claim corresponding to the system claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the corresponding system claim.

Regarding claim 12
The claim is a method claim corresponding to the system claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the corresponding system claim.

Regarding claim 15
The claim is a method claim corresponding to the system claim 7, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the corresponding system claim.

Regarding claim 16
The claim is a method claim corresponding to the system claim 8, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the corresponding system claim.

Regarding claim 17
The claim is a computer-readable storage medium claim corresponding to the system claim 1, and is directed to largely the same subject matter.
Thus, it is rejected for the same reasons as given in the rejection of the corresponding system claim.

Regarding claim 18
The claim is a computer-readable storage medium claim corresponding to the system claim 3, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the corresponding system claim.

Regarding claim 19
The claim is a computer-readable storage medium claim corresponding to the system claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejection of the corresponding system claim.

Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Hilprecht et al. (US 2020/0193298 A1) teaches a VAE for images. Akrami et al. (Robust Variational Autoencoder) teaches a latent space for the MNIST dataset. Peis et al. (Unsupervised Learning of Global Factors in Deep Generative Models) teaches a GMM with a latent space for MNIST. Kingma et al. (Semi-supervised Learning with Deep Generative Models) teaches disentanglement of style from class. Dilokthanakul et al. (Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders) teaches style and class with MNIST. Casale et al. (Gaussian Process Prior Variational Autoencoders) teaches a learnable prior for a VAE. Hamghalam et al. (Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation) teaches estimating patients with medical conditions using a VAE by extending Casale et al. Nowozin et al. (US 2018/0338147 A1) teaches sampled parameter values for content and style factors.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM, whose telephone number is (571) 270-7409. The examiner can normally be reached Mon - Thu, 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michael J Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SEHWAN KIM/
Examiner, Art Unit 2129
3/9/2026
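A note on the grouping operation central to claims 3, 4, and 7 above: the cited Bouchacourt reference builds the group content posterior q(CG|XG; φc) as a product of Normal densities, one per group member. The normalized product of Gaussian densities is itself Gaussian, with precisions (inverse variances) that add and a precision-weighted mean. A minimal sketch in plain Python (the example means and variances are illustrative, not values from the reference):

```python
def product_of_gaussians(means, variances):
    """Combine per-observation Normal densities N(mu_i, var_i) into the
    single Gaussian proportional to their product: precisions add, and
    the combined mean is the precision-weighted average of the means."""
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    combined_var = 1.0 / total_precision
    combined_mean = combined_var * sum(p * m for p, m in zip(precisions, means))
    return combined_mean, combined_var

# Three encoders in a group each emit a Normal over the shared content code.
mean, var = product_of_gaussians([0.9, 1.1, 1.0], [0.5, 0.5, 0.5])
# With equal variances: mean = 1.0 (the simple average), var = 0.5 / 3.
```

With equal per-observation variances the combined mean reduces to the simple average and the variance shrinks by a factor of |G|, which is why larger groups yield a sharper estimate of the shared content variable.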

Prosecution Timeline

Aug 18, 2021
Application Filed
Feb 08, 2025
Non-Final Rejection — §101, §103
Feb 21, 2025
Response Filed
May 14, 2025
Final Rejection — §101, §103
Jul 07, 2025
Applicant Interview (Telephonic)
Jul 08, 2025
Examiner Interview Summary
Jul 21, 2025
Request for Continued Examination
Jul 23, 2025
Response after Non-Final Action
Jul 27, 2025
Non-Final Rejection — §101, §103
Oct 24, 2025
Examiner Interview Summary
Oct 24, 2025
Applicant Interview (Telephonic)
Oct 27, 2025
Response Filed
Nov 24, 2025
Final Rejection — §101, §103
Feb 06, 2026
Applicant Interview (Telephonic)
Feb 06, 2026
Examiner Interview Summary
Feb 26, 2026
Request for Continued Examination
Mar 09, 2026
Response after Non-Final Action
Mar 10, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602595
SYSTEM AND METHOD OF USING A KNOWLEDGE REPRESENTATION FOR FEATURES IN A MACHINE LEARNING CLASSIFIER
2y 5m to grant Granted Apr 14, 2026
Patent 12602580
Dataset Dependent Low Rank Decomposition Of Neural Networks
2y 5m to grant Granted Apr 14, 2026
Patent 12602581
Systems and Methods for Out-of-Distribution Detection
2y 5m to grant Granted Apr 14, 2026
Patent 12602606
APPARATUSES, COMPUTER-IMPLEMENTED METHODS, AND COMPUTER PROGRAM PRODUCTS FOR IMPROVED GLOBAL QUBIT POSITIONING IN A QUANTUM COMPUTING ENVIRONMENT
2y 5m to grant Granted Apr 14, 2026
Patent 12541722
MACHINE LEARNING TECHNIQUES FOR VALIDATING AND MUTATING OUTPUTS FROM PREDICTIVE SYSTEMS
2y 5m to grant Granted Feb 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
60%
Grant Probability
99%
With Interview (+65.6%)
4y 1m
Median Time to Grant
High
PTA Risk
Based on 144 resolved cases by this examiner. Grant probability derived from career allow rate.
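The headline projections above compose by simple arithmetic. Assuming the +65.6% interview lift is applied multiplicatively to the 60% career allow rate (an assumption about this tool's methodology; the page does not state its formula), the 99% with-interview figure can be reproduced as:

```python
def with_interview(base_rate, relative_lift):
    """Apply a relative interview lift to a base grant probability,
    capping the result at 100%."""
    return min(base_rate * (1.0 + relative_lift), 1.0)

# 60% career allow rate with a +65.6% relative interview lift.
projected = with_interview(0.60, 0.656)  # 0.9936, displayed as 99%
```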
