Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments filed 01/12/2026 on pages 8-10 of Remarks regarding the rejection under 35 U.S.C. 103 with respect to claims 1-23 have been fully considered but are not persuasive.
Beginning on page 10 of the Remarks, Applicant asserts that Gorijala would have to disclose that the discriminator is trained to distinguish between two different sets of values that are generated by the GAN. Examiner respectfully disagrees. Zhang teaches the values that are generated by an encoder network and the prior network. See [0069], “The encoder 208q is arranged to receive the observed feature vector X.sub.o as an input and encode it into a latent vector Z (a representation in a latent space)”, and [0109], where VAE p.sub.ψ(z) is the prior network trained to model the distribution of z. See also Figure 5C of Zhang: x is the image and z is the representation of that image in latent space, so x to z is the encoding and z to x is the decoding. The values distinguished by Gorijala’s discriminator are influenced by the latent vector z created by the encoder and by the prior c shown in Figure 1.
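The encode/decode relationship discussed above (x to z is the encoding, z to x is the decoding) can be sketched numerically. The following is a purely illustrative toy example; the linear `encode`/`decode` functions are hypothetical stand-ins, not the actual networks of Zhang or Gorijala:

```python
# Toy illustration of the encode/decode mapping: an "encoder" maps an
# observation x to a lower-dimensional latent code z, and a "decoder"
# maps z back toward x. These linear maps are hypothetical stand-ins.

def encode(x, scale=0.5):
    """Map an observation vector x to a lower-dimensional latent z by
    compressing each pair of adjacent features into one latent value."""
    return [scale * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]

def decode(z, scale=0.5):
    """Map a latent z back to the observation space by expanding each
    latent value into two equal features."""
    out = []
    for v in z:
        out.extend([v / (2 * scale), v / (2 * scale)])
    return out

x = [1.0, 1.0, 4.0, 4.0]
z = encode(x)        # x -> z (the encoding)
x_hat = decode(z)    # z -> x (the decoding)
print(z, x_hat)
```

Because the toy input has equal values within each pair, the decode exactly reconstructs the input, mirroring the compressed-representation idea in Zhang's [0070].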
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1 and 3 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (US20210358577A1; hereinafter Zhang) in view of Gorijala et al. (US20200349447A1; hereinafter Gorijala).
Claim 1 is rejected over Zhang and Gorijala.
Regarding claim 1, Zhang teaches a computer-implemented method for creating a generative model, the method comprising: (“the VAE has been trained, to generate a new, unobserved instance of the feature vector {circumflex over (X)} by inputting a random or unobserved value of the latent vector Z into the decoder 208 p.”; [0077])
performing one or more operations based on a plurality of training images to generate a trained encoder network and (“The encoder 208q is arranged to receive the observed feature vector X.sub.o as an input and encode it into a latent vector Z (a representation in a latent space).”; [0069])
a trained prior network, wherein the trained encoder network converts each image included in the plurality of training images into a set of visual attributes, and the trained prior network learns a distribution of the set across the plurality of training images; (“the latent vector Z is a compressed (i.e. encoded) representation of the information contained in the input observations X.sub.o. No one element of the latent vector Z necessarily represents directly any real world quantity, but the vector Z as a whole represents the information in the input data in compressed form. It could be considered conceptually to represent abstract features abstracted from the input data X.sub.o, such as “wrinklyness” and “trunk-like-ness” in the example of elephant recognition (though no one element of the latent vector Z can necessarily be mapped onto any one such factor, and rather the latent vector Z as a whole encodes such abstract information).”; [0070]; Note: latent vector Z is a visual attribute; and “To handle complicated statistical dependencies, we utilize the VampPrior, which uses a mixture of Gaussians (MoGs) as the prior distribution for the high-level latent variable i.e.,”; [0109]; Note: VAE p.sub.ψ(z) is the prior network trained to model the distribution of z and see Figure 5C of Zhang to see that x is the image and z is the representation of that image in latent space. X to z is the encoding and z to x is the decoding.)
and the trained prior network learns a distribution of the set of visual attributes across the plurality of training images; (“where h is the latent space of the dependency network. The above procedure effectively disentangles the inter-variable, heterogeneous properties of mixed type data (modelled by marginal VAEs), from inter-variable dependencies (modelled by prior networks). We call our model VAE for heterogeneous mixed type data (VAEM)“; [0108] and “To handle complicated statistical dependencies, we utilize the VampPrior, which uses a mixture of Gaussians (MoGs) as the prior distribution for the high-level latent variable i.e.,”; [0109]; Note: VAE p.sub.ψ(z) is the prior network trained to model the distribution of z and see Figure 5C of Zhang to see that x is the image and z is the representation of that image in latent space. X to z is the encoding and z to x is the decoding.)
combining the trained prior network to produce a trained prior component that is included in the generative model, (“where h is the latent space of the dependency network. The above procedure effectively disentangles the inter-variable, heterogeneous properties of mixed type data (modelled by marginal VAEs), from inter-variable dependencies (modelled by prior networks). We call our model VAE for heterogeneous mixed type data (VAEM)”; [0108] and “To handle complicated statistical dependencies, we utilize the VampPrior, which uses a mixture of Gaussians (MoGs) as the prior distribution for the high-level latent variable i.e.,”; [0109]; Note: VAE p.sub.ψ(z) is the prior network trained to model the distribution of z)
wherein, in operation, the trained prior component produces one or more values for the set of visual attributes in order to generate a new image that is not included in the plurality of training images. (“and the VAE has been trained to encode and decode human faces, then by inputting a random value of Z into the decoder 208 p it is possible to generate a new face that did not belong to any of the sampled subjects during training”, [0077])
Zhang does not teach and the one or more classifiers.
However, Gorijala teaches and the one or more classifiers (See Figure 1 of Gorijala to see that the discriminator is a classifier.)
It would have been obvious before the effective filing date to combine the trained encoder network and trained prior network of Zhang with the discriminator of Gorijala to achieve accurate attribute editing of facial images (Gorijala, page 1). Zhang and Gorijala are analogous art because they both concern facial image generation from the latent space.
Zhang does not teach performing one or more operations to train one or more classifiers to distinguish between values for the set of visual attributes generated by the trained encoder network and values for the set selected from the distribution learned by the trained prior network; and
However, Gorijala teaches performing one or more operations to train one or more classifiers to distinguish between values for the set of visual attributes generated by the trained encoder network and values for the set selected from the distribution learned by the trained prior network; and (“GAN (Goodfellow et al., 2014) consist of two networks, namely, generator (G) and discriminator (D). Generator network takes noise z drawn from a prior distribution p(z) as input and give an image as output. Discriminator network gives the probability that the input image is real. The GAN objective is to train discriminator to distinguish between real and fake data samples, while simultaneously training the generator to fool discriminator.”; page 2, 2.2, Generative Adversarial Network (GAN); Note: See Figure 2)
It would have been obvious before the effective filing date to combine the trained encoder network and trained prior network of Zhang with the discriminator of Gorijala to achieve accurate attribute editing of facial images (Gorijala, page 1). Zhang and Gorijala are analogous art because they both concern facial image generation from the latent space.
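The GAN objective quoted above — train the discriminator to distinguish real from fake data samples — can be sketched with a toy one-parameter logistic discriminator. The data distributions, learning rate, and training loop below are illustrative assumptions, not Gorijala's implementation:

```python
import random
from math import exp

random.seed(0)

def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))

# "Real" samples cluster near +2; "fake" samples (e.g. drawn via the
# prior) cluster near -2. Both choices are purely illustrative.
real = [2.0 + random.gauss(0.0, 0.5) for _ in range(200)]
fake = [-2.0 + random.gauss(0.0, 0.5) for _ in range(200)]

# One-parameter logistic discriminator D(x) = sigmoid(w * x), trained
# by gradient ascent on the standard GAN discriminator log-likelihood.
w = 0.0
lr = 0.1
for _ in range(100):
    grad = 0.0
    for x in real:   # push D(x) toward 1 on real samples
        grad += (1.0 - sigmoid(w * x)) * x
    for x in fake:   # push D(x) toward 0 on fake samples
        grad -= sigmoid(w * x) * x
    w += lr * grad / (len(real) + len(fake))

print("D(+2) =", sigmoid(w * 2.0), "D(-2) =", sigmoid(w * -2.0))
```

After training, the discriminator assigns high probability to points near the "real" cluster and low probability near the "fake" cluster, which is the distinguishing behavior the rejection relies on.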
Claim 3 is rejected over Zhang and Gorijala with the incorporation of claim 1.
Regarding claim 3, Zhang teaches wherein the new image comprises at least one face. (“and the VAE has been trained to encode and decode human faces, then by inputting a random value of Z into the decoder 208 p it is possible to generate a new face that did not belong to any of the sampled subjects during training”, [0077])
Claims 2, 4, 5, 6, 7, 12, 14, 15, 16, 17, 21, 22 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang and Gorijala as applied above, and further in view of Grover et al. (Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting; hereinafter Grover).
Claim 2 is rejected over Zhang, Gorijala and Grover with the incorporation of claim 1.
Regarding claim 2, Zhang does not teach wherein combining the trained prior network and the one or more classifiers comprises combining one or more first values selected from the distribution learned by the trained prior network with a reweighting factor that is based on the one or more first values.
However, Grover teaches wherein combining the trained prior network and the one or more classifiers comprises combining one or more first values selected from the distribution learned by the trained prior network with a reweighting factor that is based on the one or more first values. (“used importance weighting to reweigh datapoints based on differences in training and test data distributions i.e., dataset bias. The key difference is that these works are explicitly interested in learning the parameters of a generative model. In contrast, we use the binary classifier for estimating importance weights to correct for the model bias of any fixed generative model.”; page 9, 6 Related Work & Discussion)
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the reweighting of Grover to improve the quality of sample outputs (Grover, Abstract). Zhang and Grover are analogous art because they both concern generative models.
Claim 4 is rejected over Zhang, Gorijala and Grover.
Regarding claim 4, Zhang teaches a computer-implemented method for creating a generative model, the method comprising: (“the VAE has been trained, to generate a new, unobserved instance of the feature vector {circumflex over (X)} by inputting a random or unobserved value of the latent vector Z into the decoder 208 p.”; [0077])
performing one or more operations based on a training dataset to generate a trained encoder network (“The encoder 208q is arranged to receive the observed feature vector X.sub.o as an input and encode it into a latent vector Z (a representation in a latent space).”; [0069])
and a trained prior network, wherein the trained encoder network converts a plurality of data points included in the training dataset into a set of latent variables, and (“the latent vector Z is a compressed (i.e. encoded) representation of the information contained in the input observations X.sub.o. No one element of the latent vector Z necessarily represents directly any real world quantity, but the vector Z as a whole represents the information in the input data in compressed form. It could be considered conceptually to represent abstract features abstracted from the input data X.sub.o, such as “wrinklyness” and “trunk-like-ness” in the example of elephant recognition (though no one element of the latent vector Z can necessarily be mapped onto any one such factor, and rather the latent vector Z as a whole encodes such abstract information).”; [0070] and “To handle complicated statistical dependencies, we utilize the VampPrior, which uses a mixture of Gaussians (MoGs) as the prior distribution for the high-level latent variable i.e.,”; [0109]; Note: VAE p.sub.ψ(z) is the prior network trained to model the distribution of z and see Figure 5C of Zhang to see that x is the image and z is the representation of that image in latent space. X to z is the encoding and z to x is the decoding.);
the trained prior network learns a distribution of the set of latent variables across the training dataset; (“where h is the latent space of the dependency network. The above procedure effectively disentangles the inter-variable, heterogeneous properties of mixed type data (modelled by marginal VAEs), from inter-variable dependencies (modelled by prior networks). We call our model VAE for heterogeneous mixed type data (VAEM)“, [0108] and “To handle complicated statistical dependencies, we utilize the VampPrior, which uses a mixture of Gaussians (MoGs) as the prior distribution for the high-level latent variable i.e.,”; [0109]; Note: VAE p.sub.ψ(z) is the prior network trained to model the distribution of z and see Figure 5C of Zhang to see that x is the image and z is the representation of that image in latent space. X to z is the encoding and z to x is the decoding.)
creating a trained prior component based on the trained prior network and (“where h is the latent space of the dependency network. The above procedure effectively disentangles the inter-variable, heterogeneous properties of mixed type data (modelled by marginal VAEs), from inter-variable dependencies (modelled by prior networks). We call our model VAE for heterogeneous mixed type data (VAEM)“, [0108] and “To handle complicated statistical dependencies, we utilize the VampPrior, which uses a mixture of Gaussians (MoGs) as the prior distribution for the high-level latent variable i.e.,”; [0109]; Note: VAE p.sub.ψ(z) is the prior network trained to model the distribution of z)
Zhang does not teach one or more classifiers,
However, Gorijala teaches one or more classifiers, (See Figure 5A of Gorijala to see that the discriminator 504 is used to distinguish between the noise z latent space distribution and the values from the encoder.)
It would have been obvious before the effective filing date to combine the trained prior network of Zhang with the discriminator of Gorijala to create a trained prior component and effectively generate realistic images of faces (Gorijala, [0024]). Zhang and Gorijala are analogous art because they both concern facial image generation from the latent space.
Zhang does not teach performing one or more operations to train one or more classifiers to distinguish between values for the set of latent variables generated via the trained encoder network and values sampled from the distribution learned by the trained prior network; and
However, Gorijala teaches performing one or more operations to train one or more classifiers to distinguish between values for the set of latent variables generated via the trained encoder network and values sampled from the distribution learned by the trained prior network; and (See Figure 5A of Gorijala to see that the discriminator 504 is used to distinguish between the noise z latent space distribution and the values from the encoder.)
It would have been obvious before the effective filing date to combine the trained encoder network and trained prior network of Zhang with the discriminator of Gorijala to effectively generate realistic images of faces (Gorijala, [0024]). Zhang and Gorijala are analogous art because they both concern facial image generation from the latent space.
Zhang does not teach wherein the trained prior component applies a reweighting factor to one or more first values sampled from the distribution learned by the trained prior network to generate one or more second values for the set of latent variables.
However, Grover teaches wherein the trained prior component applies a reweighting factor to one or more first values sampled from the distribution learned by the trained prior network to generate one or more second values for the set of latent variables, (“used importance weighting to reweigh datapoints based on differences in training and test data distributions i.e., dataset bias. The key difference is that these works are explicitly interested in learning the parameters of a generative model. In contrast, we use the binary classifier for estimating importance weights to correct for the model bias of any fixed generative model.”; page 9, 6 Related Work & Discussion)
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the reweighting of Grover to improve the quality of sample outputs (Grover, Abstract). Zhang and Grover are analogous art because they both concern generative models.
Claim 5 is rejected over Zhang, Gorijala and Grover with the incorporation of claim 4.
Regarding claim 5, Zhang teaches wherein the distribution learned by the trained prior network comprises a hierarchy of latent variables, and wherein the one or more first values are sampled from the distribution learned by the trained prior network by: sampling one first value from a first group of latent variables included in the hierarchy of latent variables; and (“VAEM uses a hierarchy of latent variables which is fit in two stages. In the first stage, one type-specific VAE is learned for each dimension. These initial one-dimensional VAEs capture marginal distribution properties and provide a latent representation that is uniform across dimensions.”; [0044]; Note: The first stage is the first group of latent variables)
sampling another first value from a second group of latent variables included in the hierarchy of latent variables based on the first value and a feature map. (“In the second stage, another VAE is used to capture dependencies among the one-dimensional latent representations from the first stage.”; [0044]; Note: The second stage is the second group of latent variables and the use of the feature map at the second stage is referenced in paragraph [0115])
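The two-stage sampling recited in claim 5 — sample a value from the first group of latent variables, then sample from the second group conditioned on that value — can be sketched as follows. The Gaussian forms and parameters are illustrative assumptions, and the feature-map dependence of the second stage is omitted for brevity:

```python
import random

random.seed(1)

def sample_first_group():
    """Sample a value from the first group of latent variables
    (the top of the hierarchy)."""
    return random.gauss(0.0, 1.0)

def sample_second_group(z1):
    """Sample a value from the second group of latent variables,
    conditioned on the first-group value z1 (hierarchical dependence)."""
    return random.gauss(0.5 * z1, 0.1)

z1 = sample_first_group()       # first stage
z2 = sample_second_group(z1)    # second stage, based on the first value
print(z1, z2)
```

The second-stage sample tracks the first-stage sample through the conditional mean, mirroring how Zhang's second-stage VAE captures dependencies among the first-stage representations ([0044]).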
Claim 6 is rejected over Zhang, Gorijala and Grover with the incorporation of claim 4.
Regarding claim 6, Zhang does not teach wherein the one or more classifiers comprise a first classifier that distinguishes between a third value sampled from the first group of latent variables using the trained prior network and a fourth value for the first group of latent variables generated by the trained encoder network, and a second classifier that distinguishes between a fifth value sampled from the second group of latent variables using the trained prior network and a sixth value for the second group of latent variables generated by the trained encoder network.
However, Gorijala teaches wherein the one or more classifiers comprise a first classifier that distinguishes between a third value sampled from the first group of latent variables using the trained prior network and a fourth value for the first group of latent variables generated by the trained encoder network, and a second classifier that distinguishes between a fifth value sampled from the second group of latent variables using the trained prior network and a sixth value for the second group of latent variables generated by the trained encoder network. (“GAN (Goodfellow et al., 2014) consist of two networks, namely, generator (G) and discriminator (D). Generator network takes noise z drawn from a prior distribution p(z) as input and give an image as output. Discriminator network gives the probability that the input image is real. The GAN objective is to train discriminator to distinguish between real and fake data samples, while simultaneously training the generator to fool discriminator.”; page 2, 2.2, Generative Adversarial Network (GAN); Note: See Figure 2)
It would have been obvious before the effective filing date to combine the trained encoder network and trained prior network of Zhang with the discriminator of Gorijala to achieve accurate attribute editing of facial images (Gorijala, page 1). Zhang and Gorijala are analogous art because they both concern facial image generation from the latent space.
Claim 7 is rejected over Zhang, Gorijala and Grover with the incorporation of claim 4.
Regarding claim 7, Zhang does not teach wherein the reweighting factor is applied to the one or more first values by resampling the one or more first values based on importance weights that are proportional to the reweighting factor.
However, Grover teaches wherein the reweighting factor is applied to the one or more first values by resampling the one or more first values based on importance weights that are proportional to the reweighting factor. (“used importance weighting to reweigh datapoints based on differences in training and test data distributions i.e., dataset bias. The key difference is that these works are explicitly interested in learning the parameters of a generative model. In contrast, we use the binary classifier for estimating importance weights to correct for the model bias of any fixed generative model.”; page 9, 6 Related Work & Discussion)
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the reweighting of Grover to improve the quality of sample outputs (Grover, Abstract). Zhang and Grover are analogous art because they both concern generative models.
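Under the assumption that the resampling of claim 7 corresponds to standard sampling-importance-resampling, it can be sketched as drawing values with probability proportional to their importance weights. The values and weights below are illustrative, not taken from any reference:

```python
import random

random.seed(2)

def resample(values, weights, n):
    """Sampling-importance-resampling: draw n values with probability
    proportional to the given importance weights."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(values, weights=probs, k=n)

values = [0.0, 1.0, 2.0]
weights = [0.1, 0.1, 0.8]   # hypothetical classifier-derived weights
draws = resample(values, weights, 1000)
frac_2 = draws.count(2.0) / 1000
print(frac_2)
```

The heavily weighted value dominates the resampled set in proportion to its normalized weight, which is the bias-correcting effect Grover's reweighting is cited for.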
Claim 12 is rejected over Zhang, Gorijala and Grover with the incorporation of claim 4.
Regarding claim 12, Zhang does not teach further comprising calculating the reweighting factor based on output generated by the one or more classifiers from the one or more first values.
However, Grover teaches further comprising calculating the reweighting factor based on output generated by the one or more classifiers from the one or more first values. (“To train the classifier, we only require datasets of samples from pθ(x) and p(x) and estimate γ to be the ratio of the size of two datasets. Let cφ : X → [0, 1] denote the probability assigned by the classifier with parameters φ to a sample x belonging to the positive class y = 1. As shown in prior work [9, 22], if cφ is Bayes optimal, then the importance weights can be obtained via this classifier as:”)
[media_image1.png: Grover’s equation expressing the importance weight as the classifier output cφ(x) divided by (1 − cφ(x)), scaled by γ]
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the reweighting of Grover to improve the quality of sample outputs (Grover, Abstract). Zhang and Grover are analogous art because they both concern generative models.
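The reweighting factor computed from classifier output, which claim 21 characterizes as the quotient of the classifier probability and the difference between that probability and one, can be sketched as below. γ is Grover's dataset-size ratio, and the function name is a hypothetical helper, not Grover's code:

```python
def importance_weight(c, gamma=1.0):
    """Importance weight derived from a classifier probability c in (0, 1):
    the quotient of c and (1 - c), scaled by the dataset-size ratio gamma."""
    return gamma * c / (1.0 - c)

# A sample the classifier cannot distinguish (c = 0.5) keeps weight 1;
# a sample the classifier confidently flags as "real" is upweighted.
print(importance_weight(0.5), importance_weight(0.8))
```

Note the weight diverges as c approaches 1, so practical uses typically clip or smooth the classifier output.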
Claim 14 is rejected over Zhang, Gorijala and Grover.
Regarding claim 14, Zhang teaches a non-transitory computer readable medium storing instructions that, when executed by a processor, cause the processor to perform the steps of: (“Each of the controller 202, interface 204 and AI algorithm 206 may be implemented in the form of software code embodied on computer readable storage and run on processing apparatus comprising one or more processors such as CPUs, work accelerator co-processors such as GPUs, and/or other application specific processors”; [0051])
performing one or more operations based on a training dataset to train a generative model, wherein the generative model includes a first component that converts a plurality of data points included in the training dataset into a set of latent variables (“The encoder 208q is arranged to receive the observed feature vector Xo as an input and encode it into a latent vector Z (a representation in a latent space).”, [0069]; Note: The first component is the encoder.)
and a second component that generates a prior distribution of the set of latent variables across the training dataset; (“where h is the latent space of the dependency network. The above procedure effectively disentangles the inter-variable, heterogeneous properties of mixed type data (modelled by marginal VAEs), from inter-variable dependencies (modelled by prior networks). We call our model VAE for heterogeneous mixed type data (VAEM)“, [0108] and “To handle complicated statistical dependencies, we utilize the VampPrior, which uses a mixture of Gaussians (MoGs) as the prior distribution for the high-level latent variable i.e.,”; [0109]; Note: VAE p.sub.ψ(z) is the second component trained to model the distribution of z)
creating a trained prior component based on the second component and (“where h is the latent space of the dependency network. The above procedure effectively disentangles the inter-variable, heterogeneous properties of mixed type data (modelled by marginal VAEs), from inter-variable dependencies (modelled by prior networks). We call our model VAE for heterogeneous mixed type data (VAEM)”; [0108] and “To handle complicated statistical dependencies, we utilize the VampPrior, which uses a mixture of Gaussians (MoGs) as the prior distribution for the high-level latent variable i.e.,”; [0109]; Note: VAE p.sub.ψ(z) is the second component trained to model the distribution of z)
wherein, in operation, the trained prior component produces the one or more second values in order to generate a new data point that is not included in the training dataset. (“and the VAE has been trained to encode and decode human faces, then by inputting a random value of Z into the decoder 208 p it is possible to generate a new face that did not belong to any of the sampled subjects during training”, [0077])
Zhang does not teach one or more classifiers,
However, Gorijala teaches one or more classifiers, (See Figure 5A of Gorijala to see that the discriminator 504 is used to distinguish between the noise z latent space distribution and the values from the encoder.)
It would have been obvious before the effective filing date to combine the trained prior network of Zhang with the discriminator of Gorijala to create a trained prior component and effectively generate realistic images of faces (Gorijala, [0024]). Zhang and Gorijala are analogous art because they both concern facial image generation from the latent space.
Zhang does not teach performing one or more operations to train one or more classifiers to distinguish between values for the set of latent variables generated via the first component and values sampled from the prior distribution; and
However, Gorijala teaches performing one or more operations to train one or more classifiers to distinguish between values for the set of latent variables generated via the first component and values sampled from the prior distribution; and (See Figure 5A of Gorijala to see that the discriminator 504 is used to distinguish between the noise z latent space distribution and the values from the encoder.)
It would have been obvious before the effective filing date to combine the trained encoder network and trained prior network of Zhang with the discriminator of Gorijala to effectively generate realistic images of faces (Gorijala, [0024]). Zhang and Gorijala are analogous art because they both concern facial image generation from the latent space.
Zhang does not teach wherein the trained prior component applies a reweighting factor to one or more first values sampled from the prior distribution to generate one or more second values for the set of latent variables, wherein the reweighting factor is determined based on output generated by the one or more classifiers from the one or more first values,
However, Grover teaches wherein the trained prior component applies a reweighting factor to one or more first values sampled from the prior distribution to generate one or more second values for the set of latent variables, wherein the reweighting factor is determined based on output generated by the one or more classifiers from the one or more first values, (“used importance weighting to reweigh datapoints based on differences in training and test data distributions i.e., dataset bias. The key difference is that these works are explicitly interested in learning the parameters of a generative model. In contrast, we use the binary classifier for estimating importance weights to correct for the model bias of any fixed generative model.”; page 9, 6 Related Work & Discussion)
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the reweighting of Grover to improve the quality of sample outputs (Grover, Abstract). Zhang and Grover are analogous art because they both concern generative models.
Claim 15 is rejected over Zhang, Gorijala and Grover with the incorporation of claim 14.
Regarding claim 15, Zhang teaches wherein the instructions further cause the processor to perform the steps of performing one or more decoding operations on the one or more second values via a decoder network included in the generative model to generate the new data point. (“a random or unobserved value of the latent vector H can be input to the second decoder 208 pH in order to generate a new instance of the feature vector {circumflex over (X)} that was not observed in the training data. E.g. this could be used to generate a fictional face for use in a move or game, or to generate details of a functional patient for training or study purposes, etc.”; [0094]; Note: The generation of a new instance not observed in the training data is the new data point.)
Claim 16 is rejected over Zhang, Gorijala and Grover with the incorporation of claim 14.
Regarding claim 16, Zhang teaches wherein the decoder network is implemented by at least one of a generator network included in a generative adversarial network, a decoder portion of a variational autoencoder, or an invertible decoder represented by one or more normalizing flows. (“The decoder is sometimes referred to as a generative network in that it generates a version {circumflex over (X)} of the input feature space from the latent vector Z.”; [0010])
Dependent claim 17 is claim 7 in the form of a non-transitory computer readable medium and is rejected for the same reasons as claim 7 stated above. For the rejections of the limitations specifically pertaining to the non-transitory computer readable medium of claim 14, please see the rejection of claim 14 above.
Claim 21 is rejected over Zhang, Gorijala and Grover with the incorporation of claim 14.
Regarding claim 21, Zhang does not teach wherein the instructions further cause the processor to perform the step of generating the reweighting factor by computing a quotient of a probability that is output by the one or more classifiers and a difference between the probability and one.
However, Grover teaches wherein the instructions further cause the processor to perform the step of generating the reweighting factor by computing a quotient of a probability that is output by the one or more classifiers and a difference between the probability and one. (“To train the classifier, we only require datasets of samples from pθ(x) and p(x) and estimate γ to be the ratio of the size of two datasets. Let cφ : X → [0, 1] denote the probability assigned by the classifier with parameters φ to a sample x belonging to the positive class y = 1. As shown in prior work [9, 22], if cφ is Bayes optimal, then the importance weights can be obtained via this classifier as:”)
[media_image1.png: Grover’s equation expressing the importance weight as the classifier output cφ(x) divided by (1 − cφ(x)), scaled by γ]
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the reweighting of Grover to improve the quality of sample outputs (Grover, Abstract). Zhang and Grover are analogous art because they both concern generative models.
Claim 22 is rejected over Zhang, Gorijala and Grover with the incorporation of claim 14.
Regarding claim 22, Zhang teaches wherein the second component is implemented by at least one of a prior network or a Gaussian distribution. (“where h is the latent space of the dependency network. The above procedure effectively disentangles the inter-variable, heterogeneous properties of mixed type data (modelled by marginal VAEs), from inter-variable dependencies (modelled by prior networks). We call our model VAE for heterogeneous mixed type data (VAEM)“, [0108]; Note: The second component is the prior network)
Claim 23 is rejected over Zhang, Gorijala and Grover with the incorporation of claim 14.
Regarding claim 23, Zhang teaches wherein the first component is implemented by at least one of an encoder portion of a variational autoencoder, a numerical inversion applied to a generator network included in a generative adversarial network, or an inverse of a decoder included in a normalizing flow network. (“the method comprises training each of a plurality of individual first variational auto encoders, VAEs, each comprising an individual respective first encoder arranged to encode a respective subset of one or more features of a feature space into an individual respective first latent representation having one or more dimensions,”; [0015]; Note: The first component is the encoder portion of a variational autoencoder.)
Claims 8, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, Gorijala and Grover as applied above and in further view of Che et al. (Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling); hereinafter Che
Claim 8 is rejected over Zhang, Gorijala, Grover and Che with the incorporation of claim 4.
Regarding claim 8, Zhang does not teach wherein the reweighting factor is applied to the one or more first values by iteratively updating the one or more first values based on a gradient of an energy function associated with the distribution learned by the trained prior network and the reweighting factor.
However, Che teaches wherein the reweighting factor is applied to the one or more first values by iteratively updating the one or more first values based on a gradient of an energy function associated with the distribution learned by the trained prior network and the reweighting factor. (“One common MCMC algorithm in continuous state spaces is Langevin dynamics, with an update equation [equation image: media_image2.png]. Langevin dynamics are guaranteed to exactly sample from the target distribution p(x) as [equation image: media_image3.png].”; Section 2.2, Energy-Based Models and Langevin Dynamics)
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the energy function of Che to improve the generation of data (Che, Abstract). Zhang and Che are analogous art because they both concern data generation.
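As an illustrative sketch only (not part of the cited disclosures), the Langevin update Che describes, which iteratively moves a sample along the negative gradient of an energy function plus Gaussian noise, might look like the following; the standard-Gaussian example energy is an assumption chosen so the sampler’s target distribution is known:

```python
import numpy as np

def langevin_step(z, grad_energy, step_size, rng):
    """One Langevin dynamics update:
    z <- z - (step_size / 2) * dE/dz + sqrt(step_size) * noise."""
    noise = rng.standard_normal(z.shape)
    return z - 0.5 * step_size * grad_energy(z) + np.sqrt(step_size) * noise

# Example target: standard Gaussian, E(z) = z^2 / 2, so dE/dz = z.
rng = np.random.default_rng(0)
z = np.zeros(1000)
for _ in range(2000):
    z = langevin_step(z, lambda v: v, step_size=0.01, rng=rng)
```

With a small step size and enough iterations, the chain’s samples approximate the target distribution, here mean 0 and unit variance.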
Dependent claim 18 is claim 8 in the form of a non-transitory computer readable medium and is rejected for the same reasons as claim 8 stated above. For the rejections of the limitations specifically pertaining to the non-transitory computer readable medium of claim 14, please see the rejection of claim 14 above.
Claim 19 is rejected over Zhang, Gorijala, Grover and Che with the incorporation of claim 14.
Regarding claim 19, Zhang does not teach wherein the energy function comprises a difference between the prior distribution and the reweighting factor.
However, Che teaches wherein the energy function comprises a difference between the prior distribution and the reweighting factor. (“Interestingly, pt(z) has the form of an energy-based model, pt(z) = e−E(z) / Z’, with tractable energy function E(z) = − log p0(z) − d(G(z)).” Note: p0(z) represents the prior distribution and d(G(z)) represents the reweighting factor as determined by the discriminator output score)
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the energy function of Che to improve the generation of data (Che, Abstract). Zhang and Che are analogous art because they both concern data generation.
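For illustration only, the tractable energy function quoted from Che, E(z) = −log p0(z) − d(G(z)), can be written out under the assumption that p0 is a standard Gaussian prior; the argument `disc_logit` is a hypothetical stand-in for the discriminator score d(G(z)):

```python
import numpy as np

def energy(z, disc_logit):
    """E(z) = -log p0(z) - d(G(z)), with p0 a standard Gaussian prior."""
    log_p0 = -0.5 * float(np.sum(z ** 2)) - 0.5 * z.size * np.log(2 * np.pi)
    return -log_p0 - disc_logit
```

A higher discriminator score d(G(z)) lowers the energy, making the corresponding latent sample more likely under the reweighted distribution.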
Claims 9, 10, 11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang, Gorijala and Grover as applied above and in further view of Liu et al. (SENet for Weakly-Supervised Relation Extraction); hereinafter Liu
Claim 9 is rejected over Zhang, Gorijala, Grover and Liu with the incorporation of claim 4.
Regarding claim 9, Zhang does not teach wherein at least one of the one or more classifiers comprises a residual neural network.
However, Liu teaches wherein at least one of the one or more classifiers comprises a residual neural network. (See Figure 2 of Liu to see that the SE-ResNet-D is a squeeze-and-excitation residual network.)
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the residual neural network of Liu to improve the performance of data processing (Liu, page 511, column 2). Zhang and Liu are analogous art because they both concern encoding data.
Claim 10 is rejected over Zhang, Gorijala, Grover and Liu with the incorporation of claim 4.
Regarding claim 10, Zhang does not teach wherein the residual neural network includes a first batch normalization layer having a first Swish activation function, a first convolutional layer, a second batch normalization layer having a second Swish activation function, a second convolutional layer, and a squeeze and excitation layer.
However, Liu teaches wherein the residual neural network includes a first batch normalization layer having a first Swish activation function, a first convolutional layer, a second batch normalization layer having a second Swish activation function, a second convolutional layer, and a squeeze and excitation layer. (“We use double pooling and Swish activation function in our model, achieving a better result”; page 512; Note: See Figure 2 of Liu to see that the SE-ResNet-D is a squeeze-and-excitation residual network that consists of convolutional layers, batch normalization layers, a Swish activation function, and squeeze-and-excitation layers.)
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the residual neural network of Liu to improve the performance of data processing (Liu, page 511, column 2). Zhang and Liu are analogous art because they both concern encoding data.
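Purely as an outside-the-record sketch of two components named in the Liu citation, the Swish activation and a squeeze-and-excitation gate can be expressed as follows; the weights `w1` and `w2` are hypothetical placeholders for the SE block’s two small fully connected layers:

```python
import numpy as np

def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def squeeze_excite(feature_map, w1, w2):
    """Squeeze-and-excitation gating on a (channels, height, width) map:
    global-average-pool each channel, pass the pooled vector through two
    small layers, and rescale the channels by the resulting sigmoid gate."""
    squeezed = feature_map.mean(axis=(1, 2))        # squeeze: per-channel mean
    hidden = np.maximum(w1 @ squeezed, 0.0)         # first FC layer + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # excitation: sigmoid gate
    return feature_map * gate[:, None, None]        # channel-wise rescale
```

The gate leaves channel shapes unchanged; it only rescales each channel by a learned value in (0, 1).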
Claim 11 is rejected over Zhang, Gorijala, Grover and Liu with the incorporation of claim 4.
Regarding claim 11, Zhang does not teach wherein the residual neural network includes a Swish activation function and a sequence of convolutional kernels.
However, Liu teaches wherein the residual neural network includes a Swish activation function and a sequence of convolutional kernels. (“We use double pooling and Swish activation function in our model, achieving a better result”; page 512; Note: See Figure 2 of Liu to see that the SE-ResNet-D is a residual network that consists of a Swish activation function and a sequence of convolutional kernels.)
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the residual neural network of Liu to improve the performance of data processing (Liu, page 511, column 2). Zhang and Liu are analogous art because they both concern encoding data.
Claim 20 is rejected over Zhang, Gorijala, Grover and Liu with the incorporation of claim 14.
Regarding claim 20, Zhang does not teach wherein at least one of the one or more classifiers comprises a sequence of residual blocks, and at least one residual block in the sequence of residual blocks comprises a first batch normalization layer with a first Swish activation function, a first convolutional layer following the first batch normalization layer with the first Swish activation function, a second batch normalization layer with a second Swish activation function, a second convolutional layer following the second batch normalization layer with the second Swish activation function, and a squeeze and excitation layer.
However, Liu teaches wherein at least one of the one or more classifiers comprises a sequence of residual blocks, and at least one residual block in the sequence of residual blocks comprises a first batch normalization layer with a first Swish activation function, a first convolutional layer following the first batch normalization layer with the first Swish activation function, a second batch normalization layer with a second Swish activation function, a second convolutional layer following the second batch normalization layer with the second Swish activation function, and a squeeze and excitation layer. (“We use double pooling and Swish activation function in our model, achieving a better result”; page 512; Note: See Figure 2 of Liu to see that the SE-ResNet-D is a squeeze-and-excitation residual network that consists of convolutional layers, batch normalization layers, a Swish activation function, and squeeze-and-excitation layers.)
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the residual neural network of Liu to improve the performance of data processing (Liu, page 511, column 2). Zhang and Liu are analogous art because they both concern encoding data.
Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang, Gorijala and Grover as applied above and in further view of Wang et al. (US20210019541A1); hereinafter Wang
Claim 13 is rejected over Zhang, Gorijala, Grover and Wang with the incorporation of claim 4.
Regarding claim 13, Zhang does not teach wherein performing the one or more operations to train the one or more classifiers comprises iteratively updating weights of the one or more classifiers based on a binary cross-entropy loss.
However, Wang teaches wherein performing the one or more operations to train the one or more classifiers comprises iteratively updating weights of the one or more classifiers based on a binary cross-entropy loss. (“The discriminator 124B can apply the loss function 414 to the feature map 412A from feature extractor 410A, the feature map 412B from feature extractor 410B, and the feature map 412N from feature extractor 410N. In some examples, the loss function 414 can be a least squares loss function. The loss function 414 can then output a result 416. In some examples, the result 416 can be a binary or probabilities output such as [true, false] or [0, 1]. Such output (e.g., result 416) can, in some cases, provide a classification or discrimination decision.”; [0093] and “The forward pass, loss function, backward pass, and parameter update is performed for one training iteration. The process can be repeated for a certain number of iterations for each set of training data (e.g., image data) until the weights of the layers 502, 504, 506 in the neural network 500 are accurately tuned.”; [0101])
It would have been obvious before the effective filing date to combine the trained encoder network and the trained prior network of Zhang with the binary cross-entropy loss updating of Wang to improve facial image classification (Wang, [0118]). Zhang and Wang are analogous art because they both concern image generation using variational autoencoders.
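For illustration only, the binary cross-entropy training described in the Wang citations, computing a loss on classifier probabilities and iteratively updating the weights, might be sketched for a simple logistic classifier; all names and the toy data are hypothetical:

```python
import numpy as np

def bce_loss(p, y, eps=1e-7):
    """Binary cross-entropy averaged over samples."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def bce_update(w, X, y, lr=0.1):
    """One gradient-descent step on the BCE loss of a logistic classifier."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    grad = X.T @ (p - y) / len(y)    # gradient of BCE w.r.t. the weights
    return w - lr * grad

# Toy separable data: repeated updates shrink the loss from log(2).
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, 0.0])
w = np.zeros(1)
for _ in range(100):
    w = bce_update(w, X, y)
```

Starting from zero weights, every example is scored 0.5 and the loss is log 2; each iteration moves the weights down the loss gradient, so the loss decreases over training.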
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Li et al. (Pub. No.: US 20210397797 A1; “Zekang Li”) relates to prior network latent distributions.
NPL: Wang, Jiayu et al. “Unregularized Auto-Encoder with Generative Adversarial Networks for Image Generation.” (2018).
NPL: Pidhorskyi, Stanislav et al. “Adversarial Latent Autoencoders.” (2020).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID H TRAN whose telephone number is (703)756-1525. The examiner can normally be reached M-F 9:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo, can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DAVID H TRAN/Examiner, Art Unit 2147
/VIKER A LAMARDO/Supervisory Patent Examiner, Art Unit 2147