Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
Acknowledgment is made of the Information Disclosure Statement dated 01/09/2026. All of the cited references have been considered.
Response to Arguments
Applicant’s arguments filed 02/10/2026 on pages 9-12 of Remarks regarding the rejection under 35 U.S.C. 102 and 103 with respect to claims 1-25 have been fully considered but are moot in view of the new references Grattarola, Burda and Tomczak.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3, 4, 9, 14, 15, 22, 23, 24 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Bhalodia et al. (dpVAEs: Fixing Sample Generation for Regularized VAEs); hereinafter Bhalodia in view of Grattarola et al. (Adversarial autoencoders with constant-curvature latent manifolds); hereinafter Grattarola and in further view of Burda et al. (Importance Weighted Autoencoders); hereinafter Burda
Claim 1 is rejected over Bhalodia, Grattarola and Burda.
Regarding claim 1, Bhalodia teaches a computer-implemented method for generating images using a variational autoencoder, the method comprising: (Bhalodia [page 8]: “Factor-dpVAE latent traversals across the top five latent dimensions sorted by their standard deviation from qϕ(z): Traversals start with the reconstructed image of a given sample and move ±5 standard deviations along a latent dimension.”;)
determining, based on a distribution of visual attributes learned by a trained prior network, one or more first values for a set of visual attributes (Bhalodia [3.2. Invertible Deep Networks]: “The proposed decoupled prior family for VAEs leverages flow-based generative models that are formed by a sequence of invertible blocks (i.e. transformations), parameterized by deep networks … By maximizing the log-likelihood and parameterizing the invertible blocks with deep networks, flow-based methods learn to transform a simple, tractable base distribution (e.g. standard normal) into a more expressive one.”; Note: See Figure 2 of Bhalodia to see that the inverse mapping g^-1 adjusts z_0 (first value) sampled from the prior (generation space that hosts the prior distribution) to generate z = g_n^-1(z_0) (second value). Then, the second value z = g^-1(z_0) shifts towards z = qϕ(z|x) (third value) generated by the encoder.)
performing one or more decoding operations on the one or more second values via a trained decoder network to generate an image. (Bhalodia [page 8]: “Sample generation (image generation) starts by sampling z_0 ~ p(z_0), passing through the inverse mapping to obtain z = g_n^-1(z_0) (second value), which is then decoded by the generative model pθ(x|z) (trained decoder) (see Figure 2a).”; page 5, column 1; and “We see that the traversal path in qϕ(z) tries to avoid low probability regions, which correspond to better image quality.”;)
Bhalodia does not appear to explicitly teach generating a [reweighting] factor based on a trained classifier that distinguishes between values sampled from the distribution of visual attributes and values from a distribution of latent variables generated by a trained encoder network;
However, Grattarola teaches generating a [reweighting] factor based on a trained classifier that distinguishes between values sampled from the distribution of visual attributes and values from a distribution of latent variables generated by a trained encoder network; (Grattarola [2.1 Adversarial autoencoders]: “the discriminator is trained to distinguish between samples coming from the encoder and those coming from the true prior.”; and [Fig. 1.]: “The discriminator is trained to distinguish between embeddings and samples coming from the spherical uniform prior (blue path). Finally, the membership degree of the embeddings (yellow path) is averaged with the discriminator in order to compute the loss and update the encoder.”)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the discriminator of Grattarola to improve autoencoder tasks (Grattarola, Abstract). Bhalodia and Grattarola are analogous art because they both concern variational autoencoders.
Bhalodia does not appear to explicitly teach applying a reweighting factor to the one or more first values in order to generate one or more second values for the set of [visual] attributes,
However, Burda teaches applying a reweighting factor to the one or more first values in order to generate one or more second values for the set of [visual] attributes, (Burda [3 Importance Weighted Autoencoder]: “The term inside the sum corresponds to the unnormalized importance weights for the joint distribution, which we will denote as …”; and [1 Introduction]: “The recognition network generates multiple approximate posterior samples, and their weights are averaged.”; Note: The first values are h_i and the reweighting factors are w_i, where the averaged weights are the second values.)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the reweighting of Burda to improve the flexibility of generative models currently trained with the VAE objective (Burda, 6 Conclusion). Bhalodia and Burda are analogous art because they both concern generative models and variational autoencoders.
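By way of illustration only, the importance-weight averaging described in the cited passages of Burda may be sketched as follows. This is a minimal sketch and not code from Burda: the function names `log_mean_weight` and `normalized_weights` and the toy inputs are assumptions introduced here. It computes the logarithm of the average of k unnormalized importance weights w_i, along with the corresponding normalized weights.

```python
import math

def log_mean_weight(log_weights):
    """Return log((1/k) * sum_i w_i) given the values log w_i.

    Uses the log-sum-exp trick so that large or small weights do not overflow.
    """
    k = len(log_weights)
    m = max(log_weights)
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights)) - math.log(k)

def normalized_weights(log_weights):
    """Return the weights w_i / sum_j w_j (hypothetical helper, sums to 1)."""
    m = max(log_weights)
    unnormalized = [math.exp(lw - m) for lw in log_weights]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Toy example: two samples with unnormalized weights 1 and 3.
toy_log_weights = [math.log(1.0), math.log(3.0)]
averaged = log_mean_weight(toy_log_weights)      # log of the mean weight (mean is 2)
normalized = normalized_weights(toy_log_weights)  # approximately [0.25, 0.75]
```

Working in log space is a design choice of the sketch; it is commonly used when the individual weights span many orders of magnitude.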
Claim 3 is rejected over Bhalodia, Grattarola and Burda with the incorporation of claim 1.
Regarding claim 3, Bhalodia teaches wherein the image comprises at least one face. (Bhalodia [page 8]: “Factor-dpVAE latent traversals across the top five latent dimensions sorted by their standard deviation from qϕ(z): Traversals start with the reconstructed image of a given sample and move ±5 standard deviations along a latent dimension.”; Note: See Figure 8 of Bhalodia to see that the reconstructed images comprise faces.)
Claim 24 is rejected over Bhalodia, Grattarola and Burda with the incorporation of claim 1.
Regarding claim 24, Bhalodia does not appear to explicitly teach wherein applying the reweighting factor to the one or more first values comprises combining a value of the reweighting factor with the one or more first values to generate the one or more second values.
However, Burda teaches wherein applying the reweighting factor to the one or more first values comprises combining a value of the reweighting factor with the one or more first values to generate the one or more second values. (Burda [3 Importance Weighted Autoencoder]: “The term inside the sum corresponds to the unnormalized importance weights for the joint distribution, which we will denote as …”; and [1 Introduction]: “The recognition network generates multiple approximate posterior samples, and their weights are averaged.”; Note: The first values are h_i and the reweighting factors are w_i, where the averaged weights are the second values.)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the reweighting of Burda to improve the flexibility of generative models currently trained with the VAE objective (Burda, 6 Conclusion). Bhalodia and Burda are analogous art because they both concern generative models and variational autoencoders.
Claim 25 is rejected over Bhalodia, Grattarola and Burda with the incorporation of claim 1.
Regarding claim 25, Bhalodia teaches wherein the one or more second values represent the one or more first values shifted towards one or more third values for the set of visual attributes, and wherein the one or more third values have been generated via the trained encoder network. (Bhalodia [3.2. Invertible Deep Networks]: “The proposed decoupled prior family for VAEs leverages flow-based generative models that are formed by a sequence of invertible blocks (i.e. transformations), parameterized by deep networks … By maximizing the log-likelihood and parameterizing the invertible blocks with deep networks, flow-based methods learn to transform a simple, tractable base distribution (e.g. standard normal) into a more expressive one.”; Note: See Figure 2 of Bhalodia to see that the inverse mapping g^-1 (reweighting factor) adjusts z_0 (first value) sampled from the prior (generation space that hosts the prior distribution) to generate z = g_n^-1(z_0) (second value). Then, the second value z = g^-1(z_0) shifts towards z = qϕ(z|x) (third value) generated by the encoder.)
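By way of illustration only, the role of the inverse mapping noted above can be pictured with a one-dimensional affine bijection. This is a sketch under stated assumptions: the affine form, the names `g` and `g_inverse`, and the parameter values are hypothetical, whereas the quoted passage of Bhalodia describes invertible blocks parameterized by deep networks.

```python
def g(z, scale, shift):
    """Map a representation-space value z to the generation space (z_0)."""
    return (z - shift) / scale

def g_inverse(z_0, scale, shift):
    """Map a generation-space value z_0 back to the representation space (z)."""
    return scale * z_0 + shift

# Hypothetical parameters; a trained flow would learn its own mapping.
SCALE, SHIFT = 2.0, 1.0

z_0 = 0.5                           # first value, e.g. drawn from the base distribution
z = g_inverse(z_0, SCALE, SHIFT)    # second value passed on to the decoder
round_trip = g(z, SCALE, SHIFT)     # recovers z_0 because g is a bijection
```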
Claim 4 is rejected over Bhalodia, Grattarola and Burda.
Regarding claim 4, Bhalodia teaches a computer-implemented method for generating data using a generative model, the method comprising: (Bhalodia [page 8]: “Factor-dpVAE latent traversals across the top five latent dimensions sorted by their standard deviation from qϕ(z): Traversals start with the reconstructed image of a given sample and move ±5 standard deviations along a latent dimension.”;)
sampling one or more first values from a distribution of latent variables learned by a trained prior network included in the generative model; (Bhalodia [3.2. Invertible Deep Networks]: “The proposed decoupled prior family for VAEs leverages flow-based generative models that are formed by a sequence of invertible blocks (i.e. transformations), parameterized by deep networks … By maximizing the log-likelihood and parameterizing the invertible blocks with deep networks, flow-based methods learn to transform a simple, tractable base distribution (e.g. standard normal) into a more expressive one.”; Note: See Figure 2 of Bhalodia to see that the inverse mapping g^-1 (reweighting factor) adjusts z_0 (first value) sampled from the prior (generation space that hosts the prior distribution) to generate z = g_n^-1(z_0) (second value). Then, the second value z = g^-1(z_0) shifts towards z = qϕ(z|x) (third value) generated by the encoder.)
Bhalodia does not appear to explicitly teach applying a reweighting factor to the one or more first values in order to generate one or more second values for the latent variables,
However, Burda teaches applying a reweighting factor to the one or more first values in order to generate one or more second values for the latent variables, (Burda [3 Importance Weighted Autoencoder]: “The term inside the sum corresponds to the unnormalized importance weights for the joint distribution, which we will denote as …”; and [1 Introduction]: “The recognition network generates multiple approximate posterior samples, and their weights are averaged.”; Note: The first values are h_i and the reweighting factors are w_i, where the averaged weights are the second values.)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the reweighting of Burda to improve the flexibility of generative models currently trained with the VAE objective (Burda, 6 Conclusion). Bhalodia and Burda are analogous art because they both concern generative models and variational autoencoders.
Bhalodia does not appear to explicitly teach [wherein the reweighting factor is generated] based on one or more trained classifiers that operate to distinguish between values sampled from the distribution and values for the latent variables generated via a trained encoder network included in the generative model; and
However, Grattarola teaches [wherein the reweighting factor is generated] based on one or more trained classifiers that operate to distinguish between values sampled from the distribution and values for the latent variables generated via a trained encoder network included in the generative model; and (Grattarola [2.1 Adversarial autoencoders]: “the discriminator is trained to distinguish between samples coming from the encoder and those coming from the true prior.”; and [Fig. 1.]: “The discriminator is trained to distinguish between embeddings and samples coming from the spherical uniform prior (blue path). Finally, the membership degree of the embeddings (yellow path) is averaged with the discriminator in order to compute the loss and update the encoder.”)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the discriminator of Grattarola to improve autoencoder tasks (Grattarola, Abstract). Bhalodia and Grattarola are analogous art because they both concern variational autoencoders.
Claim 9 is rejected over Bhalodia, Grattarola and Burda with the incorporation of claim 4.
Regarding claim 9, Bhalodia does not appear to explicitly teach wherein applying the reweighting factor to the one or more first values comprises resampling the one or more first values based on importance weights that are proportional to the reweighting factor.
However, Burda teaches wherein applying the reweighting factor to the one or more first values comprises resampling the one or more first values based on importance weights that are proportional to the reweighting factor. (Burda [page 5]: “The sum in Eqn. 14 can be stochastically approximated by choosing a single sample ε_i proportional to its normalized weight w̃_i”;)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the reweighting of Burda to improve the flexibility of generative models currently trained with the VAE objective (Burda, 6 Conclusion). Bhalodia and Burda are analogous art because they both concern generative models and variational autoencoders.
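By way of illustration only, resampling in proportion to importance weights, as recited in claim 9, may be sketched as follows. This is a minimal sketch, not code from Burda or from the application: the function name `resample`, the fixed seed, and the toy values are assumptions introduced here.

```python
import random

def resample(samples, weights, k, seed=0):
    """Draw k samples with replacement, each chosen with probability
    proportional to its non-negative importance weight."""
    if len(samples) != len(weights):
        raise ValueError("samples and weights must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    rng = random.Random(seed)  # local generator so the sketch is reproducible
    return rng.choices(samples, weights=weights, k=k)

# Toy example: all of the weight sits on the second sample,
# so every draw returns that sample.
first_values = [1.0, 2.0, 3.0]
second_values = resample(first_values, [0.0, 1.0, 0.0], k=5)
```

The weights need not be normalized beforehand, since `random.choices` works with relative weights.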
Claim 14 is claim 4 in the form of a non-transitory computer readable medium and is rejected for the same reasons as claim 4 stated above.
Claim 15 is rejected over Bhalodia, Grattarola and Burda with the incorporation of claim 14.
Regarding claim 15, Bhalodia does not appear to explicitly teach training the generative model based on a training dataset during a first training stage; and
after the first training stage is complete, training the one or more classifiers to distinguish between the values sampled from the distribution and the values for the latent variables generated via an encoder network during a second training stage.
However, Grattarola teaches training the generative model based on a training dataset during a first training stage; and (Grattarola [Abstract]: “we introduce the CCM adversarial autoencoder (CCM-AAE), a probabilistic generative model trained to represent a data distribution on a CCM. Our method works by matching the aggregated posterior of the CCM-AAE with a probability distribution defined on a CCM, so that the encoder implicitly learns to represent data on the CCM to fool the discriminator network.”)
after the first training stage is complete, training the one or more classifiers to distinguish between the values sampled from the distribution and the values for the latent variables generated via an encoder network during a second training stage. (Grattarola [2.1 Adversarial autoencoders]: “the discriminator is trained to distinguish between samples coming from the encoder and those coming from the true prior.”;)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the discriminator of Grattarola to improve autoencoder tasks (Grattarola, Abstract). Bhalodia and Grattarola are analogous art because they both concern variational autoencoders.
Claim 22 is rejected over Bhalodia, Grattarola and Burda with the incorporation of claim 14.
Regarding claim 22, Bhalodia teaches wherein the trained decoder network is implemented by at least one of a trained generator network included in a generative adversarial network, a trained decoder portion of a variational autoencoder, [or an invertible decoder represented by one or more trained normalizing flows.] (Bhalodia [page 8]: “Sample generation (image generation) starts by sampling z_0 ~ p(z_0), passing through the inverse mapping to obtain z = g_n^-1(z_0) (second value), which is then decoded by the generative model pθ(x|z) (trained decoder) (see Figure 2a).”; page 5, column 1; and “We see that the traversal path in qϕ(z) tries to avoid low probability regions, which correspond to better image quality.”;)
Claim 23 is rejected over Bhalodia, Grattarola and Burda with the incorporation of claim 14.
Regarding claim 23, Bhalodia teaches wherein the trained encoder network is implemented by at least one of a trained encoder portion of a variational autoencoder, [a numerical inversion applied to a trained generator network included in a generative adversarial network, or an inverse of a trained decoder included in a normalizing flow network.] (Bhalodia [page 4]: “Figure 2. dpVAE: (a) The latent space is decoupled into a generation space with a simple, tractable distribution (e.g. standard normal) and a representation space whose distribution can be arbitrarily complex and is learned via a bijective mapping to the generation space. (b) Architecture of a VAE with the decoupled prior. The g-bijection is jointly trained with the VAE generative (i.e. decoder) and inference (i.e. encoder) models.”;)
Claims 5, 10, 11, 12 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Bhalodia, Grattarola and Burda in view of Grover et al. (Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting); hereinafter Grover
Claim 5 is rejected over Bhalodia, Grattarola, Burda and Grover with the incorporation of claim 4.
Regarding claim 5, Bhalodia does not appear to explicitly teach further comprising training the one or more classifiers based on a binary cross-entropy loss to generate the one or more trained classifiers.
However, Grover teaches further comprising training the one or more classifiers based on a binary cross-entropy loss to generate the one or more trained classifiers. (Grover [page 16; C.2 Synthetic experiment]: “We believe the default calibration behavior is largely due to the fact that our binary classifiers distinguishing real and fake data do not require very complex neural networks architectures and training tricks that lead to miscalibration for multi-class classification. As shown in [61], shallow networks are well-calibrated and [62] further argue that a major reason for miscalibration is the use of a softmax loss typical for multi-class problems.”; page 16, C.1 Calibration; and “The classifier used in this case is a multi-layer perceptron with a single hidden layer of 100 units and has been trained to minimize the cross-entropy loss by first order optimization methods.”;)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the binary classifier of Grover to improve image generation (Bhalodia, page 2, column 2 and Grover, page 4, paragraph 3). Bhalodia and Grover are analogous art because they concern image generation via variational autoencoders.
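By way of illustration only, the binary cross-entropy loss recited in claim 5 may be sketched as follows. This is a minimal sketch: the function name `binary_cross_entropy`, the labeling convention (1 for one class of samples, 0 for the other), and the toy probabilities are assumptions introduced here and are not drawn from Grover.

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels.

    probs[i] is the classifier's probability that example i has label 1.
    Probabilities are clipped to (eps, 1 - eps) to avoid log(0).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

labels = [1, 0]
loss_uninformed = binary_cross_entropy([0.5, 0.5], labels)  # equals log(2)
loss_good = binary_cross_entropy([0.9, 0.1], labels)         # mostly correct
loss_bad = binary_cross_entropy([0.1, 0.9], labels)          # mostly wrong
```

A classifier that separates the two classes well yields a lower loss than one that guesses, which is the quantity minimized during training.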
Claim 10 is rejected over Bhalodia, Grattarola, Burda and Grover with the incorporation of claim 4.
Regarding claim 10, Bhalodia does not appear to explicitly teach wherein applying the reweighting factor to the one or more first values comprises iteratively updating the one or more first values based on a gradient of an energy function associated with the distribution and the reweighting factor.
However, Grover teaches wherein applying the reweighting factor to the one or more first values comprises iteratively updating the one or more first values based on a gradient of an energy function associated with the distribution and the reweighting factor. (Grover [page 1, column 2]: “The ELBO is then maximized using stochastic gradient descent by virtue of the reparameterization trick [24, 46].”; Note: Stochastic gradient descent is an iterative optimization algorithm used to train the VAE.)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the reweighting factor generated by the classifier of Grover to improve image generation (Bhalodia, page 2, column 2 and Grover, page 4, paragraph 3). Bhalodia and Grover are analogous art because they concern image generation via variational autoencoders.
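By way of illustration only, one possible reading of the limitation in claim 10, iteratively updating a value based on the gradient of an energy function, is a Langevin-style update. This is a sketch under stated assumptions and is not taken from Grover or from the application: the toy energy E(z) = z^2/2 - shift*z (a standard normal term plus a hypothetical linear log-reweighting term), the names `energy_grad` and `langevin_update`, and the step size are all introduced here.

```python
import math
import random

def energy_grad(z, shift):
    """Gradient of the assumed toy energy E(z) = z**2 / 2 - shift * z."""
    return z - shift

def langevin_update(z, shift, step=0.1, n_steps=200, noise=True, seed=0):
    """Iteratively move z down the energy gradient, optionally adding
    Gaussian noise at each step (unadjusted Langevin dynamics)."""
    rng = random.Random(seed)
    for _ in range(n_steps):
        z = z - 0.5 * step * energy_grad(z, shift)
        if noise:
            z += math.sqrt(step) * rng.gauss(0.0, 1.0)
    return z

# With the noise switched off the iteration is plain gradient descent and
# settles at the minimum of the toy energy, z = shift.
z_final = langevin_update(0.0, shift=2.0, noise=False)
```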
Claim 11 is rejected over Bhalodia, Grattarola, Burda and Grover with the incorporation of claim 4.
Regarding claim 11, Bhalodia does not appear to explicitly teach wherein the energy function comprises a difference between the distribution and the reweighting factor.
However, Grover teaches wherein the energy function comprises a difference between the distribution and the reweighting factor. (See page 5, 4 Importance Resampled Generative Modeling of Grover to see that pθ(x) is the distribution and wϕ(x) is the reweighting factor in the following expression: [equation image: media_image1.png]);
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the reweighting factor generated by the classifier of Grover to improve image generation (Bhalodia, page 2, column 2 and Grover, page 4, paragraph 3). Bhalodia and Grover are analogous art because they concern image generation via variational autoencoders.
Claim 12 is rejected over Bhalodia, Grattarola, Burda and Grover with the incorporation of claim 4.
Regarding claim 12, Bhalodia does not appear to explicitly teach wherein the reweighting factor is generated by computing a quotient of a probability that is output by the one or more trained classifiers and a difference between the probability and one.
However, Grover teaches wherein the reweighting factor is generated by computing a quotient of a probability that is output by the one or more trained classifiers and a difference between the probability and one. (Grover [page 3, 3 Likelihood-Free Importance Weighting]: “To train the classifier, we only require datasets of samples from pθ(x) and p(x) and estimate γ to be the ratio of the size of two datasets. Let cϕ: X → [0; 1] denote the probability assigned by the classifier with parameters ϕ to a sample x belonging to the positive class y = 1. As shown in prior work [9, 22], if cϕ is Bayes optimal, then the importance weights can be obtained via this classifier as: [equation image: media_image2.png]”);
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the binary classifier of Grover to improve image generation (Bhalodia, page 2, column 2 and Grover, page 4, paragraph 3). Bhalodia and Grover are analogous art because they concern image generation via variational autoencoders.
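By way of illustration only, the quotient recited in claim 12, a classifier probability divided by the difference between one and that probability, may be sketched as follows. This is a minimal sketch: the function name `reweighting_factor`, the clipping constant, and the optional `gamma` scaling (assumed here to correspond to the dataset-size ratio γ mentioned in the quoted passage) are assumptions and are not presented as Grover's exact expression.

```python
def reweighting_factor(prob, gamma=1.0, eps=1e-12):
    """Return gamma * prob / (1 - prob) for a classifier probability prob.

    prob is clipped to (eps, 1 - eps) so the quotient stays finite.
    """
    prob = min(max(prob, eps), 1.0 - eps)
    return gamma * prob / (1.0 - prob)

# A probability of 0.5 means the classifier cannot tell the two sources
# apart, so the factor is 1; larger probabilities give larger factors.
neutral = reweighting_factor(0.5)
favored = reweighting_factor(0.8)
```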
Dependent claim 16 is claim 5 in the form of a non-transitory computer readable medium and is rejected for the same reasons as claim 5 stated above. For the rejection of the limitations specifically pertaining to the non-transitory computer readable medium of claim 14, see the rejection of claim 14 above.
Claims 6 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Bhalodia, Grattarola and Burda in view of Tomczak et al. (VAE with a VampPrior); hereinafter Tomczak
Claim 6 is rejected over Bhalodia, Grattarola, Burda and Tomczak with the incorporation of claim 4.
Regarding claim 6, Bhalodia teaches the trained encoder network, and the trained decoder network are trained using a training dataset (Bhalodia [page 4]: “Figure 2. dpVAE: (a) The latent space is decoupled into a generation space with a simple, tractable distribution (e.g. standard normal) and a representation space whose distribution can be arbitrarily complex and is learned via a bijective mapping to the generation space. (b) Architecture of a VAE with the decoupled prior. The g-bijection is jointly trained with the VAE generative (i.e. decoder) and inference (i.e. encoder) models.”;)
Bhalodia does not appear to explicitly teach wherein the trained prior network,
However, Tomczak teaches wherein the trained prior network, (Tomczak [Abstract]: “The VampPrior consists of a mixture distribution (e.g., a mixture of Gaussians) with components given by variational posteriors conditioned on learnable pseudo-inputs.”)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the VampPrior of Tomczak to improve the performance of the VAE (Tomczak, page 6). Bhalodia and Tomczak are analogous art because they both concern generative models and variational autoencoders.
Bhalodia does not appear to teach prior to training one or more classifiers to generate the one or more trained classifiers.
However, Grattarola teaches prior to training one or more classifiers to generate the one or more trained classifiers. (Grattarola [2.1 Adversarial autoencoders]: “the discriminator is trained to distinguish between samples coming from the encoder and those coming from the true prior.”;)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the discriminator of Grattarola to improve autoencoder tasks (Grattarola, Abstract). Bhalodia and Grattarola are analogous art because they both concern variational autoencoders.
Claim 21 is rejected over Bhalodia, Grattarola, Burda and Tomczak with the incorporation of claim 14.
Regarding claim 21, Bhalodia does not appear to explicitly teach wherein the trained prior component is implemented by at least one of a prior network or a Gaussian distribution.
However, Tomczak teaches wherein the trained prior component is implemented by at least one of a prior network or a Gaussian distribution. (Tomczak [Abstract]: “The VampPrior consists of a mixture distribution (e.g., a mixture of Gaussians) with components given by variational posteriors conditioned on learnable pseudo-inputs.”)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the VampPrior of Tomczak to improve the performance of the VAE (Tomczak, page 6). Bhalodia and Tomczak are analogous art because they both concern generative models and variational autoencoders.
Claims 13 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Bhalodia, Grattarola and Burda in view of Wang et al. (US 20210019541 A1); hereinafter Wang
Claim 13 is rejected over Bhalodia, Grattarola, Burda and Wang with the incorporation of claim 4.
Regarding claim 13, Bhalodia does not appear to explicitly teach wherein at least one of the one or more classifiers comprises a residual neural network.
However, Wang teaches wherein at least one of the one or more classifiers comprises a residual neural network. (Wang [0113]: “FIG. 6 illustrates an example configuration of a residual network (ResNet) model 600 that can be implemented by an encoder (e.g., 120A, 120B) when mapping input image data to a vector space or code of a certain dimensionality. The ResNet model 600 can perform residual learning, where instead of learning features at the end of a network's layers, the network learns a residual. The residual can be understood as a subtraction of features learned from the input of a layer.”;)
It would have been obvious before the effective filing date to combine the latent correction using inverse mapping of Bhalodia with the image generation using VAE-GANs of Wang to improve image generation (Wang, [0035]). Bhalodia and Wang are analogous art because they both concern image generation via variational autoencoders.
Claim 19 is rejected over Bhalodia, Grattarola, Burda and Wang with the incorporation of claim 14.
Regarding claim 19, Bhalodia does not appear to explicitly teach wherein the one or more trained classifiers comprise a convolutional layer and one or more residual blocks.
However, Wang teaches wherein the one or more trained classifiers comprise a convolutional layer and one or more residual blocks. (Wang [0115]: “The ResNet model 600 then applies a ReLU activation function 606 to the data generated by the convolutional layer 604, such as a feature map generated by the convolutional layer 604. The ResNet model 600 then performs instance normalization 608 on the output of the ReLU activation function 606, and the result is passed through another convolutional layer 610, which can perform convolutions such as 3×3 convolutions.”;)
It would have been obvious before the effective filing date to combine the latent correction using inverse mapping of Bhalodia with the image generation using VAE-GANs of Wang to improve image generation (Wang, [0035]). Bhalodia and Wang are analogous art because they both concern image generation via variational autoencoders.
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Bhalodia, Grattarola and Burda in view of Rolfe et al. (US 20180247200 A1); hereinafter Rolfe
Claim 7 is rejected over Bhalodia, Grattarola, Burda and Rolfe with the incorporation of claim 4.
Regarding claim 7, Bhalodia does not appear to explicitly teach wherein the distribution of latent variables learned by the trained prior network comprises a hierarchy of latent variables, and sampling the one or more first values comprises:
sampling a first value from a first group of latent variables included in the hierarchy of latent variables; and
sampling a second value from a second group of latent variables included in the hierarchy of latent variables based on the first value and a feature map.
However, Rolfe teaches wherein the distribution of latent variables learned by the trained prior network comprises a hierarchy of latent variables, and sampling the one or more first values comprises: (Rolfe [0180]: “FIG. 6 is a schematic diagram illustrating an example implementation of a variational auto-encoder (VAE) with a hierarchy of continuous latent variables with an approximating posterior 610 and a prior 620.”;)
sampling a first value from a first group of latent variables included in the hierarchy of latent variables; and (Rolfe [0030]: “the first prior distribution conditioned on samples from at least one of the first and the second group of supplementary continuous random variables”;)
sampling a second value from a second group of latent variables included in the hierarchy of latent variables based on the first value and a feature map. (Rolfe [0181]: “the first prior distribution conditioned on samples from at least one of the first and the second group of supplementary continuous random variables”; [0030]; and “in approximating posterior 610 and prior 620, respectively, denotes a layer of continuous latent variables and is conditioned on the layers preceding it. In the example implementation of FIG. 6, there are three levels of hierarchy.”; and [0156]: “There is, therefore, technical benefit in incorporating convolutional architectures into variational auto-encoders,”; Note: The convolutional architectures used will include the use of feature maps.)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the sampling from a hierarchy of latent variables of Rolfe to improve sample reconstruction quality (Rolfe, [0081]). Bhalodia and Rolfe are analogous art because they both concern sampling latent variables in variational autoencoders.
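By way of illustration only, the two-level sampling recited in claim 7 may be sketched as follows. This is a sketch under stated assumptions and is not taken from Rolfe: the function name `sample_hierarchy`, the unit-variance Gaussians, the linear conditioning rule, and the reduction of a feature map to a single scalar `feature` are all hypothetical simplifications.

```python
import random

def sample_hierarchy(feature, seed=0):
    """Sample z1 from the first group, then z2 from the second group
    conditioned on z1 and a (scalar stand-in for a) feature map."""
    rng = random.Random(seed)
    z1 = rng.gauss(0.0, 1.0)        # first group: standard normal
    mean2 = 0.5 * z1 + feature      # hypothetical conditioning rule
    z2 = rng.gauss(mean2, 1.0)      # second group depends on z1 and feature
    return z1, z2

# The same seed gives the same z1; changing the feature shifts only z2.
sample_a = sample_hierarchy(feature=0.0)
sample_b = sample_hierarchy(feature=1.0)
```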
Dependent claim 17 is claim 7 in the form of a non-transitory computer readable medium and is rejected for the same reasons as claim 7 stated above. For the rejection of the limitations specifically pertaining to the non-transitory computer readable medium of claim 14, see the rejection of claim 14 above.
Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Bhalodia, Grattarola, Burda and Rolfe in view of Wang (US 20210019541 A1); hereinafter Wang.
Claim 8 is rejected over Bhalodia, Grattarola, Burda, Rolfe and Wang with the incorporation of claim 4.
Regarding claim 8, Bhalodia does not appear to explicitly teach wherein the one or more trained classifiers comprise a first trained classifier that distinguishes between a third value sampled from the first group of latent variables [using the prior network] and a fourth value for the first group of latent variables generated by the trained encoder network and
However, Grattarola teaches wherein the one or more trained classifiers comprise a first trained classifier that distinguishes between a third value sampled from the first group of latent variables [using the prior network] and a fourth value for the first group of latent variables generated by the trained encoder network and (Grattarola [2.1 Adversarial autoencoders]: “the discriminator is trained to distinguish between samples coming from the encoder and those coming from the true prior.”; and [Fig. 1.]: “The discriminator is trained to distinguish between embeddings and samples coming from the spherical uniform prior (blue path). Finally, the membership degree of the embeddings (yellow path) is averaged with the discriminator in order to compute the loss and update the encoder.”; and [page 2]: “The two training steps are then repeated iteratively until convergence.”; Note: Because the training steps are repeated iteratively, the discriminator goes on to distinguish additional sampled and encoded values.)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the discriminator of Grattarola to improve autoencoder tasks (Grattarola, Abstract). Bhalodia and Grattarola are analogous art because they both concern variational autoencoders.
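Note: For illustration only, a classifier that distinguishes values sampled from a prior from values produced by an encoder can be sketched as a one-dimensional logistic regression. This is a minimal Python sketch under stated assumptions and is not the discriminator of Grattarola or the classifier of the present application: the function names, the Gaussian stand-ins for the prior samples and the encoder outputs, and the step count and learning rate are all hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train_discriminator(prior_samples, encoder_samples, steps=300, lr=0.5):
    # Label 0 = value sampled from the prior, label 1 = value from the encoder.
    data = [(x, 0.0) for x in prior_samples] + [(x, 1.0) for x in encoder_samples]
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Full-batch gradient of the binary cross-entropy loss.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def accuracy(w, b, prior_samples, encoder_samples):
    # Fraction of values assigned to the correct source.
    correct = sum(sigmoid(w * x + b) < 0.5 for x in prior_samples)
    correct += sum(sigmoid(w * x + b) >= 0.5 for x in encoder_samples)
    return correct / (len(prior_samples) + len(encoder_samples))

rng = random.Random(0)
prior_samples = [rng.gauss(0.0, 1.0) for _ in range(200)]    # assumed prior draws
encoder_samples = [rng.gauss(3.0, 1.0) for _ in range(200)]  # assumed encoder outputs
w, b = train_discriminator(prior_samples, encoder_samples)
acc = accuracy(w, b, prior_samples, encoder_samples)
```

In an adversarial autoencoder the encoder would additionally be updated to fool this classifier; that second training step is omitted from the sketch.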
Bhalodia does not appear to explicitly teach a second trained classifier that distinguishes between a fifth value sampled from the second group of latent variables [using the trained prior network] and a sixth value for the second group of latent variables generated by the trained encoder network.
However, Wang teaches a second trained classifier that distinguishes between a fifth value sampled from the second group of latent variables [using the trained prior network] and a sixth value for the second group of latent variables generated by the trained encoder network. (Wang [0073]: “Since codes 204A (za) (fourth value) and 204B (zb) (sixth value) are generated by encoders 120A (Ea) and 120B (Eb) separately, a cycle consistency constraint can be used to allocate them into the same latent space z.”; and [0073]: “In some cases, each component of code 204A (za) in latent space z can be conditionally independent and can have a Gaussian distribution (e.g., N(0, I)) with unit variance. Moreover, the code 204A (za), which can be randomly sampled from latent space z, can be used by generator 122B (Gb) to reconstruct the input image 202 (Xa).”; and [0078]: “In some examples, the loss functions for the VAEs can include a component that penalizes the deviation of the distribution of codes (e.g., 204A, 204B) in the latent space from the prior distribution, which can be a zero mean Gaussian, n˜N(0, I) (third and fifth value), and a component that penalizes the reconstruction loss between the source image (e.g., 202) and the one generated by the corresponding generator (e.g., 122A or 122B).”; Note: As shown in Figure 2 of Wang, discriminator 124B is the first classifier, which takes in 204A (za) (fourth value) and N(0, I) (third value). Discriminator 124A is the second classifier, which takes in 204B (zb) (sixth value) and N(0, I) (fifth value).)
It would have been obvious before the effective filing date to combine the latent correction of Bhalodia with the second classifier of Wang to improve image generation (Wang, [0035]). Bhalodia and Wang are analogous art because they both concern image generation via variational autoencoders.
Dependent claim 18 is claim 8 in the form of a non-transitory computer readable medium and is rejected for the same reasons as claim 8 stated above. For the rejection of the limitations specifically pertaining to the non-transitory computer readable medium of claim 14, see the rejection of claim 14 above.
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Bhalodia, Grattarola, Burda, Wang and Liu et al. (SENet for Weakly-Supervised Relation Extraction); hereinafter Liu.
Claim 20 is rejected over Bhalodia, Grattarola, Burda, Wang and Liu with the incorporation of claim 14.
Regarding claim 20, Bhalodia does not appear to explicitly teach wherein the one or more residual blocks comprise a first batch normalization layer with a first Swish activation function, a first convolutional layer following the first batch normalization layer with the first Swish activation function, a second batch normalization layer with a second Swish activation function, a second convolutional layer following the second batch normalization layer with the second Swish activation function, and a squeeze and excitation layer.
However, Liu teaches wherein the one or more residual blocks comprise a first batch normalization layer with a first Swish activation function, a first convolutional layer following the first batch normalization layer with the first Swish activation function, a second batch normalization layer with a second Swish activation function, a second convolutional layer following the second batch normalization layer with the second Swish activation function, and a squeeze and excitation layer. (Liu [page 513, 2.3 SE-ResNet Module; and page 514, 3.3 Model Performance and Comparison]: “SE blocks are sufficiently flexible to be used in residual networks. The black frame in Figure2 depicts the schema of an SE-ResNet module. We apply Batch Normalization (BN) [13] after each convolution layer. Squeeze and excitation both act before summation with the identity branch.”; and “we design the architecture of our model using grid search and find that with four SE-ResNet blocks (SE = squeeze and excitation), we could improve the performance of learning in a noisy input setting. Performance will not get better if we use more SE-ResNet blocks due to the limit size of the NYT dataset. We also show the effect of Swish activation function in Figure 5. Swish is a slightly better than ReLU. Figure 6 shows the effect of SE module and double pooling. The reason of SE-ResNet-D help this task is in three aspects. First, SE module improve the representational capacity of a network by enabling it to perform dynamic channelwise feature recalibration and prevent overfitting. Second, multiple layers of convolution with identity shortcut and SE module extract multi-scale information including hidden lexical, syntactic and semantic representations of a sentence”; Note: Multiple layers of convolution will include the first and second layers.)
It would have been obvious before the effective filing date to combine the image generation using the VAEs of Bhalodia with the ResNet structure of Liu to improve neural architecture performance (Liu, page 514). Bhalodia and Liu are analogous art because they both concern encoding data into a latent representation.
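Note: For illustration only, the layer ordering recited in claim 20 (batch normalization with Swish, convolution, batch normalization with Swish, convolution, squeeze and excitation, then summation with the identity branch) can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions and is not the architecture of Liu or of the present application: all function names are hypothetical, the data are one-dimensional with layout [batch][channel][length], the learnable scale and shift of batch normalization are omitted, and the weights are random placeholders.

```python
import math
import random

def swish(v):
    return v / (1.0 + math.exp(-v))  # x * sigmoid(x)

def bn_swish(x, eps=1e-5):
    # Normalize each channel over batch and length, then apply Swish.
    out = [[list(ch) for ch in s] for s in x]
    for c in range(len(x[0])):
        vals = [v for s in x for v in s[c]]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        for s_out, s in zip(out, x):
            s_out[c] = [swish((v - mean) / math.sqrt(var + eps)) for v in s[c]]
    return out

def conv1d(x, w):
    # w layout: [out_channel][in_channel][kernel]; zero padding, stride 1.
    pad = len(w[0][0]) // 2
    out = []
    for s in x:
        n = len(s[0])
        rows = []
        for w_out in w:
            row = []
            for i in range(n):
                acc = 0.0
                for c, w_in in enumerate(w_out):
                    for j, wv in enumerate(w_in):
                        t = i + j - pad
                        if 0 <= t < n:
                            acc += wv * s[c][t]
                row.append(acc)
            rows.append(row)
        out.append(rows)
    return out

def squeeze_excite(x, w1, w2):
    # Squeeze: per-channel global average. Excite: FC, ReLU, FC, sigmoid gate.
    out = []
    for s in x:
        z = [sum(ch) / len(ch) for ch in s]
        h = [max(0.0, sum(a * b for a, b in zip(r, z))) for r in w1]
        g = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(r, h)))) for r in w2]
        out.append([[g[c] * v for v in ch] for c, ch in enumerate(s)])
    return out

def residual_block(x, conv_a, conv_b, se_w1, se_w2):
    h = conv1d(bn_swish(x), conv_a)       # first BN + Swish, first convolution
    h = conv1d(bn_swish(h), conv_b)       # second BN + Swish, second convolution
    h = squeeze_excite(h, se_w1, se_w2)   # channel recalibration
    # Summation with the identity branch.
    return [[[a + b for a, b in zip(cx, chh)] for cx, chh in zip(sx, sh)]
            for sx, sh in zip(x, h)]

def rand_tensor(rng, *shape):
    if len(shape) == 1:
        return [rng.uniform(-0.5, 0.5) for _ in range(shape[0])]
    return [rand_tensor(rng, *shape[1:]) for _ in range(shape[0])]

rng = random.Random(0)
B, C, L, K, H = 2, 4, 8, 3, 2  # batch, channels, length, kernel, SE hidden size
x = rand_tensor(rng, B, C, L)
conv_a = rand_tensor(rng, C, C, K)
conv_b = rand_tensor(rng, C, C, K)
se_w1 = rand_tensor(rng, H, C)
se_w2 = rand_tensor(rng, C, H)
y = residual_block(x, conv_a, conv_b, se_w1, se_w2)
```

The output has the same shape as the input, which is what allows the block's result to be summed with the identity branch.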
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DAVID H TRAN whose telephone number is (703)756-1525. The examiner can normally be reached M-F 9:30 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Viker Lamardo, can be reached at (571) 270-5871. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DAVID H TRAN/ Examiner, Art Unit 2147
/ERIC NILSSON/ Primary Examiner, Art Unit 2151