Office Action Analysis: 17610004 — METHOD AND SYSTEM FOR TRAINING A MODEL FOR IMAGE GENERATION

Examiner Intelligence

Examiner: REYES, MARIELA D
Career allowance rate: 61% of resolved cases (208 granted / 342 resolved; +5.8% vs TC avg)
Interview lift: +23.4% for resolved cases with an interview (rated strong)
Average prosecution (typical timeline): 4y 4m
Currently pending: 5
Total applications (career history, across all art units): 358
Statute-Specific Performance

§101: 7.2% (-32.8% vs TC avg)
§103: 78.2% (+38.2% vs TC avg)
§102: 10.7% (-29.3% vs TC avg)
§112: 2.5% (-37.5% vs TC avg)
TC avg is a Tech Center average estimate. Based on career data from 342 resolved cases.
Office Action

§101 §103 §112
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
 	The following is in response to the amendment filed on July 29, 2025.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 16-32 are rejected under 35 U.S.C. 103.

Claims 16-18, 22, 24, 28, 29, 31, and 32 are rejected under 35 U.S.C. 103 as being unpatentable over Rosca et al. (“Variational Approaches for Auto-Encoding Generative Adversarial Networks”; hereinafter “Rosca”) in view of Choi et al. (“Dynamic Scene Prediction with Multiple Interacting Agents”—US 20180124423 A1; hereinafter “Choi”) and Bhattacharyya et al. (“Accurate and Diverse Sampling of Sequences based on a "Best of Many" Sample Objective”—hereinafter “Bhattacharyya”).

Regarding claim 16
	Regarding claim 16, Rosca teaches A method of training a model for image generation, the model comprising a hybrid variational auto-encoder (VAE) - generative adversarial network (GAN) framework (Rosca, pg. 4, section 3, first paragraph, “GANs and VAEs have given us useful tools for learning and inference in generative models and we now use these tools to build new hybrid inference methods. The VAE forms our generic starting point, and we will gradually transform it to be more GAN-like”; fourth paragraph, “We choose a zero-mean Laplace distribution $p_\theta(x|z) \propto \exp(-\lambda \|x - G_\theta(z)\|_1)$ with scale parameter λ, which corresponds to using a variational auto-encoder with an L1 reconstruction loss; this is a highly popular choice and used in many related auto-encoder GAN variants, such as AGE, BEGAN, cycle GAN and PPGN”; seventh paragraph, “We are required to build four networks: the classifier Dφ(x) is trained to discriminate between reconstructions from an auto-encoder and real data points; a second classifier is trained to discriminate between latent samples produced by the encoder and samples from a standard Gaussian; we must implement the deep generative model Gθ(z), and also the encoder network qη(z|x), which can be implemented using any type of deep network”) and the method comprising the steps of: a - multiple input of an input image into the VAE which outputs in response multiple distinct output image samples (Rosca, pg. 7, section 6, first paragraph, “To better understand the importance of autoencoder based methods in the GAN landscape, we implemented and compared the proposed α-GAN with another hybrid model, AGE, as well as pure GAN variants such as DCGAN and WGAN-GP, across three datasets: ColorMNIST [28], CelebA [25] and CIFAR-10 [23]” (all three datasets are image datasets); pg. 6, fig. 1c shows the architecture of the proposed α-GAN framework where input images xreal are transformed by the encoder of the VAE-GAN into latent variables, which are then transformed back into images by the decoder/generator; pp. 8-9, figs. 3, 4, and 6 show the output images after inputting images from the respective datasets; Examiner notes that Rosca does not explicitly teach the limitation of multiple inputs of an input image and multiple distinct output images), and c - train the model based on a predefined training objective, the predefined training objective integrating the best-of-many sample reconstruction cost and a GAN-based synthetic likelihood term (Rosca, pg. 5, equation 9, generator loss combines likelihood (i.e. negative reconstruction loss) term $\lambda \|x - G_\theta(z)\|_1$, explained in pg. 4 as “[corresponding] to using a variational auto-encoder with an L1 reconstruction loss”, with synthetic likelihood classifier term $-\log D_\phi(G_\theta(z)) + \log(1 - D_\phi(G_\theta(z)))$; training objective is to minimize this combined loss value).
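As an illustration of the combined loss the rejection attributes to Rosca's equation 9, the following is a minimal sketch in plain Python. The function names, the flat-list image representation, and the default `lam=1.0` are assumptions made for this example and do not come from Rosca or from the application.

```python
import math


def l1_distance(x, x_hat):
    """Sum of absolute element-wise differences, i.e. ||x - x_hat||_1."""
    return sum(abs(a - b) for a, b in zip(x, x_hat))


def generator_loss(x, x_hat, d_out, lam=1.0):
    """Per-sample loss: lam * ||x - x_hat||_1 - log D + log(1 - D).

    x      -- input image as a flat list of floats (illustrative format)
    x_hat  -- reconstruction G(z) in the same format
    d_out  -- discriminator probability, strictly between 0 and 1,
              that x_hat is a real image
    lam    -- scale parameter of the L1 term (hypothetical default)
    """
    if not 0.0 < d_out < 1.0:
        raise ValueError("d_out must lie strictly between 0 and 1")
    reconstruction_term = lam * l1_distance(x, x_hat)
    synthetic_likelihood_term = -math.log(d_out) + math.log(1.0 - d_out)
    return reconstruction_term + synthetic_likelihood_term
```

Under this sketch, a perfect reconstruction that the discriminator scores at 0.5 has a loss of zero, and the loss falls as the discriminator becomes more convinced that the reconstruction is real.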
	Rosca does not teach the limitation a - multiple input of an input image into the VAE which outputs in response multiple distinct output image samples. However, Choi teaches this limitation (Choi, paragraph [0020], “Since there can be multiple plausible futures given the same inputs (including images I and past trajectories X), block 202 generates a diverse set of prediction samples Ŷ to provide accurate prediction of future trajectories 106. Toward this end, block 202 employs a conditional variational auto-encoder (CVAE) framework to learn the sampling model”; the same image input into the CVAE generates variations of an output image (prediction samples are images as explained in paragraph [0042])).
	Rosca and Choi are both considered analogous to the claimed invention since they utilize a variational autoencoder (VAE) for the purposes of image generation. It would have been obvious to a person having ordinary skill in the art (hereinafter “PHOSITA”), before the effective filing date of the claimed invention, to incorporate the methodologies of Choi into those of Rosca. The motivation to do so would be to generate “a diverse set of prediction samples Ŷ to provide accurate prediction of future trajectories [samples]” (Choi, paragraph [0020]).
	The combination of Rosca and Choi does not teach the limitations of b - determine the best of the multiple output image samples as a best-of-many sample, the best-of-many sample having the minimum reconstruction cost and wherein the model is a deep neural network or comprises at least one deep neural network. 
 Bhattacharyya teaches:
 b - determine the best of the multiple output image samples as a best-of-many sample, the best-of-many sample having the minimum reconstruction cost (Bhattacharyya, pg. 8488, left col, first and second paragraphs, “we use a “Best of Many Samples” approximation $\hat{L}_{BMS}$ of [equation] (6)… Similar to (6), this objective encourages diversity and loosens the constrains on the recognition network $q_\phi$ as only the best sample is considered”; pg. 8489, figs. 4 and 5 show composite images of best of many samples of digits from the MNIST handwritten digits dataset).
wherein the model is a deep neural network or comprises at least one deep neural network. (Bhattacharyya, Page 2 Section 3, teaches using and training deep conditional generative models)
	Bhattacharyya is considered analogous to the claimed invention since it utilizes a variational autoencoder (VAE) for the purposes of image generation. It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the methodologies of Bhattacharyya into those of Rosca and Choi. Minimizing reconstruction cost and retrieving the products of those minimized costs is a known technique that is applied to VAEs, and thus falls under the “applying a known technique to a known device ready for improvement” category of MPEP 2141(III).
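The selection step recited in limitation b (choosing, among several output samples for one input, the sample with the minimum reconstruction cost) can be sketched as follows. This is only an illustration: the L1 cost, the flat-list image format, and the names `l1_cost` and `best_of_many` are assumptions for the example, not the method of Bhattacharyya or of the application.

```python
def l1_cost(x, x_hat):
    """Reconstruction cost ||x - x_hat||_1 for flat lists of floats."""
    return sum(abs(a - b) for a, b in zip(x, x_hat))


def best_of_many(x, samples):
    """Return (index, sample, cost) of the candidate with the minimum cost."""
    if not samples:
        raise ValueError("need at least one sample")
    costs = [l1_cost(x, s) for s in samples]
    best_index = min(range(len(samples)), key=costs.__getitem__)
    return best_index, samples[best_index], costs[best_index]


# One input and three hypothetical output samples for that same input.
x = [0.0, 1.0, 1.0, 0.0]
samples = [
    [0.5, 0.5, 0.5, 0.5],  # cost 2.0
    [0.0, 1.0, 0.5, 0.0],  # cost 0.5
    [1.0, 0.0, 0.0, 1.0],  # cost 4.0
]
best_index, best_sample, best_cost = best_of_many(x, samples)
```

Here the second sample is selected. In a training loop, only `best_cost` would feed the reconstruction part of the objective, and the other samples would be disregarded.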

Regarding claim 17
	Regarding claim 17, dependent on claim 16, the rejection of claim 16 is incorporated. Further, the combination of Rosca, Choi, and Bhattacharyya teaches the limitation The method according to claim 16 but does not teach the limitation wherein the model is trained by using only the best-of-many sample for training the model and by disregarding the further multiple output image samples. Bhattacharyya specifically teaches this limitation (Bhattacharyya, pg. 8488, left col, first and second paragraphs, “we use a “Best of Many Samples” approximation $\hat{L}_{BMS}$ of [equation] (6)… Similar to (6), this objective encourages diversity and loosens the constrains on the recognition network $q_\phi$ as only the best sample is considered”; the authors explain their training strategy: the “best of many sample” is encoded as a variable z drawn from an encoding distribution $q_\phi$ (also called a “recognition network” re-parameterized from encoding distribution p) of a conditional variational autoencoder, and only the latent sample with the highest probability (i.e., the best sample) is considered for training; the image y can then be passed to a GAN for training). It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the methodologies of Bhattacharyya into those of Rosca and Choi. Using a best of many sample log-likelihood approximation for training instead of the traditional CVAE log-likelihood approximation is considered a variation of the latter prompted by a design incentive that “encourages sample diversity and on the other hand aims to close the gap between the training and testing pipelines” (Bhattacharyya, pg. 8488, section 3.1, first paragraph).

Regarding claim 18
	Regarding claim 18, dependent on claim 16, the rejection of claim 16 is incorporated. Further, the combination of Rosca, Choi, and Bhattacharyya teaches the limitation The method according to claim 16 but does not teach the limitation wherein the model is trained based on the best-of-many sample in relation to the input image according to a predefined VAE objective. Bhattacharyya specifically teaches this limitation (Bhattacharyya, pg. 8487, section 3.1, first paragraph, “We would like to maximize the data log-likelihood $p_\theta(y|x)$”; pg. 8488, equation 8, log-likelihood term is maximized by using sample that yields the greatest probability). It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the methodologies of Bhattacharyya into those of Rosca and Choi. Using a best of many sample log-likelihood approximation for training instead of the traditional CVAE log-likelihood approximation is considered a variation of the latter prompted by a design incentive that “encourages sample diversity and on the other hand aims to close the gap between the training and testing pipelines” (Bhattacharyya, pg. 8488, section 3.1, first paragraph).

Regarding claim 22
	Regarding claim 22, dependent on claim 16, the rejection of claim 16 is incorporated. Further, the combination of Rosca, Choi, and Bhattacharyya teaches the limitation The method according to claim 16, wherein the model is trained in step c based on the GAN-based synthetic likelihood term to learn generating sharper images by leveraging a discriminator of the GAN which is jointly trained to distinguish between real and generated images. Rosca specifically teaches this limitation (Rosca, pg. 5, “Improved Techniques” section, second paragraph, “One way to justify the use of samples [latent samples from an encoder and samples from a standard Gaussian] is to apply Jensen’s inequality, that is, $\log p_\theta(x) = \log \int p_\theta(x|z)\, p(z)\, dz \geq E_{p(z)}[\log p_\theta(x|z)]$, and replace this with a synthetic likelihood, as done for reconstructions. Instead of training two separate discriminators, we train a single discriminator which treats samples and reconstructions as fake, and p* [the dataset distribution] as real” (such training behavior will cause the model to generate images that look much more similar in style to the input images)).
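The Jensen's inequality step quoted from Rosca can be checked numerically on a toy case. The sketch below replaces the integral over z with a sum over a two-valued latent variable. The prior and likelihood values are made up for illustration and the function names are hypothetical.

```python
import math


def log_marginal(prior, likelihood):
    """log p(x) = log sum_z p(x|z) p(z) for a discrete latent z."""
    return math.log(sum(pz * px for pz, px in zip(prior, likelihood)))


def jensen_lower_bound(prior, likelihood):
    """E_{p(z)}[log p(x|z)], which Jensen's inequality keeps at or below log p(x)."""
    return sum(pz * math.log(px) for pz, px in zip(prior, likelihood))


prior = [0.5, 0.5]        # p(z) for z in {0, 1} (illustrative values)
likelihood = [0.9, 0.1]   # p(x|z) for one fixed x (illustrative values)

exact = log_marginal(prior, likelihood)
bound = jensen_lower_bound(prior, likelihood)
```

In this toy case `exact` equals log 0.5 and `bound` is strictly smaller, which is the direction of the inequality in the quoted passage.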

Regarding claim 24
	Regarding claim 24, dependent on claim 16, the rejection of claim 16 is incorporated. Further, the combination of Rosca, Choi, and Bhattacharyya teaches the limitation The method according to claim 16, wherein the output image samples are inputted into a discriminator of the GAN which outputs the GAN-based synthetic likelihood term. Specifically, Rosca teaches this limitation (Rosca, pg. 4, first paragraph, “The synthetic likelihood can be estimated using the density ratio trick by training a discriminator to distinguish between samples from the marginal p*(x) [dataset distribution] and the conditional pθ(x|z) [decoder distribution] where z is drawn from qη(z|x) [encoder distribution]” (discriminator is used to produce synthetic likelihood from a sample)).
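The density ratio trick mentioned in the quoted passage rests on a standard identity: a Bayes-optimal discriminator trained with equal numbers of real and generated examples satisfies D(x) = p*(x) / (p*(x) + pθ(x)), so the log ratio of the two densities is log D(x) - log(1 - D(x)). The sketch below verifies the identity for a single point with made-up density values. The function names are hypothetical, and in practice the discriminator is learned rather than computed from known densities.

```python
import math


def optimal_discriminator(p_real, p_model):
    """Bayes-optimal D(x) = p*(x) / (p*(x) + p_model(x)) with equal class priors."""
    return p_real / (p_real + p_model)


def log_ratio_from_discriminator(d):
    """Recover log(p*(x) / p_model(x)) as log D(x) - log(1 - D(x))."""
    return math.log(d) - math.log(1.0 - d)


p_real, p_model = 0.3, 0.1   # illustrative density values at one point x
d = optimal_discriminator(p_real, p_model)
estimated = log_ratio_from_discriminator(d)
exact = math.log(p_real / p_model)
```

The negation of `estimated`, that is -log D + log(1 - D), has the same form as the synthetic likelihood term the rejection cites from Rosca's equation 9.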

Regarding claim 28
	Regarding claim 28, Rosca teaches A system for training a model for image generation, the model comprising a hybrid variational auto-encoder (VAE) - generative adversarial network (GAN) framework (Rosca, pg. 4, section 3, first paragraph, “GANs and VAEs have given us useful tools for learning and inference in generative models and we now use these tools to build new hybrid inference methods. The VAE forms our generic starting point, and we will gradually transform it to be more GAN-like”; fourth paragraph, “We choose a zero-mean Laplace distribution $p_\theta(x|z) \propto \exp(-\lambda \|x - G_\theta(z)\|_1)$ with scale parameter λ, which corresponds to using a variational auto-encoder with an L1 reconstruction loss; this is a highly popular choice and used in many related auto-encoder GAN variants, such as AGE, BEGAN, cycle GAN and PPGN”; seventh paragraph, “We are required to build four networks: the classifier Dφ(x) is trained to discriminate between reconstructions from an auto-encoder and real data points; a second classifier is trained to discriminate between latent samples produced by the encoder and samples from a standard Gaussian; we must implement the deep generative model Gθ(z), and also the encoder network qη(z|x), which can be implemented using any type of deep network”) and the system comprising: a processor programmed to function as: a module A configured for multiple input of an input image into the VAE which outputs in response multiple distinct output image samples (Rosca, pg. 7, section 6, first paragraph, “To better understand the importance of autoencoder based methods in the GAN landscape, we implemented and compared the proposed α-GAN with another hybrid model, AGE, as well as pure GAN variants such as DCGAN and WGAN-GP, across three datasets: ColorMNIST [28], CelebA [25] and CIFAR-10 [23]” (all three datasets are image datasets); pg. 6, fig. 1c shows the architecture of the proposed α-GAN framework where input images xreal are transformed by the encoder of the VAE-GAN into latent variables, which are then transformed back into images by the decoder/generator; pp. 8-9, figs. 3, 4, and 6 show the output images after inputting images from the respective datasets; Examiner notes that Rosca does not explicitly teach the limitation of multiple inputs of an input image and multiple distinct output images), and a module C for training the model based on a predefined training objective, the predefined training objective integrating the best-of-many sample reconstruction cost and a GAN-based synthetic likelihood term (Rosca, pg. 5, equation 9, generator loss combines likelihood (i.e. negative reconstruction loss) term $\lambda \|x - G_\theta(z)\|_1$, explained in pg. 4 as “[corresponding] to using a variational auto-encoder with an L1 reconstruction loss”, with synthetic likelihood classifier term $-\log D_\phi(G_\theta(z)) + \log(1 - D_\phi(G_\theta(z)))$; training objective is to minimize this combined loss value).
	Rosca does not teach the limitation a - multiple input of an input image into the VAE which outputs in response multiple distinct output image samples. However, Choi teaches this limitation (Choi, paragraph [0020], “Since there can be multiple plausible futures given the same inputs (including images I and past trajectories X), block 202 generates a diverse set of prediction samples Ŷ to provide accurate prediction of future trajectories 106. Toward this end, block 202 employs a conditional variational auto-encoder (CVAE) framework to learn the sampling model”; the same image input into the CVAE generates variations of an output image (prediction samples are images as explained in paragraph [0042])).
	Rosca and Choi are both considered analogous to the claimed invention since they utilize a variational autoencoder (VAE) for the purposes of image generation. It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the methodologies of Choi into those of Rosca. The motivation to do so would be to generate “a diverse set of prediction samples Ŷ to provide accurate prediction of future trajectories [samples]” (Choi, paragraph [0020]).
	The combination of Rosca and Choi does not teach the limitations of b - determine the best of the multiple output image samples as a best-of-many sample, the best-of-many sample having the minimum reconstruction cost and wherein the model is a deep neural network or comprises at least one deep neural network. 
 Bhattacharyya teaches:
 b - determine the best of the multiple output image samples as a best-of-many sample, the best-of-many sample having the minimum reconstruction cost (Bhattacharyya, pg. 8488, left col, first and second paragraphs, “we use a “Best of Many Samples” approximation $\hat{L}_{BMS}$ of [equation] (6)… Similar to (6), this objective encourages diversity and loosens the constrains on the recognition network $q_\phi$ as only the best sample is considered”; pg. 8489, figs. 4 and 5 show composite images of best of many samples of digits from the MNIST handwritten digits dataset).
wherein the model is a deep neural network or comprises at least one deep neural network. (Bhattacharyya, Page 2 Section 3, teaches using and training deep conditional generative models)
	Bhattacharyya is considered analogous to the claimed invention since it utilizes a variational autoencoder (VAE) for the purposes of image generation. It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the methodologies of Bhattacharyya into those of Rosca and Choi. Minimizing reconstruction cost and retrieving the products of those minimized costs is a known technique that is applied to VAEs, and thus falls under the “applying a known technique to a known device ready for improvement” category of MPEP 2141(III).

Regarding claim 31
	Regarding claim 31, dependent on claim 16, the rejection of claim 16 is incorporated. Further, the combination of Rosca, Choi, and Bhattacharyya teaches the limitation A computer program comprising instructions for executing the steps of the method according to claim 16, when the program is executed by a computer. At least Choi teaches this limitation (Choi, paragraph [0047], “Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system”). It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the methodologies of Choi into those of Rosca. The motivation to do so would be to generate “a diverse set of prediction samples Ŷ to provide accurate prediction of future trajectories [samples]” (Choi, paragraph [0020]).

Regarding claim 32
Regarding claim 32, dependent on claim 16, the rejection of claim 16 is incorporated. Further, the combination of Rosca, Choi, and Bhattacharyya teaches the limitation A non-transitory recording medium readable by a computer and having recorded thereon a computer program including instructions for executing the steps of a method according to claim 16. At least Choi teaches this limitation (Choi, paragraph [0047], “Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system”). It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the methodologies of Choi into those of Rosca. The motivation to do so would be to generate “a diverse set of prediction samples Ŷ to provide accurate prediction of future trajectories [samples]” (Choi, paragraph [0020]).

Claims 20 and 21
	Claims 20 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Rosca, Choi, and Bhattacharyya as applied to claim 16 above, and further in view of Kingma & Welling (“Auto-Encoding Variational Bayes”; hereinafter “Kingma”) and Makhzani et al. (“Adversarial Autoencoders”; hereinafter “Makhzani”).

Regarding claim 20
	Regarding claim 20, dependent on claim 16, the rejection of claim 16 is incorporated. Further, the combination of Rosca, Choi, and Bhattacharyya teaches the limitation The method according to claim 16 but does not teach wherein the model comprises: a variational auto-encoder (VAE) including a recognition network, and a generative adversarial network (GAN) including a discriminator; and wherein a generator is included with at least one of the VAE or the GAN. However, Kingma teaches the limitation wherein the model comprises: a variational auto-encoder (VAE) including a recognition network (Kingma, pg. 7, second paragraph, “we employed the same encoder (also called recognition network) for the wake-sleep algorithm and the variational auto-encoder” (various other sources equate a VAE encoder with a recognition network)), while Makhzani teaches a generative adversarial network (GAN) including a generator and a discriminator (Makhzani, pg. 2, “Adversarial Autoencoders” section, paragraph 2, “The generator of the adversarial network is also the encoder of the autoencoder q(z|x). The encoder ensures the aggregated posterior distribution can fool the discriminative adversarial network” (Makhzani’s VAE-GAN model comprises a VAE encoder, which they describe as their model’s generator, and which is also equated to a recognition network by many; in addition, the model contains a GAN generator (which is the encoder of their VAE) and a GAN discriminator (the “discriminative adversarial network”))).
	Kingma further teaches the limitation wherein a generator is included with at least one of the VAE or the GAN (Kingma, pg. 7, second paragraph, discloses that a VAE includes a generator).
	Kingma is considered analogous to the claimed invention since they both utilize VAEs for the purpose of image generation. It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the teachings of Kingma into the methodologies of Rosca, Choi, and Bhattacharyya. Utilizing a VAE which incorporates a recognition network would be considered a simple substitution leading to a predictable result, one of the categories of MPEP 2141(III).
	In addition, Makhzani is considered analogous to the claimed invention since they both utilize a VAE-GAN hybrid model for the purposes of image generation. It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the teachings of Makhzani into the methodologies of Rosca, Choi, and Bhattacharyya. Utilizing a VAE-GAN which incorporates a generator and discriminator would be considered a simple substitution leading to a predictable result, which is a category of MPEP 2141(III).

Regarding claim 21
	Regarding claim 21, dependent on claim 20, the rejection of claim 20 is incorporated. Further, the combination of Rosca, Choi, Bhattacharyya, Kingma, and Makhzani teaches the limitation The method according to claim 20, wherein the variational auto-encoder (VAE) and the generative adversarial network (GAN) share generator in common. Makhzani specifically teaches this limitation (Makhzani, pg. 2, “Adversarial Autoencoders” section, paragraph 2, “The generator of the adversarial network is also the encoder of the autoencoder q(z|x). The encoder ensures the aggregated posterior distribution can fool the discriminative adversarial network” (Makhzani’s VAE-GAN model comprises a VAE encoder, which they describe as the model’s generator; in addition, the model contains a GAN generator (which is the encoder of the VAE))). It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the teachings of Makhzani into the methodologies of Rosca, Choi, and Bhattacharyya. An autoencoder that encodes images into latent space for a downstream decoding operation can take on the role of a GAN generator that performs a similar task, and therefore there is no need to have both elements in a hybrid VAE-GAN model. This is a case of utilizing a known technique to improve similar devices in the same way, further detailed in MPEP 2141(III).

Claim 23
	Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Rosca, Choi, and Bhattacharyya as applied to claim 22 above, and further in view of Makhzani.
	Regarding claim 23, dependent on claim 22, the rejection of claim 22 is incorporated. Further, the combination of Rosca, Choi, and Bhattacharyya teaches the limitation The method according to claim 22, while Choi teaches during each training iteration… multiple input of the input image (Choi, paragraph [0020], “Since there can be multiple plausible futures given the same inputs (including images I and past trajectories X), block 202 generates a diverse set of prediction samples Ŷ to provide accurate prediction of future trajectories 106. Toward this end, block 202 employs a conditional variational auto-encoder (CVAE) framework to learn the sampling model”; the same image input into the CVAE generates variations of an output image (prediction samples are images as explained in paragraph [0042]); images are then passed into an encoder, which is often referred to as a recognition network by many (e.g., Kingma in the claim 20 analysis above); process occurs during training, meaning model adjustments are made during training iterations) and Bhattacharyya teaches into a recognition network (Bhattacharyya, pg. 8487, section 3.1, second paragraph, “As described in [19], we can sample the latent variables z from a recognition network qϕ using the re-parameterization trick”; the re-parameterized encoding distribution qϕ(z|x, y) uses two input arguments x (the input images) and y (the output given a parameter/argument θ)).
	Rosca, Choi, and Bhattacharyya do not teach the limitations which outputs in response respective regions in a latent space or during each training iteration… generation of respective output image samples in the image space by inputting the respective regions in the latent space into a generator. However, Makhzani teaches the limitation (Makhzani, pg. 2, “Both, the adversarial network and the autoencoder are trained jointly with SGD in two phases – the reconstruction phase and the regularization phase – executed on each mini-batch”; processing of a mini-batch (sometimes referred to as a batch) comprises a single training iteration; pg. 4, fig. 2E depicts regions of latent space as a manifold of MNIST digits; pg. 2, “In several recently proposed model families, pθ(x|z) is specified via a generator (or decoder)”; decoding distribution p, implemented as a generator, takes in latent image data z as input; pg. 7, fig. 5a depicts images generated based on samples taken from the VAE latent space). It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the methodologies of Makhzani into those of Rosca, Choi, and Bhattacharyya. Utilizing a VAE-GAN which incorporates a generator and discriminator would be considered a simple substitution leading to a predictable result, which is a category of MPEP 2141(III).
	Examiner notes that the manner in which the latent distribution of the input image is sampled, as stated in the first part of claim 23, is defined by the following limitations: “multiple input of the input image into a recognition network which outputs in response respective regions in a latent space” and “generation of respective output image samples in the image space by inputting the respective regions in the latent space into a generator”. Examiner further points out that the above references capture these processes occurring in every training iteration.
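	For illustration only, the two claim 23 limitations quoted above can be sketched in a few lines of Python. This is not code from Choi, Bhattacharyya, Makhzani, or the application; the names sample_latents and toy_generator, the two-dimensional latent space, and the numeric values of mu and log_var are all hypothetical. The sketch assumes the recognition network has already produced a mean and log-variance for one input image, draws several latent vectors with the re-parameterization trick (z = mu + sigma * eps), and passes each one through a stand-in generator.

```python
import math
import random

def sample_latents(mu, log_var, k, rng):
    """Draw k latent vectors z = mu + sigma * eps, with eps ~ N(0, I)."""
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
            for _ in range(k)]

def toy_generator(z):
    """Hypothetical stand-in for a generator: a fixed linear map to 3 'pixels'."""
    weights = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]

rng = random.Random(0)
# Assumed recognition-network output (mean and log-variance) for one input image.
mu, log_var = [0.2, -0.1], [0.0, -1.0]
latents = sample_latents(mu, log_var, k=4, rng=rng)
# One output sample per latent draw, all originating from the same input image.
samples = [toy_generator(z) for z in latents]
```

	In a real model the linear map would be a trained decoder network and the samples would be images; the sketch shows only the data flow from one input to several distinct outputs.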

Claim 25
	Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Rosca, Choi, and Bhattacharyya as applied to claim 16 above, and further in view of Tang et al. (“Digital Signal Modulation Classification With Data Augmentation Using Generative Adversarial Nets in Cognitive Radio Networks”; hereinafter “Tang”).
	Regarding claim 25, dependent on claim 16, the rejection of claim 16 is incorporated. Further, the combination of Rosca, Choi, and Bhattacharyya teaches the limitation The method according to claim 16, and Rosca teaches outputs the GAN-based synthetic likelihood term (Rosca, pg. 4, first paragraph, “The synthetic likelihood can be estimated using the density ratio trick by training a discriminator to distinguish between samples from the marginal p*(x) [dataset distribution] and the conditional pθ(x|z) [decoder distribution] where z is drawn from qη(z|x) [encoder distribution]” (discriminator is used to produce synthetic likelihood from a sample)). The combination does not teach the limitation wherein only the least realistic of the multiple output image samples is inputted into a discriminator of the GAN. However, Tang teaches this limitation (Tang, pg. 15718, “Experiment” section, list item 5, “In traditional GAN training method, the minibatch sent to discriminator consists of real images and fake images. We change the minibatch component which only contains all real images or all generated images”; pg. 15719, fig. 11 depicts training of a GAN discriminator occurring in batches of either all real data or all generated data, with no batches containing both real and generated samples; Examiner is interpreting “worst” to indicate image samples that are the least realistic, as suggested by the Specification, and the least realistic images of the training batches would be any that are generated as opposed to those from the real data distribution).
	Tang is considered analogous to the claimed invention since they both involve utilizing a GAN for image generation. It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the teachings of Tang into the methodologies of Rosca, Choi, and Bhattacharyya. The motivation to do so is to “make gradient descent in discriminator smoother and controllable” (Tang, pg. 15718, “Experiment” section, list item 5).
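	As background to the density ratio trick quoted from Rosca above, the following Python sketch shows the underlying identity in a toy setting. It is an illustration under stated assumptions, not Rosca's or Tang's implementation: the two one-dimensional Gaussian densities, the evaluation point x, and all function names are hypothetical, and the discriminator is taken to be the Bayes-optimal one for balanced classes rather than a trained network.

```python
import math

def gaussian_pdf(x, mean, std=1.0):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def optimal_discriminator(x, p, q):
    """Bayes-optimal D(x) = p(x) / (p(x) + q(x)) for balanced classes."""
    return p(x) / (p(x) + q(x))

def ratio_from_discriminator(d):
    """Density ratio trick: p(x) / q(x) = D(x) / (1 - D(x))."""
    return d / (1.0 - d)

# Assumed toy densities: "real data" p*(x) and "model" p_theta(x|z).
p_real = lambda x: gaussian_pdf(x, 0.0)
p_model = lambda x: gaussian_pdf(x, 1.0)

x = 0.3
d = optimal_discriminator(x, p_real, p_model)
estimated_ratio = ratio_from_discriminator(d)
true_ratio = p_real(x) / p_model(x)
```

	Here estimated_ratio and true_ratio agree up to floating-point error, which is why a discriminator's output can stand in for a likelihood ratio that cannot be evaluated directly.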

Claims 26, 27, and 30
	Claims 26, 27, and 30 are rejected under 35 U.S.C. 103 as being unpatentable over Rosca, Choi, and Bhattacharyya as applied to claim 16 above, and further in view of Miyato et al. (“Spectral Normalization for Generative Adversarial Networks”; hereinafter “Miyato”).

Regarding claim 26
	Regarding claim 26, dependent on claim 16, the rejection of claim 16 is incorporated. Further, the combination of Rosca, Choi, and Bhattacharyya teaches the limitation The method according to claim 16 but does not teach the limitation wherein the Lipschitz constant of the GAN-based synthetic likelihood term is constrained to be equal to a predetermined value using Spectral Normalization. However, Miyato teaches this limitation (Miyato, pg. 3, equation 7, Lipschitz constant of the discriminator function f on the left-hand side of the inequality becomes constrained by upper bound according to singular values of the weights matrix; pg. 3, section 2.1, second paragraph, “Our spectral normalization normalizes the spectral norm of the weight matrix W so that it satisfies the Lipschitz constraint σ(W) = 1”).
	Miyato is considered analogous to the claimed invention since they both utilize GANs for the purposes of image generation through the tuning of specific statistical parameters. It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the teachings of Miyato into the methodologies of Rosca, Choi, and Bhattacharyya. The motivation to do so is to “stabilize the training of discriminator networks” (Miyato, pg. 1, third paragraph) through a technique that “does not require intensive tuning… for satisfactory performance” and has a small computational cost (Miyato, pg. 2, first and second bullet points).
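	The normalization quoted from Miyato, σ(W) = 1, can be illustrated with a short pure-Python sketch. It is a simplified example under stated assumptions and not Miyato's implementation: the helper names, the fixed starting vector, the iteration count, and the 2×2 example matrix are all hypothetical. The sketch estimates the largest singular value σ(W) by power iteration and divides the weight matrix by it.

```python
import math

def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def normalize(x):
    n = math.sqrt(sum(xi * xi for xi in x))
    return [xi / n for xi in x]

def spectral_norm(W, n_iters=100):
    """Estimate sigma(W), the largest singular value, by power iteration."""
    Wt = transpose(W)
    # Fixed start vector (an assumption); it must not be orthogonal
    # to the leading left singular vector of W.
    u = normalize([1.0] * len(W))
    for _ in range(n_iters):
        v = normalize(matvec(Wt, u))
        u = normalize(matvec(W, v))
    return sum(ui * wi for ui, wi in zip(u, matvec(W, v)))

def spectral_normalize(W, n_iters=100):
    """Return W / sigma(W), whose spectral norm is approximately 1."""
    s = spectral_norm(W, n_iters)
    return [[w / s for w in row] for row in W]

W = [[3.0, 0.0], [0.0, 1.0]]           # singular values 3 and 1
sigma_before = spectral_norm(W)         # approximately 3
W_sn = spectral_normalize(W)
sigma_after = spectral_norm(W_sn)       # approximately 1
```

	Applying the same normalization to every layer's weight matrix is what bounds the product of per-layer spectral norms, and hence the Lipschitz constant of the discriminator, in the inequality the examiner cites.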

Regarding claim 27
	Regarding claim 27, dependent on claim 26, the rejection of claim 26 is incorporated. Further, the combination of Rosca, Choi, Bhattacharyya, and Miyato teaches The method according to claim 26, wherein the predetermined value is equal to 1. Miyato specifically teaches the limitation wherein the predetermined value is equal to 1 (Miyato, pg. 3, section 2.1, third paragraph, “If we normalize each Wl using (8), we can appeal to the inequality (7) and the fact that σ(W̄_SN(W)) = 1 to see that ||f||Lip is bounded from above by 1”). It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the teachings of Miyato into the methodologies of Rosca, Choi, and Bhattacharyya. Normalizing statistical training values to 1 for the purpose of ensuring a smoother training routine is a process performed by Miyato and applicable to GAN development; this, therefore, would be considered “applying a known technique to a known device ready for improvement to yield predictable results”, detailed in MPEP 2141(III).

Regarding claim 30
	Regarding claim 30, dependent on claim 16, the rejection of claim 16 is incorporated. Further, the combination of Rosca, Choi, Bhattacharyya, and Miyato teaches the limitation A system for generating an image sample, comprising one of the trained model of step c of claim 16 and the trained module C of claim 16, wherein the Lipschitz constant of the GAN-based synthetic likelihood term is constrained to be equal to a predetermined value using Spectral Normalization. Miyato specifically teaches the limitation wherein the Lipschitz constant of the GAN-based synthetic likelihood term is constrained to be equal to a predetermined value using Spectral Normalization (Miyato, pg. 3, equation 7, Lipschitz constant of the discriminator function f on the left-hand side of the inequality becomes constrained by upper bound according to singular values of the weights matrix; pg. 3, section 2.1, second paragraph, “Our spectral normalization normalizes the spectral norm of the weight matrix W so that it satisfies the Lipschitz constraint σ(W) = 1”). It would have been obvious to a PHOSITA, before the effective filing date of the claimed invention, to incorporate the teachings of Miyato into the methodologies of Rosca, Choi, and Bhattacharyya. The motivation to do so is to “stabilize the training of discriminator networks” (Miyato, pg. 1, third paragraph) through a technique that “does not require intensive tuning… for satisfactory performance” and has a small computational cost (Miyato, pg. 2, first and second bullet points).

Response to Arguments
Claim Rejections - 35 USC § 112
	The instant amendment to the claims has overcome the 35 USC 112 rejection.
Claim Rejections - 35 USC § 101
	Applicant’s arguments have been considered and are enough to overcome the 35 USC 101 rejection. 
Claim Rejections - 35 USC § 103
	With respect to claim 16: 
	Applicant argues “Choi does not teach obtaining multiple image samples”
	Examiner respectfully disagrees. Rosca teaches inputting an image into a VAE and outputting distinct output image samples (Rosca, pg. 6, Fig. 1C and Fig. 3). Rosca is silent about the multiple input and multiple output recited in the claim. Choi is relied upon for teaching multiple inputs and multiple outputs (Choi, paragraph [0020]), not for teaching that the inputs and outputs are images. Therefore, the combination of Rosca and Choi does teach “multiple input of an input image into the VAE which outputs in response multiple distinct output image samples”.

Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARIELA D REYES whose telephone number is (571)270-1006. The examiner can normally be reached Monday-Friday, 7:30 am -5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Wiley can be reached at (571) 272-3923. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Mariela Reyes/Supervisory Patent Examiner, Art Unit 2142
Prosecution Timeline

Nov 09, 2021
Application Filed
Apr 29, 2025
Non-Final Rejection mailed — §101, §103, §112
Jul 29, 2025
Response Filed
Nov 06, 2025
Final Rejection mailed — §101, §103, §112
Jan 06, 2026
Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/852,105
Patent 12626127
HIGH DIMENSIONAL DENSE TENSOR REPRESENTATION FOR LOG DATA
3y 10m to grant Granted May 12, 2026
17/620,164
Patent 12614067
ROBUST REINFORCEMENT LEARNING FOR CONTINUOUS CONTROL WITH MODEL MISSPECIFICATION
4y 4m to grant Granted Apr 28, 2026
17/515,377
Patent 12547933
METRICS-BASED ON-DEMAND ANOMALY DETECTION
4y 3m to grant Granted Feb 10, 2026
17/330,363
Patent 12518174
SITE-WIDE OPTIMIZATION FOR MIXED REGRESSION MODELS AND MIXED CONTROL VARIABLES
4y 7m to grant Granted Jan 06, 2026
17/504,974
Patent 12481939
SYSTEM AND METHOD FOR RESOURCE FULFILMENT PREDICTION
4y 1m to grant Granted Nov 25, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

2-3
Expected OA Rounds
61%
Grant Probability
84%
With Interview (+23.4%)
4y 4m (~0m remaining)
Median Time to Grant
Moderate
PTA Risk
Based on 342 resolved cases by this examiner. Grant probability derived from career allowance rate.