Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner notes the entry of the following papers:
Amended claims filed 9/26/2025.
Applicant arguments/remarks made in amendment filed 9/26/2025.
Claims 1-2, 4-7, 9-15, 17-19, and 24-25 are amended. Claims 20-21 are cancelled. Claims 1-19 and 22-25 are presented for examination.
Response to Arguments
Applicant presents arguments. Each is addressed.
Applicant argues that “Applicant therefore respectfully requests withdrawal of the pending rejection of the claims under 35 U.S.C. § 101.” (Remarks, page 9, paragraph 1, line 1.) Examiner notes the amended claims and agrees. The rejections under 35 U.S.C. § 101 are withdrawn.
Applicant argues “Applicant therefore respectfully submits that Isola, Herzog, and Gatys individually or in combination, do not render claim 1 as amended obvious under 35 U.S.C. § 103.” (Remarks, page 10, paragraph 2, line 1.) The argument is moot in view of new grounds of rejection necessitated by amendment.
Applicant argues “…claims 14 and 24 are allowable at least for reasons including some of those discussed above in connection with claim 1.” (Remarks, page 10, paragraph 3, line 1.) The argument is moot in view of new grounds of rejection necessitated by amendment.
Applicant argues “Accordingly, Applicant respectfully submits that claims 2-4, 10, 12, 15-17, 22, 23, and 25 are allowable at least for depending from an allowable independent claim.” (Remarks, page 10, paragraph 5, line 2.) However, the independent claims remain rejected. The dependent claims remain rejected, at least for depending from rejected base claims.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4-11, 13-15, 17-19, and 24-25 are rejected under 35 U.S.C. § 103 as being unpatentable over Liu et al. (Coupled Generative Adversarial Networks, herein Liu) in view of Larsen et al. (Autoencoding beyond pixels using a learned similarity metric, herein Larsen).
Regarding claim 1,
Liu teaches a computer-implemented method (Liu, Figure 1, and page 1, paragraph 1, line 1 “We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images… It can learn a joint distribution with just samples drawn from the marginal distributions.” And, page 1, paragraph 2, line 1 “The paper concerns the problem of learning a joint distribution of multi-domain images from data.” And, paragraph 4, line 1 “CoGAN consists of a tuple of GANs, each for one image domain….We show that by enforcing a weight-sharing constraint the CoGAN can learn a joint distribution without existence of corresponding images in different domains.”
[Figure 1 of Liu reproduced here.]
In other words, CoGAN is a computer-implemented method.), comprising:
Liu teaches a first domain and a second domain. (Liu, page 1, paragraph 4, line 1 “CoGAN consists of a tuple of GANS, each for one image domain.” And, page 1, paragraph 4, line 8 “CoGAN is for multi-image domains but, for ease of presentation, we focused on the case of two image domains in the paper. However, the discussions and analyses can be easily generalized to multiple image domains.” In other words, two image domains are a first domain and a second domain.)
[using a first encoder neural network to encode one or more first images of a first domain into a first latent code of a shared latent space, using a second encoder neural network to encode one or more second images of a second domain into a second latent code of the shared latent space], wherein
Liu teaches the second encoder neural network shares one or more weights with the first encoder neural network (Liu, page 1, paragraph 4, line 5 “By enforcing the layers that decode high-level semantics in the GANs to share the weights, it forces the GANs to decode the high-level semantics in the same way. The layers that decode low-level details then map the shared representation to image in individual domains for confusing the respective discriminative models.” In other words, share the weights is shares one or more weights, and, the layers that decode high-level semantics in the GANs to share the weights is the second encoder neural network shares one or more weights with the first encoder neural network.); and
[using a generator neural network to generate one or more translated images depicting one or more features of the one or more first images represented in the second domain based, at least in part, on the first latent code and the second latent code.]
Thus far, Liu does not explicitly teach using a first encoder neural network to encode one or more first images of a first domain into a first latent code of a shared latent space, using a second encoder neural network to encode one or more second images of a second domain into a second latent code of the shared latent space.
Larsen teaches a first encoder neural network to encode one or more first images of a first domain into a first latent code of a shared latent space (Larsen, page 2, column 1, paragraph 1, line 1 “We combine VAEs and GANs into an unsupervised generative model that simultaneously learns to encode, generate and compare dataset samples.” And, page 2, column 1, paragraph 3, line 1 “A VAE consists of two networks that encode a data sample x to a latent representation z and decode the latent representation back to data space, respectively:
[Larsen, Eqs. 1-2 reproduced: z ∼ Enc(x) = q(z|x); x̃ ∼ Dec(z) = p(x|z)]
” In other words, encode is encoding, the first of the two networks is a first encoder neural network, encode a data sample x to a latent representation z is encoding the first image into a shared latent space, and latent representation z is a first latent code.);
Larsen teaches using a second encoder neural network to encode one or more images of a second domain into a second latent code of the shared latent space (Larsen, page 2, column 1, paragraph 1, line 1 “We combine VAEs and GANs into an unsupervised generative model that simultaneously learns to encode, generate and compare dataset samples.” And, page 2, column 1, paragraph 3, line 1 “A VAE consists of two networks that encode a data sample x to a latent representation z and decode the latent representation back to data space, respectively:
[Larsen, Eqs. 1-2 reproduced: z ∼ Enc(x) = q(z|x); x̃ ∼ Dec(z) = p(x|z)]
” In other words, encode is encoding, the second of the two networks is a second encoder neural network, and performing the same encoding steps on a second image of the second domain is encoding the second image into the shared latent space, producing a second latent code z.)
Larsen teaches using a generator neural network to generate one or more translated images depicting one or more features of the one or more first images represented in the second domain based, at least in part, on the first latent code and the second latent code. (Larsen, page 2, column 1, paragraph 5, line 1 “A VAE consists of two networks that encode a data sample x to a latent representation z and decode the latent representation back to data space respectively:
[Larsen equation image reproduced here.]
” In other words, the decoder is the generator neural network, and decoding the latent representation back to data space is generating the one or more translated images based, at least in part, on the latent code.)
In order to motivate the combination of Liu and Larsen, a brief description of the underlying technology is necessary. A GAN (generative adversarial network) is a machine learning model in which two neural networks, called the generator and the discriminator, compete with each other to become more accurate in their predictions. The generator learns to generate plausible data. The generated instances become negative training examples for the discriminator. The discriminator learns to distinguish the generator’s fake data from real data. For example, in the case of images, this means discriminating between an image generated by the generator and an actual photograph. The discriminator penalizes the generator for producing implausible results.
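This adversarial training dynamic can be sketched numerically. The following toy Python snippet is illustrative only; the function names and numbers are not drawn from the cited references. It computes the standard binary cross-entropy losses for a discriminator and a generator:

```python
import math

def discriminator_loss(d_real, d_fake):
    # Binary cross-entropy: the discriminator is rewarded for scoring
    # real data high (d_real -> 1) and generated data low (d_fake -> 0).
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating generator loss: the generator is rewarded when the
    # discriminator scores its output as real.
    return -math.log(d_fake)

# A discriminator that scores a real image 0.9 and a fake image 0.1
# incurs a low loss; the generator is penalized for the implausible fake.
d_loss = discriminator_loss(0.9, 0.1)  # low: discriminator is doing well
g_loss = generator_loss(0.1)           # high: generator is penalized
```

As the generator improves, d_fake rises toward 0.5, shrinking the generator's loss and raising the discriminator's.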
Liu combines two GANS into a coupled generative adversarial network (CoGAN) to learn a joint distribution between two or more domains. Liu does this to allow for unsupervised learning of a joint domain by enforcing the layers that decode high-level semantics in the GANs to share weights, thereby forcing the GANs to decode the high-level semantics in the same way. (See, Figure 1 of Liu.)
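The weight-sharing constraint can be illustrated schematically. In the toy sketch below (illustrative only; real networks would hold weight tensors, not short lists), two "generators" hold the same high-level layer by reference, so an update to the shared layer changes both networks at once while their low-level layers remain independent:

```python
# Tied high-level layer, in the spirit of CoGAN's weight-sharing constraint.
shared_high_level = [0.5, -0.3]

gan1 = {"high": shared_high_level, "low": [0.1, 0.2]}
gan2 = {"high": shared_high_level, "low": [0.7, -0.4]}

# A training update applied through one network's shared layer...
gan1["high"][0] = 0.9
# ...is immediately visible to the other, forcing both generators to
# decode high-level semantics the same way, while the separate "low"
# layers remain free to specialize to each image domain.
```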
An autoencoder is a neural network that combines an encoder and a decoder. The encoder maps the input into a latent code and the decoder takes the latent code and produces an output that maps back to the input. The goal of an autoencoder is to learn a lower-dimensional representation (encoding) of higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input. The loss function used to train an autoencoder is called reconstruction loss, as it measures how well the input has been reconstructed from the latent code.
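Reconstruction loss is typically an element-wise error such as mean squared error between the input and its reconstruction, as in this illustrative snippet (toy values, not drawn from the cited references):

```python
def reconstruction_loss(x, x_hat):
    # Element-wise mean squared error between input and reconstruction.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

x = [1.0, 0.0, 0.5]      # input "pixels"
x_hat = [0.9, 0.1, 0.5]  # decoder output
loss = reconstruction_loss(x, x_hat)  # small value: faithful reconstruction
```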
A Variational Autoencoder (VAE) performs the same function as the autoencoder, but instead of the encoder’s output being a latent vector, the encoder outputs the mean and the standard deviation for each latent variable. It does this in order to normalize the output and remove outliers. The loss function is the reconstruction loss, as with a typical autoencoder, combined with a similarity loss.
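The two VAE loss terms can be sketched as follows (illustrative only): the latent code is sampled from the predicted mean and standard deviation via the reparameterization trick, and the similarity term is the KL divergence of that Gaussian from a standard normal prior:

```python
import math
import random

def sample_latent(mu, sigma, rng):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def kl_divergence(mu, sigma):
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian: the regularizing
    # term that normalizes the latent codes toward the prior.
    return 0.5 * sum(m * m + s * s - math.log(s * s) - 1.0
                     for m, s in zip(mu, sigma))

rng = random.Random(0)
z = sample_latent([0.0, 1.0], [1.0, 0.5], rng)  # stochastic latent code
kl = kl_divergence([0.0, 1.0], [1.0, 0.5])      # similarity/regularization term
```

Note that a latent variable exactly matching the prior (mean 0, standard deviation 1) contributes zero KL penalty.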
Larsen teaches a combination of VAE and GAN called a VAE/GAN. The VAE/GAN encodes the input with the encoder into the latent space, and then decodes the latent space to the reconstructed input. The reconstructed input is then used as the output of the generator in the GAN, where the reconstructed input is compared to a real image by the discriminator. In other words, when the VAE encoder-decoder is combined with the GAN generator-discriminator, the decoder of the VAE becomes the generator of the GAN. The loss is then propagated back to the encoder. “The end result will be a method that combines the advantage of GAN as a high quality generative model and VAE as a method that produces an encoder of data into the latent space z.” (Larsen, page 2, column 2, paragraph 3, line 5.)
The combined VAE/GAN simultaneously learns to encode, generate, and compare dataset samples. The VAE/GAN replaces element-wise reconstruction errors with feature-wise errors for measuring reconstruction quality during training.
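The difference between element-wise and feature-wise errors can be shown with a toy example (illustrative only; a crude difference-of-neighbors "feature" stands in for a learned discriminator layer): a uniform brightness shift produces a large pixel-wise error but an essentially zero feature-wise error, since the image structure is unchanged:

```python
def features(x):
    # Stand-in for an intermediate discriminator layer: a crude "edge"
    # feature formed by differencing neighboring pixels.
    return [b - a for a, b in zip(x, x[1:])]

def mse(u, v):
    # Mean squared error between two equal-length sequences.
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

original = [0.0, 1.0, 0.0, 1.0]
shifted = [0.1, 1.1, 0.1, 1.1]  # same structure, uniform brightness shift

pixel_loss = mse(original, shifted)                        # penalizes the shift
feature_loss = mse(features(original), features(shifted))  # essentially zero
```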
The claimed invention is a combination of VAE/GANs from Larsen combined with coupled GANs from Liu. The result is coupled VAE/GANs for unsupervised image-to-image translation. (See Fig. 2C.)
[Fig. 2C reproduced here.]
Both Liu and Larsen are directed to image-to-image translation, among other things. Liu teaches using two GANs, i.e., a coupled generative adversarial network (CoGAN), to learn a joint distribution of multi-domain images by enforcing layers to share weights in order to make a shared latent space. But Liu does not teach combining VAEs with GANs for the purpose of learning feature representations in the GAN discriminator for reconstruction instead of element-wise representations. Larsen teaches combining VAEs with GANs to learn feature-wise errors instead of element-wise errors, allowing for an embedding in which high-level abstract visual features can be modified using simple arithmetic. However, Larsen does not teach learning a joint distribution of multiple domains from data. In view of the teaching of Liu, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Larsen into Liu. This would result in using coupled VAE/GANs to improve the quality of image translation between domains using unsupervised data.
One of ordinary skill in the art would be motivated to do this because the problems of element-wise distance metrics have been a long-standing problem in the field. (Larsen, page 6, column 2, paragraph 3, line 1 “The problems of element-wise distance metrics are well known in the literature and many attempts have been made at going beyond pixels – typically using hand-engineered measures. Much in the spirit of deep learning, we argue that the similarity measure is yet another component which can be replaced by a learned model capable of capturing high-level structure relevant to the data distribution. In this work, our main contribution is an unsupervised scheme for learning and applying such a distance measure.”)
Regarding claim 2,
The combination of Liu and Larsen teaches the method of claim 1, wherein
transforming the one or more first images comprises
decoding, from the shared latent space, the first latent code (Larsen, Figure 1, and page 2, column 1, paragraph 3, line 1 “A VAE consists of two networks that encode a data sample x to a latent representation z and decode the latent representation back to data space, respectively:
[Larsen, Eqs. 1-2 reproduced: z ∼ Enc(x) = q(z|x); x̃ ∼ Dec(z) = p(x|z)]
” Examiner notes that an encoder necessarily creates a latent space. In other words, the decoder is decoding, from the shared latent space, the first latent code, and z is the first latent code from the shared latent space.)
Regarding claim 4,
The combination of Liu, and Larsen teaches the method of claim 1, wherein
the generator neural network includes a generator adversarial network (Liu, Figure 1, and, page 1, paragraph 4, line 5 “By enforcing the layers that decode high-level semantics in the GANs to share the weights, it forces the GANs to decode the high-level semantics in the same way. The layers that decode low-level details then map the shared representation to image in individual domains for confusing the respective discriminative models.” In other words, GAN is a generator adversarial network.)
Regarding claim 5,
The combination of Liu, and Larsen teaches the method of claim 1, wherein
the first and second encoder neural networks are separate encoders that share weights (Liu, Figure 1, and page 1, paragraph 1, line 1 “We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images… It can learn a joint distribution with just samples drawn from the marginal distributions.” And, page 1, paragraph 2, line 1 “The paper concerns the problem of learning a joint distribution of multi-domain images from data.” And, paragraph 4, line 1 “CoGAN consists of a tuple of GANs, each for one image domain….We show that by enforcing a weight-sharing constraint the CoGAN can learn a joint distribution without existence of corresponding images in different domains.”
In other words, GAN1 and GAN2 include a first encoder and a second encoder, each separate from the other, that share weights.)
Regarding claim 6,
The combination of Liu, and Larsen teaches the method of claim 1, wherein
the first and second encoder neural networks share the one or more weights of one or more layers (Liu, See mapping of claim 5. Figure 1. In other words, CoGAN has a first encoder and a second encoder, and, from Figure 1, the first encoder and second encoder share weights from one or more layers.).
Regarding claim 7,
The combination of Liu, and Larsen teaches the method of claim 1, wherein
the first and second encoder neural networks share the one or more weights, and wherein the generator neural network includes a generator network that uses one or more outputs of the first and second encoders to transform the one or more first images (Liu, See mapping of claim 5. Figure 1. In other words, CoGAN has two encoders, the first encoder and the second encoder share weights, two GANs is one or more neural networks, each include a generator network, and, the one or more neural networks uses one or more outputs of the first and second encoders to transform the one or more first images.).
Regarding claim 8,
The combination of Liu, and Larsen teaches the method of claim 1, wherein transforming the one or more first images comprises
converting the one or more first images from a first setting to a second setting based on shared weights of the first and second encoders (Liu, Figure 1. “Each has a generative model for synthesizing realistic images in one domain and a discriminative model for classifying whether an image is real or synthesized. We tie the weights of the first few layers (responsible for decoding high-level semantics) of the generative models, g1 and g2.” And, page 1, Abstract, line 11 “We also demonstrate its applications to domain adaptation and image transformation.” In other words, synthesizing is converting, image is image, tie the weights of the first few layers is shared weights of two encoders, and domain adaptation and image transformation is converting the one or more first images from a first setting to a second setting based on the shared weights of the first and second encoders.).
Regarding claim 9,
The combination of Liu, and Larsen teaches the method of claim 1, wherein the first and second encoder neural networks share the one or more weights (Liu, Figure 1. See mapping of claim 5.), and wherein
the generator neural network includes a generator network that is to use the one or more features encoded from the one or more first images and the one or more features encoded from the one or more second images to translate the one or more first images from one setting to another setting (Liu, Figure 1, See mapping of claims 5 and 8. In other words, CoGAN is one or more neural networks, Generator is a generator network to use the one or more features encoded from the one or more first and second images, and domain adaptation and image transformation is translate the one or more images from one setting to another setting.)
Regarding claim 10,
The combination of Liu, and Larsen teaches the method of claim 1, wherein
the first encoder neural network is different from the second encoder neural network (Liu, Figure 1. In other words, the first encoder neural network g1 is different from the second encoder neural network g2.).
Regarding claim 11,
The combination of Liu, and Larsen teaches the method of claim 1, wherein
the first and second encoder neural networks are to generate the first and second latent codes, and wherein the first and second latent codes are equal (Larsen, page 2, column 1, paragraph 5, line 1 “A VAE consists of two networks that encode a data sample x to a latent representation z and decode the latent representation back to data space, respectively:
[Larsen equation image reproduced here.]
” In other words, the two networks are the first and second encoder neural networks, and because each encodes a data sample to a latent representation z in the same latent space, the first and second encoder neural networks generate first and second latent codes, wherein the codes are equal.)
Regarding claim 13,
The combination of Liu, and Larsen teaches the method of claim 1, wherein transforming the one or more first images comprises
translating from a first domain to a second domain, wherein the first domain is synthetic and the second domain is real (Liu, Figure 1, “Each has a generative model for synthesizing realistic images in one domain and discriminative model for classifying whether an image is real or synthesized.” In other words, synthesizing is translating from a first domain to a second domain, synthesized is synthetic, and real is real.).
Claims 14-15 and 17 are system claims comprising one or more processors that correspond to method claims 1-2 and 4, respectively. Otherwise, they are the same. The combination of Liu and Larsen teaches a system comprising one or more processors (Larsen, page 8, column 1, paragraph 5, line 1 “We would like to thank our reviewers for useful feedback, Søren Hauberg, Casper Kaae Sønderby and Lars Maaløe for insightful discussions, Nvidia for donating GPUs used in experiments, and the authors of DeepPy3 and CUDArray (Larsen, 2014) for the software frameworks used to implement our model.” In other words, framework is a system, and GPU is one or more processors.) Therefore, claims 14-15 and 17 are rejected for the same reasons as claims 1-2 and 4, respectively.
Claims 18 and 19 are system claims that correspond to method claims 9 and 6, respectively. Otherwise, they are the same. Therefore, claims 18 and 19 are rejected for the same reasons as claims 9 and 6, respectively.
Claims 24 and 25 are one-or-more-processors-comprising-circuitry claims corresponding to method claims 1 and 2, respectively. Otherwise, they are the same. The combination of Liu and Larsen teaches one or more processors comprising circuitry (Larsen, page 8, column 1, paragraph 5, line 1 “We would like to thank our reviewers for useful feedback, Søren Hauberg, Casper Kaae Sønderby and Lars Maaløe for insightful discussions, Nvidia for donating GPUs used in experiments, and the authors of DeepPy3 and CUDArray (Larsen, 2014) for the software frameworks used to implement our model.” In other words, a GPU is one or more processors comprising circuitry.) Therefore, claims 24 and 25 are rejected for the same reasons as claims 1 and 2, respectively.
Claims 3, 12, 16, and 22-23 are rejected under 35 U.S.C. § 103 as being unpatentable over Liu, Larsen, and Gatys et al. (Image Style Transfer Using Convolutional Neural Networks, herein Gatys).
Regarding claim 3,
The combination of Liu and Larsen teaches the method of claim 1. Thus far, the combination of Liu and Larsen does not explicitly teach wherein transforming the one or more first images comprises translating the one or more first images from a first domain to a second domain.
Gatys teaches translating the one or more first images from a first domain to a second domain (Gatys, Figure 7, and page 2420, column 2, paragraph 2, line 1 “Thus far the focus of this paper was on artistic style transfer. In general though, the algorithm can transfer the style between arbitrary images. As an example, we transfer the style of a photograph of New York by night onto an image of London in daytime (Fig 7).”
[Figure 7 of Gatys reproduced here.]
In other words, London by day is one domain and New York by night is a second domain.)
Both Gatys and the combination of Liu and Larsen are directed to image translation, among other things. The combination of Liu and Larsen teaches the computer implemented method of claim 1, but does not explicitly teach to include translating the one or more first images from a first domain to a second domain. Gatys teaches to include translating the one or more first images from a first domain to a second domain.
In view of the teaching of the combination of Liu and Larsen, it would be obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Gatys into the combination of Liu and Larsen. This would result in the computer-implemented method of claim 1, to include translating the one or more first images from a first domain to a second domain.
One of ordinary skill in the art would be motivated to do this because being able to simulate human vision would give computers the ability to compare two separate images and transform one or both. (Gatys, page 2421, column 2, paragraph 2, line 1 “Nevertheless, we find it truly fascinating that a neural system, which is trained to perform one of the core computational tasks of biological vision, automatically learns image representations that allow – at least to some extent – the separation of image content from style.”)
Regarding claim 12,
The combination of Liu, Larsen, and Gatys teaches the method of claim 1, wherein transforming the one or more first images comprises
translating from a first domain to a second domain, and wherein the first domain is day time and the second domain is night time. (Gatys, Figure 7, and page 2420, column 2, paragraph 2, line 1 “Thus far the focus of this paper was on artistic style transfer. In general though, the algorithm can transfer the style between arbitrary images. As an example, we transfer the style of a photograph of New York by night onto an image of London in daytime (Fig 7).” In other words, London by day is day domain and New York by night is night domain.)
Claim 16 is a system claim corresponding to method claim 3. Otherwise, they are the same. Therefore, claim 16 is rejected for the same reasons as claim 3.
Regarding claim 22,
The combination of Liu, Larsen, and Gatys teaches the method of claim 1, wherein transforming the one or more first images comprises
transforming the one or more first images to include the one or more features of the one or more second images (Gatys, Figure 1, and page 2420, column 2, paragraph 2, line 1 “Thus far the focus of this paper was on artistic style transfer. In general though, the algorithm can transfer the style between arbitrary images. As an example, we transfer the style of a photograph of New York by night onto an image of London in daytime (Fig 7).” In other words, London by day is day domain and New York by night is night domain, and transferring the style of a photograph of New York by night onto an image of London in daytime is transforming the one or more first images to include the one or more features of the one or more second images.)
Regarding claim 23,
The combination of Liu, Larsen, and Gatys teaches the system of claim 14, wherein transforming the one or more first images further comprises
translating the one or more first images from a first domain to a second domain, and wherein the one or more features of the one or more second images are of the second domain (Gatys, Figure 1, and page 2420, column 2, paragraph 2, line 1 “Thus far the focus of this paper was on artistic style transfer. In general though, the algorithm can transfer the style between arbitrary images. As an example, we transfer the style of a photograph of New York by night onto an image of London in daytime (Fig 7).” In other words, London by day is first image, day is first domain, New York by night is second image, night is second domain, and transferring the style of a photograph of New York by night onto an image of London in daytime is transforming the one or more first images to include the one or more features of the second image which are from the second domain.).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359. The examiner can normally be reached Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/B.I.R./Examiner, Art Unit 2124
/MIRANDA M HUANG/ Supervisory Patent Examiner, Art Unit 2124