Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 19, 2025, has been entered.
Remarks
Applicant’s amendments have overcome the previous prior art rejections under § 102. However, upon further consideration, new grounds of rejection under § 103, based on a newly applied reference, have been made for the claims that were previously rejected under § 102. Applicant’s arguments directed to the previous prior art rejections are moot under the new grounds of rejection.
Applicant’s arguments directed to the § 101 rejections have been fully considered but are not deemed to be persuasive. These arguments are addressed below.
With regard to Step 2A, Prong One, applicant argues:
However, the present claims recite elements that require specific machine implementation and cannot be performed in the human mind. For example, claim 6 recites identifying, using a first encoder of the neural networks, a pose of a first object, depicted within a first image. Claim 6 also recites identifying a pose and a style of one or more second objects using a second encoder of the one or more neural networks. These operations use specialized hardware and computational resources, as described in the Specification. Specification at least at [0051].
The claimed encoders process images to extract content codes representative of visual aspects, such as poses, and style codes representative of appearance, which involves complex data processing that cannot be practically performed by the human mind. Specifically, the multimodal image translation framework utilizes neural networks to encode and decode visual information, leveraging domain-specific style codes and shared content codes to perform transformations across diverse object classes. These operations require high-speed memory access and parallel computation, which are beyond the capabilities of the human mind to execute efficiently or accurately. Furthermore, the Specification describes that the neural networks are implemented using specialized processor circuits, such as adaptive instance normalization layers and generative adversarial networks (GANs), which further demonstrates that the claims are not directed to a mental process.
(Applicant’s arguments, page 8).
These arguments are not persuasive because present claim 6, for example, does not recite any specific machine implementation other than the generic use of neural networks. For example, claim 6 merely uses phrases such as “using one or more neural networks” or “using a first encoder” that lack additional implementation details of the neural networks beyond the generic invocation of neural networks as tools that can be “used” to perform steps of the claim.
Furthermore, contrary to applicant’s statements, the claim does not recite any specialized hardware. Instead, claim 6, for example, merely recites a “computer-implemented method” that can be implemented on any general-purpose computer that can execute a computer program implementing a software neural network, as opposed to a computer with specialized hardware. The Examiner notes that, contrary to applicant’s statements, adaptive instance normalization layers and generative adversarial networks (GANs) are not specialized processor circuits in a hardware sense, but instead broadly cover software components that are run on a general-purpose computer.
The Examiner also points out that, with regard to the element of “adaptive instance normalization layers,” claim 6 recites no such element, while claim 12 only recites “parameters to be used in adaptive instance normalization layers.” Therefore, the instant claim does not actually require the use of such layers, much less in any specific manner for the steps of claim 6. Instead, claim 12 does no more than generally link the use of a judicial exception to a particular technological environment or field of use, namely the technological environment of neural networks with an adaptive instance normalization layer. Therefore, the concept of these particular elements described in the specification is not present in the claims in a way that overcomes the § 101 rejection.
With regard to applicant’s statement about extracting content codes representative of visual aspects, such as poses, and style codes representative of appearance, the instant claim merely recites “identifying” or “determining” such features at a high degree of generality. For example, the claim does not recite an extraction process defined in specific technical terms, but merely recites the identification or determination of these features. Furthermore, features such as a pose are recited at a high degree of generality, such that the “pose of a first object,” for example, is no different from what a human can determine from an image.
The mere involvement of a neural network in phrases such as “using a first encoder” does not specifically define the nature of the codes. Although the specification implies that the “codes” are embeddings in a latent space, no such limitations regarding the technical format of the codes are present in claim 6. Therefore, the codes in the instant claim read on those that can be determined by a human as part of a mental process.
Next, with regard to Step 2A, Prong Two, applicant argues:
When viewed as a whole, the amended claims recite an improvement to the function of a computer, and/or an improvement to a technical field. Specifically, the claims address the challenges of performing multimodal image translation efficiently and accurately using neural networks. For example, independent claim 6 describes a specific process that improves the functioning of a computer by utilizing encoder and decoder networks to extract content codes and style codes, which represent domain-invariant and domain-specific features, respectively. This process enables the generation of diverse and multimodal image translations, as described in the Specification at least at [0027] and [0033]. Thus, amended independent claim 6, as a whole, integrates the judicial exception into a practical application such that the claim is not directed to the judicial exception.
(Applicant’s response, page 9).
These arguments are not persuasive for the following reasons. The Step 2A Prong Two analysis first requires determination of additional elements besides the abstract idea. Here, however, the additional elements are merely recitations such as “using one or more neural networks” and “using a first encoder,” which do not specify additional implementation details of the neural networks beyond the generic invocation of neural networks as tools that can be “used” to perform steps of identification or determination. Therefore, the claims lack technical features that enable or reflect any of the concepts discussed in applicant’s remarks quoted above.
Many of the concepts that applicant refers to, such as domain-invariant and domain-specific features and the generation of diverse and multimodal image translations, are not present in the claims, much less in a manner that is attributed to specific technical details of the neural networks. Furthermore, image generation in the abstract can still be a mental process, since mental processes include those that can be performed using pen and paper. Since the claims merely invoke the use of neural networks as tools to perform an abstract idea, they do not reflect an improvement in technology with regard to the generation of multimodal images as a practical application.
With regard to the Step 2B analysis, applicant argues:
Here, amended independent claim 6 recites multimodal image translation using neural networks that implement specific and unconventional steps, such as inferring content codes representative of visual aspects and style codes representative of appearance styles, and combining these codes to generate diverse and multimodal image translations. For example, the claimed invention utilizes advanced neural network architectures, including adaptive instance normalization (AdaIN) layers and multilayer perceptrons (MLPs), to dynamically adjust normalization parameters based on style codes. This enables precise control over the appearance of generated images, a technique that is neither well-understood, routine, nor conventional in the field, as it confines the claim to a particular useful application of the judicial exception, favoring eligibility under M.P.E.P. §2106.05(d).
(Applicant’s response, page 10).
These arguments are not persuasive for the reasons discussed above, namely that claim 6, for example, merely invokes the use of neural networks as tools to perform a mental process. Furthermore, as discussed above, claim 6 does not recite adaptive instance normalization, while claim 12 only recites “parameters to be used in adaptive instance normalization layers,” without reciting a specific use of such layers. Furthermore, multilayer perceptrons (MLPs) are merely generic components of neural networks, and their use does not constitute an improvement to the field of neural networks.
Therefore, applicant’s arguments directed to the § 101 rejection are not persuasive, and the claims remain rejected under § 101.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 6-26 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to an abstract idea without significantly more.
The claims have been analyzed in accordance with the 2019 Revised Patent Subject Matter Eligibility Guidance and the October 2019 Patent Eligibility Guidance Update, which set forth the following inquiries for determining eligibility.
Independent claims 6, 16, and 21
Step 2A Prong One: Does the claim recite an abstract idea, law of nature, or natural phenomenon?
Yes, independent claims 6, 16, and 21 recite an abstract idea in the form of mental processes. A mental process is a process that “can be performed in the human mind, or by a human using a pen and paper” (MPEP § 2106.04(a)(2)(III), paragraph 1). Examples of mental processes include “observations, evaluations, judgments, and opinions” (MPEP § 2106.04(a)(2)(III), paragraph 2).
The following limitations in claims 6, 16, and 21 are a mental process:
“to generate one or more output images depicting an object based, at least in part, on” [These limitations merely recite image generation at a high degree of generality and do not contain any specific features that would preclude the above limitations from being a mental process. For example, a human can, with pen and paper, generate an image depicting an object based on certain criteria such as a pose of the object and a pose and style of another object in a different image. The generation of images can be performed by a human using observations, evaluations, judgments, and opinions. Therefore, these limitations are a mental process.]
“identifying, […] a pose of a first object, depicted within a first image” [The identification of content in an image is a mental process that can be performed by observation, evaluation, judgment, and opinion. For example, a human is capable of observing an image and identifying the pose of a subject in the image.]
“identifying a pose of the first object, and a pose and a style of one or more second objects depicted within one or more second images, wherein the style of the one or more second objects is identified” [The identification of content in an image is a mental process that can be performed by observation, evaluation, judgment, and opinion. For example, a human is capable of observing an image and identifying the pose of a subject in the image and the style of the image.]
“determining features of the object based on combining features associated with the identified pose of the first object, the identified pose of the one or more second object, and the identified style of the one or more second objects” [This step of “determining” is a mental process that can be performed by observation, evaluation, judgment, and opinion. For example, a human is capable of observing various features of images and combining them. Note that the act of “combining” is recited at a high degree of generality, and broadly includes, for example, determining that different features are to be used together.]
Therefore, independent claims 6, 16, and 21 are directed to a judicial exception in the form of a mental process.
Step 2A Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?
The judicial exception is not integrated into a practical application.
Independent claims 6, 16, and 21 recite the following additional elements, but these additional elements are not sufficient to integrate the judicial exception into a practical application:
“using/use one or more neural networks”, “using/use a first encoder of the one or more neural networks”, and “using/use a second encoder of the one or more neural networks” (claims 6, 16, and 21) [These elements constitute no more than mere instructions to apply the judicial exception using generic computer components or functions (MPEP § 2106.04(d)(I)), namely neural networks. These elements, which use the term “using/use,” merely invoke a neural network as a tool to perform the mental process. The term “encoder” is considered to refer to generic processing functions of a neural network. “Encoder,” when used in a generic manner, is merely a generic function of neural networks because neural networks are typically encoders on some level; for example, the calculations after the initial input can be regarded as encodings.]
“computer implemented method” (claim 6), “at least one processor having one or more circuits” (claim 16); and “one or more processors comprising: circuitry” (claim 21). [These elements constitute no more than mere instructions to apply the judicial exception using generic computer components (MPEP § 2106.04(d)(I)), namely the generic computer components of a processor and circuitry, which are merely invoked as a tool to perform the mental process.]
Therefore, the above limitations do not integrate the judicial exception into a practical application.
Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?
Additional elements that are mere instructions to apply an exception do not constitute significantly more than a judicial exception under MPEP § 2106.05(I)(A). Therefore, those additional elements identified above in the Prong Two analysis as mere instructions to apply an exception do not constitute significantly more.
Dependent Claims
The remaining rejected dependent claims do not recite additional elements, whether considered individually or in combination, that are sufficient to integrate the judicial exception into a practical application or amount to significantly more than the judicial exception.
Dependent claim 7: The limitation of “inferring […] style data of the one or more second objects, the style data of the one or more second objects including style codes corresponding to respective points in a style space, the style space corresponding to a distribution of objects in a class of objects” is considered to be a mental process that can be performed by observations, evaluations, judgments, and opinions. In particular, the Examiner notes that the instant step of “inferring” is recited at a high degree of generality, without any specific technical methodology. The features of “style data,” “style codes,” and “style space” do not have any requisite degree of complexity and are considered to be features that can be used within a mental process. The limitation of “using a target encoder network” is an additional element, but it does not integrate the abstract idea into a practical application or amount to significantly more than an abstract idea because it constitutes mere instructions to apply the judicial exception using generic computer functions (MPEP § 2106.04(d)(I)), namely a neural network. Here, the neural network is invoked merely as a tool that performs the mental process of inferring the style data. The claim omits any details as to how the neural network solves a technical problem and instead recites only the idea of a solution or outcome, namely inferring the style data.
Dependent claim 8: The limitation of “the features of the object are provided as input to the one or more neural networks” is an additional element, but it constitutes mere instructions to apply the judicial exception using generic computer functions (MPEP § 2106.04(d)(I)), namely neural networks. Here, “…provided as input…” merely refers to the use of the one or more neural networks as a tool that receives the features as input data. Therefore, this additional element merely refers to the use of neural networks generically, and does not place any limitations on the abstract idea other than the use of the generic computer components.
Dependent claim 9: The limitation of “inferring […] a style code of the one or more second objects, the style code representing an appearance style of the one or more second objects; and […] by re-constructing an image using the content code and the style code” is a mental process that can be performed by observations, evaluations, judgments, and opinions. The limitations “using a second encoder network” and “performing regularization” are considered to be additional elements, but they do not integrate the abstract idea into a practical application for the following reasons. The limitation of “using a second encoder network” is mere instructions to apply the judicial exception using generic computer functions (MPEP § 2106.04(d)(I)), namely neural networks. Here, the “second encoder network” is merely invoked as a tool to perform the abstract idea of inferring the content code. Therefore, this element merely refers to the use of neural networks generically, and does not place any limitations on the abstract idea other than the use of the generic computer components. The limitation of “performing regularization” is considered to be extra-solution activity for purposes of Step 2A: Prong One and Step 2B, because it is merely tangential to the claim. For purposes of the Step 2B analysis, this limitation is also considered to be “well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception” (MPEP § 2106.05(I)(A)), as evidenced by US 20090106173 A1 ([0003]: “It is well-known that the use of regularization is necessary to achieve a model that generalizes well to unseen data, particularly if the number of parameters is very high relative to the amount of training data.”).
Dependent claim 10: The limitations of “wherein the one or more neural networks have not processed previously-received images including the one or more second objects represented as having the visual aspect” merely further defines the mental process recited in the parent claim, and does not constitute an additional element beyond the mental process. This claim does not recite any non-abstract additional elements for purposes of Step 2A Prong Two and Step 2B analysis.
Dependent claim 11: The limitation of “representing style data of the one or more second objects as affine transformation parameters” is a mental process that can be performed by observations, evaluations, judgments, and opinions. The limitation of “in normalization layers of the one or more neural network” is an additional element, but it does no more than generally link the use of a judicial exception to a particular technological environment or field of use, namely the technological environment of neural networks with a normalization layer. For example, the claim does not require the normalization layers to have any impact on the process of generating the images. Additional elements that do no more than generally link the use of a judicial exception to a particular technological environment or field of use do not constitute significantly more than a judicial exception under MPEP § 2106.05(I)(A).
Dependent claim 12: The limitation of “generating, from style data […] parameters to be used” is a mental process that can be performed by observations, evaluations, judgments, and opinions. The Examiner notes that “generating” in this context broadly and generically covers the determination of parameters. Furthermore, the claim language of “to be used” does not require actual use in any adaptive instance normalization layer. The limitation of “using multilayer perceptrons” is mere instructions to apply the judicial exception using generic computer functions (MPEP § 2106.04(d)(I)), namely neural networks. This limitation merely refers to the use of a neural network as a tool that receives the codes as input data. Therefore, these additional elements merely refer to the use of neural networks generically, and do not place any limitations on the abstract idea other than the use of the generic computer components. The limitation of “in adaptive instance normalization layers of the one or more neural networks” is an additional element, but it does no more than generally link the use of a judicial exception to a particular technological environment or field of use, namely the technological environment of neural networks with an adaptive instance normalization layer. For example, the claim does not require the normalization layers to have any impact on the process of generating the images. Additional elements that do no more than generally link the use of a judicial exception to a particular technological environment or field of use do not constitute significantly more than a judicial exception under MPEP § 2106.05(I)(A).
Dependent claim 13: The limitation of “selecting the one or more second objects from a class of objects” is a mental process that can be performed by observations, evaluations, judgments, and opinions. The further limitation of “using random sampling of a multi-variate Gaussian distribution” is an abstract idea in the form of a mathematical concept. Specifically, this limitation recites the use of a mathematical algorithm in the form of random sampling using a certain distribution.
Dependent claim 14: The limitation of “wherein the one or more neural networks is a generative adversarial network (GAN) including a conditional image generator and an adversarial discriminator” is an additional element, but it does no more than generally link the use of a judicial exception to a particular technological environment or field of use. This limitation merely recites a type of model, along with its standard components. The features of this claim (the components of the GAN) do not perform any specific operation. Therefore, this claim does no more than generally link the use of a judicial exception to a particular technological environment or field of use, namely the technological environment of neural networks, which include generative adversarial networks (GANs). As noted above, such a limitation does not integrate a judicial exception into a practical application, nor does it amount to significantly more than the judicial exception.
Dependent claim 15: The limitation of “normalizing, […], layer activations to zero mean and unit variance distribution; and de-normalizing the normalized layer activations using an affine transformation” is an abstract idea in the form of a mathematical concept. Specifically, this limitation recites the use of the mathematical algorithms of normalization and de-normalization. The limitation of “by a normalization layer of the adversarial discriminator” is an additional element, but it does no more than generally link the use of a judicial exception to a particular technological environment or field of use, namely the technological environment of a GAN whose discriminator has a normalization layer. For example, the claim does not require the normalization layers to have any impact on the process of generating the images. Additional elements that do no more than generally link the use of a judicial exception to a particular technological environment or field of use do not constitute significantly more than a judicial exception under MPEP § 2106.05(I)(A).
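By way of illustration of the mathematical concepts identified above (the Examiner’s own generic example, not drawn from the claims or the cited art): a layer activation x with mean μ and standard deviation σ is normalized as z = (x − μ)/σ, which yields a zero mean and unit variance distribution, and the normalized activation is then de-normalized by an affine transformation z′ = γz + β, where γ and β are scaling and shifting parameters.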
Dependent claim 17 recites further limitations that are substantially the same as those of claim 7. Therefore, the analysis for claim 7 is applied to claim 17.
Dependent claim 18: The limitation of “the features of the object are provided as input to the one or more neural networks” is an additional element, but it constitutes mere instructions to apply the judicial exception using generic computer functions (MPEP § 2106.04(d)(I)), namely neural networks. Here, “…provided as input…” merely refers to the use of the one or more neural networks as a tool that receives the features as input data. Therefore, this additional element merely refers to the use of neural networks generically, and does not place any limitations on the abstract idea other than the use of the generic computer components.
Dependent claims 19-20 recite further limitations that are substantially the same as those of claims 11-12. Therefore, the analysis for claims 11-12 is applied to claims 19-20, respectively.
Dependent claim 22 recites further limitations that are substantially the same as those of claim 7. Therefore, the analysis for claim 7 is applied to claim 22.
Dependent claim 23 recites further limitations that are substantially the same as those of claim 18. Therefore, the analysis for claim 18 is applied to claim 23.
Dependent claim 24 recites further limitations that are substantially the same as those of claim 12. Therefore, the analysis for claim 12 is applied to claim 24.
Dependent claim 25: The limitation of “to generate one or more output images of a third type of object based, at least in part, on a visual aspect of a type of object in one or more output images” is a further mental process that can be performed by observations, evaluations, judgments, and opinions. Besides the limitation of “one or more neural network,” which is already addressed in the analysis of claim 6, dependent claim 25 does not recite any additional non-abstract elements for purposes of Step 2A Prong Two and Step 2B analysis.
Dependent claim 26: The further limitation of “wherein the style of the one or more second objects includes one or more randomly varied style features of one or more second objects” is a further mental process that can be performed by observations, evaluations, judgments, and opinions. This claim does not recite any additional non-abstract elements for purposes of Step 2A Prong Two and Step 2B analysis.
Therefore, the rejected claims are directed to a judicial exception and do not recite additional elements, whether considered individually or in combination, that are sufficient to integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. Therefore, these claims are not patent-eligible under § 101.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
1. Claims 6-10, 14, 16-18, 21-23, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al., “Stylized Adversarial AutoEncoder for Image Generation,” MM’17, October 23–27, 2017, Mountain View, CA, USA (“Zhao”), in view of Yin et al., “Towards Large-Pose Face Frontalization in the Wild,” arXiv:1704.06244v3 [cs.CV] 17 Aug 2017 (“Yin”).
As to claim 6, Zhao teaches a computer-implemented method, comprising:
using one or more neural networks to generate one or more output images depicting an object based, at least in part, on: [§ 3, paragraph 1: “Our goal is to generate images from two exemplary images, content image c and style image s, by defining and training a neural network.” The images being generated as described here in Zhao correspond to “one or more output images.” The generated image depicts “an object” such as the object “B” in the example of FIG. 1, the generated faces in FIG. 4, or the generated characters in FIG. 3. These generated images are the output images generated by the “generator” described in § 3.1, which is part of the neural network shown in FIG. 2 and a neural network itself. The neural network is described in the abstract: “an autoencoder-based generative adversarial network (GAN) for automatic image generation.” The entire GAN, which includes a “generator” (§ 3.1) and a “discriminator” (§ 3.2), corresponds to “one or more neural networks.”]
identifying, using a first encoder of the one or more neural networks, [§ 3.1: “The generative network consists of two encoders (Encc and Encs)… Encs encodes the style image to the style latent representation or feature zs.” The content encoder Encc corresponds to a “first encoder” and is part of the overall neural network as shown in FIGS. 1-2. Each encoder is itself a neural network, as described in, e.g., § 3.3, paragraph 2, which describes the convolutional structure of the feature extractors: “style feature extractor [has] three convolutional layers without down-sampling, which preserve the detail information of the exemplary images as much as possible.”] a […] of a first object, depicted within a first image; [§ 3, paragraph 1: “Our goal is to generate images from two exemplary images, content image c and style image s.” Here, the content image corresponds to a first image. In the face generation example, see Zhao, § 4.2, paragraphs 1-2: “We evaluate our model for face image generation on Labeled Faces in the Wild (LFW)[15] dataset. …For each image in the training set, we blurred the image with Gaussian filter of kernel size 21x21, which only preserves background color and lineament, and used it as the content image… Then, we generated images by the modified attribute vector while keeping the content image fixed. As we can see in Figure 7, generated samples are visually consistent with attribute transferring.”]
identifying a pose and a style of one or more second objects depicted within one or more second images, [§ 3, paragraph 1: “Our goal is to generate images from two exemplary images, content image c and style image s, by defining and training a neural network.” Here, the “style image” in Zhao corresponds to the limitation of “one or more second images.” In the face generation example, Zhao, § 4.2, paragraph 1 teaches:
We evaluate our model for face image generation on Labeled Faces in the Wild (LFW)[15] dataset… For each image in the training set, we blurred the image…which only preserves background color and lineament, and used it as the content image. We applied the pre-trained model provided by [20] as the Encs to extract the 73-dimensional attribute score vector as the style feature vector, which describes different aspects of facial appearance such as age, gender, or facial expression, following previous method [36]. The SAAE model was trained to generate the clear face image given the blurred content image and attribute vector extracted from the style image.
That is, the style images depict faces, and are processed to extract a style feature vector (i.e., a “style”), also known as an attribute vector, that describes “different aspects of facial appearance such as age, gender, or facial expression” (as quoted above). Furthermore, this concept of style also includes a “pose” because, as shown in FIG. 4, the facial expressions of frown, smile, open eye, and narrow eye are also facial poses or include face poses, since the arrangement and positioning of facial features constitute a pose of facial features (also called a face pose), and the instant claim does not require a more precise type of pose. Therefore, Zhao teaches a pose (in the facial expression) and a style (the style feature vector) of one or more second objects (a face) depicted within the style images. Furthermore, § 4.2, paragraph 2 teaches:
Followed the evaluation procedure in [36], we generate various images with interpolated attributes by gradually increasing the values along each attribute dimension. To be more specific, we modify the value of one attribute dimension by interpolating between the minimum and maximum value. Then, we generated images by the modified attribute vector while keeping the content image fixed. As we can see in Figure 7, generated samples are visually consistent with attribute transferring. For example, by changing attributes like “eyewear”, the global appearance is well preserved but the difference appears in the eye region.
Note that “Figure 7” in the quote above is a typographical error for Figure 4, as is clear from the context. See also FIG. 4, caption: “Figure 4: Attribute-conditioned image generation organized into six groups (gender, age, complexion, expression, eye wear and eye size).” That is, the style (attributes) extracted from the style image is transferred to a new image that is based on the content image, which is fixed. The generated images, as shown in FIG. 4, depict an object in the form of a face. This image also satisfies the limitation of “based, at least in part, on a pose of the first object,” since the image uses the pose to be depicted in the image. Note that the pose (in the facial expression) determined for the generated image constitutes a pose of the first object that is being depicted. In the example described in § 4.2 of Zhao, this pose (in the facial expression) is based on those of the style images, but is also a modified value obtained by interpolation.] wherein the style of the one or more second objects is identified using a second encoder of the one or more neural networks; [§ 4.2, paragraph 1: “…Encs to extract the 73-dimensional attribute score vector as the style feature vector, which describes different aspects of facial appearance such as age, gender, or facial expression. The SAAE model was trained to generate the clear face image given the blurred content image and attribute vector extracted from the style image.” That is, the style is extracted (identified) using the model (Encs). This Encs is further described in § 3.1: “The generative network consists of two encoders (Encc and Encs)… Encs encodes the style image to the style latent representation or feature zs.” The style encoder Encs corresponds to a “second encoder” which is part of the overall SAAE neural network as shown in FIGS. 1-2.] and
determining features of the object based on combining features associated with the […] of the first object, the identified pose of the one or more second objects, and the identified style of the one or more second objects. [As shown in FIG. 1, the generated image is a combination of the content features and the style features, as encoded by the encoders discussed above. The encoded style features include the pose and style of the second objects. Furthermore, as described above, FIG. 4 shows, in the case of face generation, the style and pose of the generated images are controlled by the style features. That is, the features of the object are determined at the convergence point in the architecture shown in FIG. 2, in which the two branches of the content image and the style image are converged.]
Zhao does not explicitly teach the limitation of a “pose” of the first object being identified. Instead, Zhao teaches identifying features of the image such as the lineament, which is not considered to be an explicit teaching of the pose.
Yin teaches identifying a “pose” of an object [§ 3.1, last paragraph: “we optimize a projection matrix m ∈ R^(2×4) based on pitch, yaw, roll, scale and 2D translations to represent the pose of an input face image. Let p = {m, αid, αexp, αtex} denotes the 3DMM coefficients. The target of the reconstruction module R is to estimate p = R(x), given an input image x.” Note that the pose information is encoded in an encoder-decoder network, as described in § 3.2, paragraph 2: “In Figure 2, features from the two inputs to the generator G are fused through an encoder-decoder network to synthesize a frontal face x_f = G(x, p).”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zhao with the teachings of Yin by implementing the method of Zhao such that the content features being identified include a pose of the first object. The motivation for doing so would have been to incorporate pose information of an input image in a manner that allows the pose, particularly the angle of a face, to be changed in a synthesized image, as suggested by Yin (see Figure 1 caption: “Given a nonfrontal face image as input, the generator produces a high quality frontal face.”).
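For purposes of illustration only, the two-encoder/one-decoder data flow of Zhao, § 3.1 (two encoders Encc and Encs feeding a decoder Dec) can be sketched in Python as follows. This sketch is the Examiner’s own simplified illustration and is not code from Zhao or from the instant application; the function names enc_c, enc_s, and dec, and the placeholder computations inside them, are hypothetical stand-ins for Zhao’s Encc, Encs, and Dec networks.

    import numpy as np

    rng = np.random.default_rng(0)

    def enc_c(content_image):
        # Hypothetical stand-in for Zhao's content encoder Encc:
        # maps the content image to a content code zc.
        return content_image.mean(axis=(0, 1))  # placeholder feature extraction

    def enc_s(style_image):
        # Hypothetical stand-in for Zhao's style encoder Encs:
        # maps the style image to a style code zs.
        return style_image.std(axis=(0, 1))  # placeholder feature extraction

    def dec(zc, zs):
        # Hypothetical stand-in for Zhao's decoder Dec, which combines the
        # content and style codes per equation (3): x_hat = Dec(zc, zs).
        return np.outer(zc, zs)  # placeholder combination

    content_image = rng.random((32, 32, 3))  # corresponds to the "first image"
    style_image = rng.random((32, 32, 3))    # corresponds to a "second image"

    zc = enc_c(content_image)  # content code (e.g., pose-related features)
    zs = enc_s(style_image)    # style code (appearance features)
    x_hat = dec(zc, zs)        # generated output image combining both codes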
As to claim 7, the combination of Zhao and Yin teaches the computer-implemented method of claim 6, further comprising:
inferring, using a target encoder network, [Zhao, § 3.1: “The generative network consists of two encoders (Encc and Encs)… Encs encodes the style image to the style latent representation or feature zs.” The style encoder Encs corresponds to a “target encoder network” and is part of the overall neural network as shown in Zhao, FIGS. 1-2. The style encoder itself is a neural network, as described in, e.g., Zhao, § 3.3, paragraph 2: “style feature extractor [has] three convolutional layers without down-sampling, which preserve the detail information of the exemplary images as much as possible.”] style data of the one or more second objects, the style data of the one or more second objects including style codes corresponding to respective points in a style space, the style space corresponding to a distribution of objects in a class of objects. [Zhao, § 3.1: “Encs encodes the style image to the style latent representation or feature zs… zs = Encs(s).” That is, zs corresponds to “style data” and its content (which is encoded, as described above) corresponds to “style codes.” The “style space” is implicitly disclosed as the space in which the style codes exist. Since style codes are generated from various style images, there exists a space corresponding to the distribution of the various objects represented by the style images. Note that the term “class of objects” is not defined to require any specific object type or characteristic. Therefore, the set of content represented by the style images reads on the limitation of “a class of objects.” Similarly, the term “distribution” is not defined to require any specific type of distribution. Therefore, where there is a plurality of objects, there is some “distribution” that reads on the instant claim limitation.]
As to claim 8, the combination of Zhao and Yin teaches the computer-implemented method of claim 7, wherein the features of the object are provided as input to the one or more neural networks. [Zhao, FIGS. 1 and 2 show that the generated image is generated based on the features input in the preceding layers, including the content code and style code, as characterized in § 3.1. See also Zhao, § 4.2: “We evaluate our model for face image generation on Labeled Faces in the Wild (LFW)[15] dataset… we blurred the image…which only preserves background color and lineament, and used it as the content image… The SAAE model was trained to generate the clear face image given the blurred content image and attribute vector extracted from the style image.” Note that Zhao, FIG. 4 depicts a pose such as a frown or smile, or the general pose of the face.]
As to claim 9, the combination of Zhao and Yin teaches the computer-implemented method of claim 8, further comprising:
inferring, using a second encoder network, [Zhao, § 3.1: “The generative network consists of two encoders (Encc and Encs)… Encs encodes the style image to the style latent representation or feature zs.” The style encoder Encs is part of the overall neural network as shown in FIGS. 1-2. The style encoder itself is a neural network, as described in, e.g., Zhao, § 3.3, paragraph 2: “style feature extractor [has] three convolutional layers without down-sampling, which preserve the detail information of the exemplary images as much as possible.”] a style code of the one or more second objects, the style code representing an appearance style of the one or more second objects; [Zhao, § 3.1: “Encs encodes the style image to the style latent representation or feature zs… zs = Encs(s).” That is, zs corresponds to “style data” and its content (which is encoded, as described above) corresponds to “style codes,” which represent the appearance style of an object in the style image.] and
performing regularization by re-constructing an image using the content code and the style code [Zhao, § 3.1: “The reconstruction error is denoted by L2-loss: Lrec = ‖x − x̂‖,” where x is the ground truth image. Note that, according to equation (3), x̂ = Dec(zc, zs). That is, the generated image is reconstructed from the content code and style code zc and zs. Note that the L2-loss, which is used in conjunction with the discriminator loss, implements regularization during training, as discussed in Zhao, § 3.4 (“Training strategy”). If the applicant does not consider Zhao to teach the act of “re-constructing” (as opposed to merely constructing), then, alternatively, Whitney discussed below is deemed to teach “re-constructing.”]
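For purposes of illustration only, the L2 reconstruction loss quoted above, Lrec = ‖x − x̂‖, can be computed as follows. This is the Examiner’s own minimal sketch, not code from Zhao; the array values are hypothetical.

    import numpy as np

    def l2_reconstruction_loss(x, x_hat):
        # Lrec = ||x - x_hat||, where x is the ground-truth image and
        # x_hat = Dec(zc, zs) is the image re-constructed from the codes.
        return np.linalg.norm(x - x_hat)

    x = np.ones((8, 8))            # hypothetical ground-truth image
    x_hat = 0.9 * np.ones((8, 8))  # hypothetical reconstruction
    print(l2_reconstruction_loss(x, x_hat))  # small value for a close reconstruction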
As to claim 10, the combination of Zhao and Yin teaches the computer-implemented method of claim 6, wherein the one or more neural networks have not processed previously-received images including the one or more second objects represented as having a visual aspect. [Zhao, § 3.4 (“Training Strategy”): “we propose a 3-step training strategy to optimize our model.” This limitation is taught by Zhao because, during the initial part of the training process, the two encoders perform the feature extraction process, and the decoder (Dec) generates the output image without having previously processed any images.]
As to claim 14, the combination of Zhao and Yin teaches the computer-implemented method of claim 6, wherein the one or more neural networks is a generative adversarial network (GAN) including a conditional image generator and an adversarial discriminator. [Zhao, abstract: “we propose an autoencoder-based generative adversarial network (GAN) for automatic image generation.” The network includes a generator (§ 3.1: “The generator”), which is “conditional” because it generates an image conditioned on the input attributes from the encoder networks (see, e.g., Zhao, § 4.2: “attribute-conditioned face generation”), and a discriminator (§ 3.2: “The discriminator”), which is “adversarial” because it is used in “adversarial training” (§ 3.4, paragraph 4).]
As to claims 16-17, these claims are directed to a system for performing operations that are the same or substantially the same as those recited in claims 6-7, respectively. Therefore, the rejections made to claims 6-7 are applied to claims 16-17, respectively.
Furthermore, Zhao teaches “a system, comprising: at least one processor having one or more circuits to…” [Since Zhao’s method involves neural network computations, Zhao implicitly discloses that its method is performed on a computing device that includes at least a processor with circuits, as these are generic computer components. That is, one of ordinary skill in the art would understand the primary reference as teaching the instant limitations of a processor comprising circuits. “[I]n considering the disclosure of a reference, it is proper to take into account not only specific teachings of the reference but also the inferences which one skilled in the art would reasonably be expected to draw therefrom.” MPEP § 2144.01.].
Claim 17 differs from claim 7 in that it recites “one or more other objects” rather than “the one or more second objects.” However, the teachings of the art applied to the limitation of “the one or more second objects” also teach the limitation of “one or more other objects.”
As to claim 18, the combination of Zhao and Yin teaches the system of claim 17, wherein the features of the object are provided as input to the one or more neural networks. [Zhao, FIGS. 1 and 2 show that the generated image is generated based on the features input in the preceding layers, including the content code and style code, as characterized in § 3.1. See also Zhao, § 4.2: “We evaluate our model for face image generation on Labeled Faces in the Wild (LFW)[15] dataset… we blurred the image…which only preserves background color and lineament, and used it as the content image… The SAAE model was trained to generate the clear face image given the blurred content image and attribute vector extracted from the style image.” Note that Zhao, FIG. 4 depicts a pose such as a frown or smile, or the general pose of the face.]
As to claims 21-22, these claims are directed to one or more processors for performing operations that are the same or substantially the same as those recited in claims 6-7, respectively. Therefore, the rejections made to claims 6-7 are applied to claims 21-22, respectively.
Furthermore, Zhao teaches “one or more processors, comprising circuitry to…” [Since Zhao’s method involves neural network computations, Zhao implicitly discloses that its method is performed on a computing device that includes at least a processor with circuits, as these are generic computer components. That is, one of ordinary skill in the art would understand the primary reference as teaching the instant limitations of a processor comprising circuits. “[I]n considering the disclosure of a reference, it is proper to take into account not only specific teachings of the reference but also the inferences which one skilled in the art would reasonably be expected to draw therefrom.” MPEP § 2144.01.].
Claim 22 differs from claim 7 in that it recites “one or more other objects” rather than “the one or more second objects.” However, the teachings of the art applied to the limitation of “the one or more second objects” also teach the limitation of “one or more other objects.”
As to claim 23, the combination of Zhao and Yin teaches the one or more processors of claim 21, wherein the features of the object are provided as input to the one or more neural networks. [Zhao, FIGS. 1 and 2 show that the generated image is generated based on the features input in the preceding layers, including the content code and style code, as characterized in § 3.1. See also Zhao, § 4.2: “We evaluate our model for face image generation on Labeled Faces in the Wild (LFW)[15] dataset… we blurred the image…which only preserves background color and lineament, and used it as the content image… The SAAE model was trained to generate the clear face image given the blurred content image and attribute vector extracted from the style image.” Note that Zhao, FIG. 4 depicts a pose such as a frown or smile, or the general pose of the face.]
As to claim 25, the combination of Zhao and Yin teaches the one or more processors of claim 21, wherein the one or more neural networks are to generate one or more output images of a third type of object based, at least in part, on a visual aspect of a type of object in one or more output images. [In general, these limitations are similar to those of claim 21 and are taught by the teachings of the references cited in the rejection of claim 21. Furthermore, § 3.1 teaches that the content code zc and the style code zs are provided to the decoder Dec, as given in equation (3) in § 3.1: x̂ = Dec(zc, zs). Referring to FIG. 2 of Zhao, for example, the generated image (e.g., “W” with a blue background) can be considered to be an image of a second type of object based on the visual aspect (i.e., content code and/or style code) of a type of object in the content image and/or style image. As the claim does not precisely define the recited types, an object that is portrayed in a style or content image can be regarded as a first type of object, while an object portrayed in the generated image can be regarded as a second type of object.]
2. Claims 11-12, 19-20, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Yin, and further in view of Dumoulin et al., “A Learned Representation for Artistic Style,” arXiv:1610.07629v5 [cs.CV] 9 Feb 2017 (“Dumoulin”).
As to claim 11, the combination of Zhao and Yin teaches the computer-implemented method of claim 6, but does not teach the method further comprising “representing the style data for the other objects as affine transformation parameters in normalization layers of the neural network.”
Dumoulin teaches the above limitation. In general, Dumoulin teaches a “learned representation for artistic style” (see title), involving “the construction of a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings” (abstract). Therefore, Dumoulin is in the field of machine learning and is also pertinent to problems in style transfer.
In particular, Dumoulin teaches “representing style data of the one or more second objects as affine transformation parameters in normalization layers of the one or more neural networks.” [§ 2.1, paragraph 3: “conditional instance normalization. The goal of the procedure is to transform a layer’s activations x into a normalized activation z specific to painting style s. Conditioning on a style is achieved as follows: z = γs · (x − μ)/σ + βs, where μ and σ are x’s mean and standard deviation taken across spatial axes and γs and βs are obtained by selecting the row corresponding to s in the γ and β matrices.” Note that γs and βs are “affine transformation parameters,” corresponding to the transformations of scaling and shifting. See, e.g., FIG. 3, caption (“The input activation x is normalized across both spatial dimensions and subsequently scaled and shifted using style-dependent parameter vectors γs and βs”) and § 2.1, paragraph 2 (“it is sufficient to specialize scaling and shifting parameters after normalization to each specific style. In other words...it is sufficient to tune parameters for an affine transformation after normalization for each style.”). These parameters are learned as described in § 3.5, paragraph 1: “learning a different set of γ and β parameters for every style, we are in some sense learning an embedding of styles.” With respect to the limitation of a plurality of normalization layers, Table 1 on page 12 teaches “Conditional instance normalization after every convolution,” wherein there are a plurality of convolution layers as shown in this table.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Dumoulin by modifying the method of Zhao, as modified thus far, to include the further operation of representing style data of the one or more second objects as affine transformation parameters in normalization layers of the neural network. The motivation for doing so would have been to implement an instance normalization technique that enables the convolutional weights of a style transfer network to be shared across many styles, as suggested by Dumoulin, § 2.1, paragraph 2 (“all convolutional weights of a style transfer network can be shared across many styles, and it is sufficient to tune parameters for an affine transformation after normalization for each style”).
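For purposes of illustration only, the conditional instance normalization of Dumoulin, § 2.1 (z = γs · (x − μ)/σ + βs, with γs and βs selected as rows of learned matrices) can be sketched as follows. This is the Examiner’s own simplified sketch, not code from Dumoulin; the shapes and values are hypothetical.

    import numpy as np

    def conditional_instance_norm(x, s, gamma, beta, eps=1e-5):
        # x: activations of shape (H, W, C); s: integer style index.
        # gamma, beta: (num_styles, C) matrices of learned affine parameters.
        # Normalize across the spatial axes, then scale and shift using the
        # row selected for style s: z = gamma[s] * (x - mu) / sigma + beta[s].
        mu = x.mean(axis=(0, 1), keepdims=True)
        sigma = x.std(axis=(0, 1), keepdims=True)
        return gamma[s] * (x - mu) / (sigma + eps) + beta[s]

    rng = np.random.default_rng(1)
    x = rng.random((16, 16, 4))  # one layer's activations (4 channels)
    gamma = rng.random((10, 4))  # affine parameters for 10 styles
    beta = rng.random((10, 4))
    z = conditional_instance_norm(x, s=3, gamma=gamma, beta=beta)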
As to claim 12, the combination of Zhao and Yin teaches the computer-implemented method of claim 6, but does not teach the method further comprising: “generating, from style data and using multilayer perceptrons, parameters to be used in adaptive instance normalization layers of the neural network.”
Dumoulin, in an analogous art, teaches the above limitation. In general, Dumoulin teaches “the construction of a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings” (abstract). Dumoulin is in the same field of endeavor as the claimed invention, namely machine learning.
In particular, Dumoulin teaches “generating, from style data and using multilayer perceptrons, parameters to be used in adaptive instance normalization layers of the one or more neural networks.” [§ 2.1, paragraph 3: “conditional instance normalization. The goal of the procedure is to transform a layer’s activations x into a normalized activation z specific to painting style s. Conditioning on a style is achieved as follows: z = γs · (x − μ)/σ + βs, where μ and σ are x’s mean and standard deviation taken across spatial axes and γs and βs are obtained by selecting the row corresponding to s in the γ and β matrices.” Note that γs and βs are “parameters to be used in adaptive instance normalization.” With respect to the limitation of “generating, from the style data and using multilayer perceptrons,” § 3.4 teaches that these parameters are learned: “one way to incorporate a new style to a trained network is to keep the trained weights fixed and learn a new set of γ and β parameters.” The “trained network” described here refers to the “style transfer network” (§ 3.3), which receives a “style image” (style data), as described in FIG. 2 and its caption. The style transfer network is a “multilayer perceptron” as described in § 2, second-to-last paragraph (“feed-forward convolutional network”), and as shown in Table 1 on page 12. A feed-forward convolutional network as described here is regarded as a type of multilayer perceptron. With respect to the limitation of “adaptive instance normalization,” this term is interpreted to read on an instance normalization method that has an adaptation aspect. Here, the method disclosed in Dumoulin is considered to be “adaptive” because there is a “single conditional style transfer network…given both a content image and the identity of the style to apply and produces a pastiche corresponding to that style” (§ 2.1, paragraph 2). That is, the network adapts to a given style. The Examiner notes that while the specification of the instant application uses the term “Adaptive Instance Normalization (AdaIN),” the language of the claim does not refer to a method in uppercase and is not accompanied by the clarifying parenthetical “AdaIN.” Therefore, the instant claim term does not have the meaning associated with the term “AdaIN.” Should applicant wish to rely on the features of “AdaIN,” applicant may amend the claims to more precisely define the claimed instance normalization. With respect to the limitation of a plurality of normalization layers, Table 1 on page 12 teaches “Conditional instance normalization after every convolution,” wherein there are a plurality of convolution layers as shown in this table.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Dumoulin by modifying the method of Zhao, as modified thus far, to include the further operation of “generating, from style data and using multilayer perceptrons, parameters to be used in adaptive instance normalization layers of the neural network.” The motivation for doing so would have been to implement an instance normalization technique that enables the convolutional weights of a style transfer network to be shared across many styles, as suggested by Dumoulin, § 2.1, paragraph 2 (“all convolutional weights of a style transfer network can be shared across many styles, and it is sufficient to tune parameters for an affine transformation after normalization for each style”).
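For purposes of illustration only, the general concept of generating normalization parameters from style data using a multilayer perceptron, and applying them in an instance normalization step, can be sketched as follows. This is the Examiner’s own generic sketch of the concept as claimed, not the applicant’s implementation and not code from Dumoulin; all names, shapes, and values are hypothetical.

    import numpy as np

    rng = np.random.default_rng(2)

    def mlp(style_code, w1, w2):
        # Minimal multilayer perceptron mapping a style code to
        # concatenated normalization parameters (gamma, beta).
        h = np.tanh(style_code @ w1)
        return h @ w2

    def adaptive_instance_norm(x, gamma, beta, eps=1e-5):
        # Instance normalization whose affine parameters are generated
        # dynamically from the style data rather than fixed per style.
        mu = x.mean(axis=(0, 1), keepdims=True)
        sigma = x.std(axis=(0, 1), keepdims=True)
        return gamma * (x - mu) / (sigma + eps) + beta

    channels = 4
    style_code = rng.random(8)          # style data (e.g., a style code zs)
    w1 = rng.random((8, 16))            # hypothetical MLP weights
    w2 = rng.random((16, 2 * channels))
    params = mlp(style_code, w1, w2)
    gamma, beta = params[:channels], params[channels:]
    x = rng.random((16, 16, channels))  # one layer's activations
    y = adaptive_instance_norm(x, gamma, beta)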
As to claims 19-20, the further limitations recited in these claims are the same or substantially the same as those recited in claims 11-12, respectively. Therefore, the rejections made to claims 11-12 are applied to claims 19-20, respectively.
As to claim 24, the further limitations recited in this claim are the same or substantially the same as those recited in claim 12. Therefore, the rejection made to claim 12 is applied to claim 24.
3. Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Yin, and further in view of Makhzani et al., “PixelGAN Autoencoders,” arXiv:1706.00531v1 [cs.LG] 2 Jun 2017 (“Makhzani”).
As to claim 13, the combination of Zhao and Yin teaches the computer-implemented method of claim 6, further comprising: selecting one or more second objects from a class of objects [Zhao, § 4.1, paragraph 3: “For each fixed c, we choose different style images s, and SAAE will generate a variety of images with the same content label while differing in styles.” The generated image has the content (visual aspect) of the content image, as conceptually illustrated in FIG. 2, and represents objects by having the stylistic characteristics of the respective style images. Examples of objects include persons of varying “age, gender, or facial expression” (§ 4.2, paragraph 1).], but does not teach the further limitation of “using random sampling of a multi-variate Gaussian distribution.”
Makhzani teaches the above limitation. Makhzani teaches “a generative autoencoder… disentangle the style and content information of images in an unsupervised fashion” (see abstract). Therefore, Makhzani is in the field of machine learning and is also pertinent to problems in neural style transfer.
In particular, Makhzani teaches selecting objects “using random sampling of a multi-variate Gaussian distribution” [§ 1, paragraph 1: “The generative model, G, samples the prior p(z) and generates the sample G(z).” Here, z is a “latent code” (see paragraph above equation 3 on page 3) generated from the input image x (as shown in FIG. 1). The prior p(z) has a multivariate Gaussian distribution, as disclosed in § 1, last paragraph: “by imposing a Gaussian distribution on the latent code, we can achieve a global vs. local decomposition of information.” The distribution is “multi-variate” because z is a vector with multiple components, as stated in § 2.2, paragraph 4: “Suppose z = [z1, z2, z3] is the hidden code which in this case is the output probabilities of the softmax layer of the inference network.”]
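For illustration only, the random sampling quoted above may be sketched in Python as follows; the latent dimensionality, the number of draws, and the use of a standard normal prior are hypothetical choices.

```python
# Illustrative sketch of random sampling from a multi-variate Gaussian prior
# p(z), per Makhzani, § 1. Dimensions are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 8

mean = np.zeros(latent_dim)   # zero mean
cov = np.eye(latent_dim)      # identity covariance (standard multivariate Gaussian)

# Each draw is a latent code z; a trained generator G would map each z to a
# sample G(z), so drawing several codes selects several objects at random.
z_batch = rng.multivariate_normal(mean, cov, size=5)
print(z_batch.shape)          # (5, 8): five latent codes
```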
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Makhzani by performing the selection of the one or more second objects from a class of objects “using random sampling of a multi-variate Gaussian distribution.” The motivation for doing so would have been to attain decomposition of information, as suggested by Makhzani, § 1, last paragraph (“by imposing a Gaussian distribution on the latent code, we can achieve a global vs. local decomposition of information”).
4. Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Yin, and further in view of Ioffe et al., “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167v3 [cs.LG] 2 Mar 2015 (“Ioffe”).
As to claim 15, the combination of Zhao and Yin teaches the computer-implemented method of claim 14, but does not teach the further limitations recited in the instant claim.
Ioffe teaches the further limitations. Ioffe teaches the technique of batch normalization for deep neural networks (see title and abstract). Note that Ioffe’s batch normalization technique is the basis for instance normalization described in the instant application’s specification.
In particular, Ioffe teaches: normalizing, by a normalization layer of the adversarial discriminator, layer activations to zero mean and unit variance distribution; [Page 3, Algorithm 1, the “normalize” operation denoted by
x̂i ← (xi − μB) / √(σB² + ϵ)
where “xi” are the inputs to the normalization layer, i.e., the “layer activations.” Note that, as stated on page 4, left column, top paragraph, ϵ can be neglected, so as to result in a mean of 0 and a variance of 1 (i.e., a “unit variance”): “The distributions of values of any x̂ has the expected value of 0 and the variance of 1, as long as the elements of each mini-batch are sampled from the same distribution, and if we neglect ϵ.” With respect to the limitation of “by a normalization layer of the adversarial discriminator,” note that the normalization layer may be applied to any layer of a neural network, as taught in § 2: “As each layer observes the inputs produced by the layers below, it would be advantageous to achieve the same whitening of the inputs of each layer.” Since the “adversarial discriminator” is taught by Zhao and is made of neural network layers, Ioffe’s teachings apply to the “adversarial discriminator.”] and de-normalizing the normalized layer activations using an affine transformation. [Page 3, Algorithm 1, the “scale and shift” operation denoted by the parameters γ and β, which are affine parameters for scaling and shifting (see page 4, right column: “scaling by γ and shift by β”). See also § 3, last paragraph: “Furthermore, the learned affine transform applied to these normalized activations allows the BN transform to represent the identity transformation and preserves the network capacity.”]
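For illustration only, the two quoted operations of Algorithm 1 (normalization to zero mean and unit variance, followed by the affine scale and shift) may be sketched in Python as follows; the mini-batch shape and parameter values are hypothetical.

```python
# Illustrative sketch of the batch normalizing transform (Ioffe, Algorithm 1).
# Shapes are hypothetical.
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, features). Normalize per feature, then scale/shift by gamma, beta."""
    mu_b = x.mean(axis=0)                        # mini-batch mean
    var_b = x.var(axis=0)                        # mini-batch variance
    x_hat = (x - mu_b) / np.sqrt(var_b + eps)    # zero mean, unit variance (neglecting eps)
    return gamma * x_hat + beta                  # affine "de-normalization"

x = 4.0 * np.random.randn(32, 16) + 2.0          # mini-batch of 32 activation vectors
y = batch_norm(x, gamma=np.ones(16), beta=np.zeros(16))
print(y.mean(axis=0).round(6), y.var(axis=0).round(3))  # means ≈ 0, variances ≈ 1
```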
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Ioffe by performing the further operations of “normalizing, by a normalization layer of the adversarial discriminator, layer activations to zero mean and unit variance distribution; and de-normalizing the normalized layer activations using an affine transformation.” The motivation for doing so would have been to address the problem of internal covariate shift, with the result of attaining higher learning rates or allowing initialization to be performed less carefully, as suggested by Ioffe, abstract (“We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs…Batch Normalization allows us to use much higher learning rates and be less careful about initialization.”).
5. Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao in view of Yin, and further in view of Zhu et al., “Toward Multimodal Image-to-Image Translation,” arXiv:1711.11586v2 [cs.CV] 1 Feb 2018 (“Zhu”).
As to claim 26, the combination of Zhao and Yin teaches the computer-implemented method of claim 6, but does not teach the further limitations of the instant dependent claim.
Zhu, which generally pertains to image generation using GANs (see § 1, last paragraph), teaches “wherein the style of the one or more second objects includes one or more randomly varied style features of one or more second objects.” [§ 1, paragraph 2: “At inference time, a deterministic generator uses the input image, along with stochastically sampled latent codes, to produce randomly sampled outputs.” See also FIG. 2 caption: “To produce a sample output, a latent code z is first randomly sampled from a known distribution (e.g., a standard normal distribution). A generator G maps an input image A (blue) and the latent sample z to produce a output sample Bˆ (yellow).” See also § 3, paragraph 2: “For example, a sketch of a shoe could map to a variety of colors and textures, which could get compressed in this latent code.” That is, the latent code corresponds to the “style features of one or more second objects,” since it is a set of features used for image generation, and the random sampling of it constitutes the use of a randomly varied style feature.]
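For illustration only, the mechanism quoted above may be sketched in Python as follows: a fixed input combined with differently sampled latent codes yields differently varied outputs. The generator below is a hypothetical stand-in, not Zhu’s trained network.

```python
# Illustrative sketch of randomly varied style features via sampled latent
# codes (Zhu, § 1). The generator is a placeholder, not Zhu's model.
import numpy as np

rng = np.random.default_rng(1)

def generator(a, z):
    # Hypothetical stand-in for G(A, z); a real G would be a trained network.
    return a + z.mean()

a = np.zeros((8, 8))   # fixed input image A
# Three outputs that differ only through the randomly sampled latent code z.
samples = [generator(a, rng.standard_normal(16)) for _ in range(3)]
```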
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Zhu by implementing the style of the one or more second objects to include “one or more randomly varied style features of one or more second objects” as taught by Zhu. Doing so would have enabled the generation of randomly sampled outputs for multimodal image generation (see Zhu, § 1, paragraph 2: “A common approach to representing multimodality is learning a low-dimensional latent code, which should represent aspects of the possible outputs not contained in the input image. At inference time, a deterministic generator uses the input image, along with stochastically sampled latent codes, to produce randomly sampled outputs.”).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following document describes related techniques in the art.
Tran et al., “Disentangled Representation Learning GAN for Pose-Invariant Face Recognition,” CVPR 2017 teaches the use of pose codes in image generation.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 9:00 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Y.D.H./Examiner, Art Unit 2124
/VINCENT GONZALES/Primary Examiner, Art Unit 2124