Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 1/28/2026 has been entered.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Theobald (US 20210097730 A1).
Regarding claim 1, Theobald teaches a method comprising:
receiving input comprising an image of a face and a target value of an attribute of the face to be modified (par. 0018: “As inputs, the image generation system 100 receives an input image 102 (which may also be called a reference image), a facial expression description 104, and a pose description 106.”);
encoding the image using an encoder of an image generation neural network to obtain an image embedding (par. 0019: “The input image 102 may be encoded in any suitable format and color space that allow interpretation of the input image 102 by the image generation system 100.”); and
generating a modified image of the face having the target value of the attribute based on the image embedding using a decoder of the image generation neural network (par. 0024: “The output image 108 is generated based on the input image 102 and such that the subject of the input image is depicted in accordance with the facial expression description 104 and the pose description 106.”), wherein the image generation neural network is trained using a plurality of training images depicting a single subject generated by a training image generation neural network (par. 0028: “The training data 221 is a large group of content items and includes images that depict the face of a subject (where the subject is a person). As one example, the training data 221 may include videos from which images can be extracted.”), and wherein the plurality of training images includes a first synthetic image and a second synthetic image, wherein the first synthetic image depicts a subject having a first value of a first training attribute and a second value of a second training attribute, and wherein the second synthetic image depicts the same subject having a third value of the first training attribute and a fourth value of the second training attribute (par. 0028: “As another example, the training data 221 may include image pairs that each show the same person but with differing expressions and/or poses.”).
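NOTE: For illustration only, the encode-edit-decode flow recited in claim 1 may be sketched generically as follows. This sketch is not drawn from Theobald's disclosure; all weights, dimensions, and function names are hypothetical stand-ins for an encoder/decoder pair of an image generation neural network.

```python
import numpy as np

rng = np.random.default_rng(42)
E = rng.standard_normal((32, 128)) * 0.05   # toy encoder weights (hypothetical)
D = rng.standard_normal((128, 32)) * 0.05   # toy decoder weights (hypothetical)

def encode(image):
    """Encoder: maps an image to an image embedding."""
    return E @ image

def decode(embedding):
    """Decoder: maps a (possibly edited) embedding back to image space."""
    return np.tanh(D @ embedding)

face = rng.standard_normal(128)             # stand-in for a face image
embedding = encode(face)                    # the claimed image embedding

edit = np.zeros(32)
edit[0] = 0.5                               # target value of one attribute

modified = decode(embedding + edit)         # modified image reflecting the target value
```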
Regarding claim 2, Theobald teaches the method of claim 1, further comprising:
generating an edit vector that indicates the target value of the attribute, wherein the modified image is generated based on the edit vector (par. 0039: “the reference shape description 228 can use principal component analysis to describe shape variations according to active appearance model or active shape model techniques, in which a statistical model of object shape, such as the face shape model described above, can be used to generate a new image based on parameters that are included in the reference shape description. Accordingly, the reference shape description 228 may be a group (e.g., a vector) of principal component analysis coefficients.”).
Regarding claim 3, Theobald teaches the method of claim 2, wherein:
the edit vector indicates target values for a plurality of attributes of the face (par. 0025: “The behavior of each of the neurons may be established through training, which defines connections between neurons according to a vector of parameters, which are referred to as weights.”).
Regarding claim 4, Theobald teaches the method of claim 1, further comprising:
generating a noise vector, wherein the modified image is generated based on the noise vector (par. 0054: “The image generator training system 440 is configured in the form of a generative adversarial network (GAN) in which a generator generates synthetic images, a discriminator attempts to determine whether the images are real or synthetic, and the result of the determination is used to further train both the generator and the discriminator.” NOTE: It is well-known in the art that generative adversarial networks make use of noise vectors for image generation.).
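NOTE: For illustration of the well-known use of a noise vector in GAN image generation (a generic sketch, not Theobald's implementation; the single linear layer and all dimensions here are hypothetical simplifications):

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, weights):
    """Toy GAN generator: maps a noise vector z to a flat 'image'.
    A real generator stacks many layers (convolutions, upsampling)."""
    return np.tanh(weights @ z)

latent_dim, image_dim = 16, 64
W = rng.standard_normal((image_dim, latent_dim)) * 0.1

# Sampling a fresh noise vector yields a fresh synthetic image.
z = rng.standard_normal(latent_dim)
image = generator(z, W)
```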
Regarding claim 5, Theobald teaches the method of claim 1, further comprising:
providing an intermediate image embedding from the encoder as input to an intermediate layer of the decoder (par. 0040: “To generate the reference shape description 228, the input encoder 225 may incorporate a trained machine learning model (e.g., a trained neural network) that is configured according to active appearance model or active shape model techniques.”).
Regarding claim 6, Theobald teaches the method of claim 1, wherein:
the modified image preserves a texture of the image that is unrelated to the attribute (par. 0057: “In the image generation process that is performed by the image generator 112, the rendered target shape sample 433 serves as a label map that identifies the locations of the image where particular facial features should appear. As previously described, the rendered target shape sample 433 does not correspond to the appearance of the subject of the input image sample 423. Instead, the training procedure performed in the context of the image generator training system 440 teaches the image generator 112 to preserve the identity of the subject from the input image sample 423.”).
Regarding claim 7, Theobald teaches the method of claim 1, wherein:
the modified image preserves an identity of the face (par. 0057, as cited above for claim 6).
Regarding claim 8, Theobald teaches the method of claim 1, further comprising:
caching the image embedding (par. 0019: “The input image 102 may be encoded in any suitable format and color space that allow interpretation of the input image 102 by the image generation system 100.”);
receiving a subsequent input including an additional target value for an additional attribute to be modified (par. 0059: “The generated image 441 is provided as an input to discriminators that, along with the image generator 112, define the generative adversarial network architecture of the image generator training system 440.”); and
generating a subsequent modified image based on the cached image embedding and the additional target value (par. 0020: “The facial expression description 104 is an input to the image generation system 100 that describes a target facial expression to be shown in the output image 108. The facial expression description 104 is an encoded (e.g., using numerical values) representation of a facial expression. The facial expression description 104 may include a value or combination of values that represent commonly understood facial expressions that correspond to emotional states such as happiness, sadness, surprise, fear, anger, disgust, and contempt.”).
Regarding claim 9, Theobald teaches a method comprising:
identifying a subject and a plurality of face attributes including a first attribute and a second attribute (par. 0015: “As will be explained, the systems and methods described herein first modify a shape description for the subject's face according to a change in facial expression and a change in pose. This results in a target shape description (e.g., parameters for a statistical model of face shape) that can be used to render an image of a target face shape.”);
creating a training set by generating a plurality of training images depicting a single subject using a training image generation neural network (par. 0038: “The parameters output by the face shape model describe shape of the face that was input into the model and can be used by the model to output a face image having the same shape, where the image output by the face shape model is not the face of the person from the first frame 223, but instead is a deviation from a mean face, which was determined from the face shape model during training based on all of the faces processed by the model from a training data set.”), wherein the plurality of training images includes a first synthetic image and a second synthetic image, wherein the first synthetic image depicts the subject having a first value of the first attribute and a second value of the second attribute, and wherein the second synthetic image depicts the same subject having a third value of the first attribute and a fourth value of the second attribute (par. 0028: “As another example, the training data 221 may include image pairs that each show the same person but with differing expressions and/or poses.”); and
training, using the training set, an image generation neural network to modify face images based on a target value of the first attribute and the second attribute (par. 0020: “The latent space representation (also referred to as semantic latent code or latent code) is a string of numbers (e.g., a n-dimensional vector, containing a value for each of the n-dimensions) that, when provided as input to the generator, creates a particular image (e.g., to replicate the input image 106). The encoder 112 is a machine learning model trained to generate such a latent space representation. The encoder 112 may, for example, be a neural network trained to encode the input image 106. Given an input image 106 and a generator 132, the encoder discovers a latent space representation of the input image w, such that when the latent space representation of the input image w is input to the generator 132, the resulting generated image 139 perceptually resembles the target input image 106.”).
Regarding claim 10, Theobald teaches the method of claim 9, further comprising:
generating a latent vector for the training image generation neural network (par. 0054, as above in claim 4 rejection; NOTE: It is well-known in the art that generative adversarial networks make use of latent vectors for image generation.);
generating a first modified latent vector based on the latent vector and the first value of the first attribute, wherein the first synthetic image is generated based on the first modified latent vector (par. 0039: “the reference shape description 228 can use principal component analysis to describe shape variations according to active appearance model or active shape model techniques, in which a statistical model of object shape, such as the face shape model described above, can be used to generate a new image based on parameters that are included in the reference shape description. Accordingly, the reference shape description 228 may be a group (e.g., a vector) of principal component analysis coefficients. The principal analysis coefficients included in the reference shape description 228 are the main coefficients of variation along the axes for the shape of the face from a mean of the faces used to train the active appear model or active shape model.”); and
generating a second modified latent vector based on the latent vector and the third value of the first attribute, wherein the second synthetic image is generated based on the second modified latent vector (par. 0047: “The output of the shape estimation model 230, which is the target shape description 231, is compared with the ground truth shape description 229. As per conventional machine learning techniques, the difference between the output and the ground truth is the loss that is used to modify the shape estimation network.”).
Regarding claim 11, Theobald teaches the method of claim 10, further comprising:
generating a third modified latent vector based on the latent vector and a fifth value of the first attribute (par. 0054, as above in claim 4 rejection; NOTE: It is well-known in the art that generative adversarial networks make use of latent vectors for image generation.); and
generating a third synthetic image based on the third modified latent vector (par. 0056: “The function of the image generator 112 is to generate a generated image 441, which is a newly generated synthetic image that corresponds to the shape (e.g., including expression and pose) of the rendered target shape sample 433 and has the appearance of the input image sample 423.”).
Regarding claim 12, Theobald teaches the method of claim 10, further comprising:
identifying a modification basis vector corresponding to the first attribute (par. 0025: “The behavior of each of the neurons may be established through training, which defines connections between neurons according to a vector of parameters, which are referred to as weights. The weights are determined through repeated iterations of training, in which the network produces an output using training data, the training data is compared to a known correct result (referred to as a ground truth), and the difference between the output and the ground truth (loss) is used as a basis for modifying the network.”); and
multiplying the modification basis vector by the first value of the first attribute to obtain a latent modification vector, wherein the first modified latent vector is based on the latent modification vector (par. 0054, as above in claim 4 rejection; NOTE: It is well-known in the art that generative adversarial networks make use of latent vectors for image generation.).
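NOTE: For illustration only, the operation recited in claim 12, multiplying an attribute basis vector by an attribute value and combining the result with a latent vector, corresponds to the well-known semantic latent-space editing technique for GAN latent codes. The following numeric sketch uses hypothetical dimensions and values and is not drawn from Theobald:

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim = 8

w = rng.standard_normal(latent_dim)         # base latent vector
d_attr = rng.standard_normal(latent_dim)    # modification basis vector for one attribute
d_attr /= np.linalg.norm(d_attr)            # unit-norm editing direction

alpha = 0.7                                 # first value of the first attribute
delta = alpha * d_attr                      # latent modification vector
w_edited = w + delta                        # first modified latent vector

# A negative attribute value moves the latent code the opposite way (cf. claim 13).
w_opposite = w + (-alpha) * d_attr
```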
Regarding claim 13, Theobald teaches the method of claim 9, wherein:
the first value of the first attribute comprises a positive value and the third value of the first attribute comprises a negative value (par. 0022: “the facial expression may include a relative value that indicates a difference value by which a smiling-type facial expression is to be increased or decreased in the output image 108 relative to the input image 102.”).
Regarding claim 14, Theobald teaches the method of claim 9, wherein:
the plurality of training images includes additional synthetic images generated based on a plurality of additional attributes (par. 0039: “the reference shape description 228 can use principal component analysis to describe shape variations according to active appearance model or active shape model techniques, in which a statistical model of object shape, such as the face shape model described above, can be used to generate a new image based on parameters that are included in the reference shape description.”).
Regarding claim 15, Theobald teaches the method of claim 9, further comprising:
training the training image generation neural network based on a global discriminator and a region-specific discriminator (par. 0054, as above in claim 4 rejection; NOTE: It is well-known in the art that generative adversarial networks make use of global and regional discriminators for image generation.).
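NOTE: For illustration only, the distinction between a global discriminator and a region-specific discriminator can be sketched as follows: one discriminator scores the whole generated image while another scores only a fixed facial region. The logistic scoring function, crop location, and all weights here are hypothetical; this sketch is not Theobald's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def discriminator(patch, weights):
    """Toy discriminator: logistic score that a patch is 'real'."""
    return 1.0 / (1.0 + np.exp(-(weights @ patch)))

side = 8
image = rng.standard_normal((side, side))   # stand-in for a generated face image

# Global discriminator scores the entire image.
w_global = rng.standard_normal(side * side) * 0.1
score_global = discriminator(image.ravel(), w_global)

# Region-specific discriminator scores only a fixed 4x4 crop (e.g., an eye region).
region = image[2:6, 2:6]
w_region = rng.standard_normal(16) * 0.1
score_region = discriminator(region.ravel(), w_region)

# During adversarial training, both scores would contribute to the loss.
```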
Regarding claim 16, Theobald teaches the method of claim 9, further comprising:
generating a modified image based on the first value of the first attribute using the image generation neural network (par. 0025: “The image generation system 100 is a machine learning-based system that is configured to generate the output image 108 using the input image 102, the facial expression description 104, and the pose description 106.”); and
comparing the modified image to the first synthetic image, wherein the image generation neural network is trained based on the comparison (par. 0025: “The weights are determined through repeated iterations of training, in which the network produces an output using training data, the training data is compared to a known correct result (referred to as a ground truth), and the difference between the output and the ground truth (loss) is used as a basis for modifying the network.”).
Regarding claim 17, Theobald teaches an apparatus comprising:
at least one processor (par. 0005: “Another aspect of the disclosure is a system that includes a memory and a processor. The memory includes program instructions. The processor is operable to execute the program instructions.”);
at least one memory including instructions executable by the at least one processor (par. 0005, as above); and
an image generation neural network configured to perform the method of claim 1 (par. 0054, as above in claim 4 rejection).
Claim 18 is substantially similar to claim 5, and differs only in that it depends from claim 17 rather than claim 1. As such, claim 18 is rejected on similar grounds to claim 5.
Regarding claim 19, Theobald teaches the apparatus of claim 17, further comprising:
a training image generation neural network configured to generate the plurality of training images (par. 0040: “To generate the reference shape description 228, the input encoder 225 may incorporate a trained machine learning model (e.g., a trained neural network) that is configured according to active appearance model or active shape model techniques.”).
Regarding claim 20, Theobald teaches the apparatus of claim 19, wherein:
the training image generation neural network comprises a global discriminator and a region-specific discriminator (par. 0054, as above in claim 4 rejection; NOTE: It is well-known in the art that generative adversarial networks make use of global and regional discriminators for image generation.).
Response to Arguments
Applicant’s arguments with respect to claim(s) 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Bermano, Amit H., et al. "State‐of‐the‐Art in the Architecture, Methods and Applications of StyleGAN." Computer Graphics Forum, vol. 41, no. 2, 2022. Cited as a reference teaching aspects of generative adversarial networks not covered by Theobald.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN A BARHAM whose telephone number is (571)272-4338. The examiner can normally be reached Mon-Fri, 8:30am-5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/RYAN ALLEN BARHAM/ Examiner, Art Unit 2613
/XIAO M WU/ Supervisory Patent Examiner, Art Unit 2613