DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
2. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/13/2026 has been entered.
Information Disclosure Statement
3. The information disclosure statement (IDS) submitted on 01/13/2026 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Response to Amendment
4. Acknowledgement is made of the amendment filed on January 13, 2026, in which claims 1, 14 and 18 are amended, claim 2 is canceled, claim 21 is new, and claims 1 and 3-21 remain pending.
Response to Arguments
5. Applicant's arguments, filed on January 13, 2026, with respect to claims 1 and 3-21 have been fully considered but they are not persuasive.
6. With regard to the arguments for independent claims 1, 14 and 18, applicant argues that Cole et al. (US 2019/0095698 A1), Phan (US 2022/0398796 A1) and Bhat et al. (US 9,786,084 B1) fail to disclose receiving a request to automatically generate a first identity of a first virtual face without receiving an input image of a face. The examiner respectfully agrees; however, the arguments are moot in view of the new ground of rejection regarding claims 1, 14 and 18, since Chandran et al. (US 2021/0279956 A1) teaches (“the application 146 could use the decoder 156 to generate novel faces by sampling from identities represented by meshes in the data set that is used to train the face model 150, which are also referred to herein as “known identities,” or adding random noise to an identity code associated with a known identity. As another example, the application 146 could receive a new identity that is not one of the known identities and use the face model 150 to generate a face having the new identity and a target expression. As another example, the application 146 could perform blendweight retargeting in which the face model 150 is used to transfer facial expression(s) from an image or video to a new facial identity by determining blendweights associated with the facial expression(s) in the image or video, inputting the blendweights into the expression encoder 154, and inputting a representation of the new facial identity into the identity encoder 152. As a further example, the application 146 could perform 2D landmark-based capture and retargeting by determining 2D facial landmarks from a facial performance in a video, mapping the facial landmarks to expression codes that are then input, along with an identity code associated with a new identity, into the decoder 156 to generate faces having the new identity and the expressions in the facial performance.” [0043] “facial expressions 700, 702, 704, 706, and 708 that are associated with one facial identity are retargeted to the same facial expressions 710, 712, 714, 716, and 718 for a new facial identity in a natural-looking, nonlinear manner. As used herein, “retargeting” refers to transferring the facial expressions associated with one facial identity, which may be represented as blendweights, onto another facial identity. In some embodiments, retargeting is performed by inputting an identity code associated with the new identity and expression codes associated with the facial expressions 700, 702, 704, 706, and 708 into the decoder 156 to generate vertex displacements for deforming a reference mesh into meshes of faces having the new facial identity and the same expressions 710, 712, 714, 716, and 718. As described, the identity code for the new identity may be manually entered by a user, generated by adding random noise to the identity code associated with a known identity, generated by inputting a neutral face mesh associated with the new identity minus the reference mesh into the identity encoder 152, or in any other technically feasible manner. As described, the expression code may also be manually entered by a user, generated by inputting user-specified or automatically-determined blendweights into the expression encoder 154, or in any other technically feasible manner.” [0063]) Chandran teaches inputting an identity code associated with the new identity and expression codes associated with the facial expressions into the decoder to generate vertex displacements for deforming a reference mesh into meshes of faces having the new facial identity, and the expression code may be generated from automatically-determined blendweights. Therefore, Chandran teaches the argued limitations of claims 1, 14 and 18 as recited.
Claim Rejections - 35 USC § 103
7. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
8. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
9. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
10. Claims 1, 6-13, 14 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Cole et al. (US 2019/0095698 A1) in view of Phan (US 2022/0398796 A1) and Chandran et al. (US 2021/0279956 A1).
11. With reference to claim 1, Cole teaches A computer-implemented method comprising: generating, a latent feature representation of the first virtual face based at least in part on the request, (“One example aspect of the present disclosure is directed to a computer-implemented method to obtain facial attribute data of a face. The method includes obtaining, by one or more computing devices, an image of a face.” [0009] “one example system of the present disclosure combines a machine-learned image recognition model with a face modeler that uses a morphable model of a human's facial appearance. The image recognition model can be a deep learning model that generates an embedding in response to receipt of an image (e.g., an uncontrolled image of a face). The example system can further include a small, lightweight, translation model structurally positioned between the image recognition model and the face modeler. The translation model can be a machine-learned model that is trained to receive the embedding generated by the image recognition model and, in response, output a plurality of facial modeling parameter values usable by the face modeler to generate a model of the face (e.g., a three-dimensional model of the face). In some implementations, the generated model of the face can be used to synthesize a controlled image or rendering of the face (e.g., a front-facing, evenly-lit image of the face with a neutral expression). In some implementations, values for one or more facial attributes (e.g., face shape, eye color, hair length, etc.) of the face can be ascertained from the model of the face and/or the rendered controlled image of the face. An artistic rendering of the face, such as, for example, personalized emojis, can be generated from the ascertained facial attribute values.” [0045] “an image recognition model can be a machine-learned model that can receive an image (e.g., that depicts a face) and, in response, produce (e.g., at an output layer or at an intermediate layer) an embedding in a lower dimensional space. This embedding can be useful for various tasks including, primarily, determining a measure of how similar the image shown in the input image is to other images. For example, a face recognition model can be used to determine a measure of how similar a first face shown in a first image is to other faces shown in other images.” [0052] “the present disclosure provides systems and methods that can generate a model of a face (e.g., a three-dimensional model of the face) and/or a rendering of a face (e.g., a controlled rendering) based on an input image of the face (e.g., an uncontrolled image of the face). The model and/or rendering can be used for a number of different uses. As one example, the model and/or rendering can be used to generate a stylized cartoon of the face (e.g., a personalized emoji). As another example, the model can be used to generate realistic three-dimensional renderings of the face for use as an avatar in a virtual reality environment (e.g., to represent a user participating in a multi-user virtual reality or gaming environment).” [0064]) Cole also teaches authoring parameters for an authoring engine based on a latent feature representation of a human face; (“the translation model 302 can be a machine-learned model that is trained to receive the embedding 208 generated by the image recognition model 202 and, in response, output a plurality of facial modeling parameter values 306. …. 
the translation model 302 can be a machine-learned model that is trained to receive the embedding 208 generated by the image recognition model 202 and, in response, output a plurality of facial modeling parameter values 306.” [0096-0098], Fig. 1A) Cole further teaches generating, authoring parameters based at least in part on the latent feature representation of the first virtual face; and generating, using the authoring engine, a virtual face model of the at least one virtual face based at least in part on the authoring parameters and the parametric facial model, wherein the virtual face model is a mesh model of the first virtual face. (“an image recognition model can be a machine-learned model that can receive an image (e.g., that depicts a face) and, in response, produce (e.g., at an output layer or at an intermediate layer) an embedding in a lower dimensional space. This embedding can be useful for various tasks including, primarily, determining a measure of how similar the image shown in the input image is to other images. For example, a face recognition model can be used to determine a measure of how similar a first face shown in a first image is to other faces shown in other images.” [0052] “a face modeler can use the plurality of facial modeling parameter values output by the translation model to generate a model of the face (e.g., a three-dimensional mesh model of the face). The face modeler can be any algorithm for creating a model or image of the face from the facial modeling parameter values.” [0057] “The model and/or rendering can be used for a number of different uses. As one example, the model and/or rendering can be used to generate a stylized cartoon of the face (e.g., a personalized emoji). As another example, the model can be used to generate realistic three-dimensional renderings of the face for use as an avatar in a virtual reality environment (e.g., to represent a user participating in a multi-user virtual reality or gaming environment).” [0064] “the translation model 302 can be a machine-learned model that is trained to receive the embedding 208 generated by the image recognition model 202 and, in response, output a plurality of facial modeling parameter values 306. …. the translation model 302 can be a machine-learned model that is trained to receive the embedding 208 generated by the image recognition model 202 and, in response, output a plurality of facial modeling parameter values 306.” [0096-0098])
[Embedded image media_image1.png (701 x 395, greyscale) omitted.]
Cole does not explicitly teach receiving a request to automatically generate a first identity of a first virtual face without receiving an input image of a face; accessing an identity engine trained based on a plurality of human faces, each human face being defined based on location information associated with a plurality of facial features, wherein the identity engine trained to generate a latent feature representation of individual human faces, wherein each the latent feature representation is associated with an identity of a virtual human face; using the identity engine, wherein the latent feature representation is associated with the first identity; accessing a decoding engine, the decoding engine trained to reconstruct, via a latent variable space, using the decoding engine, accessing a parametric facial model comprising a plurality of blendshapes; wherein the virtual face model has the first identity. This is what Phan teaches. Phan teaches accessing an identity engine trained based on a plurality of human faces, each human face being defined based on location information associated with a plurality of facial features, (“FIG. 2A-2C describe another embodiment of the expression generation system 200 that generates a generative model that is capable of many to many correspondence for generating expressions for persons. With specific reference to FIG. 2A, an identity engine 210 is illustrated. The identity engine 210 can use machine learning techniques to provide a facial recognition system to generate identification representations 212 based on an input face of a person 102. The identity engine 200 can be based on facial recognitions systems, such as FaceNet. … One embodiment of a process for generating identification representations 212 can include a finding the bounding box of the location of faces. Then finding facial features such as length of eyes, length of mouth, the distance between eyes and nose, and so on. The number of facial features chosen may vary, for example, from five to seventy-eight points, depending on annotation. After identifying facial features, the distance between these points is measured. These values are used to classify a face. The faces can be aligned using the facial features. This can be done to align face images displayed from a different angle in a straightforward orientation. Then the features extracted can be matched with a template. The aligned faces can be used for comparison. The aligned face can then be analyzed to generate an embedding of the face using face clustering. The resultant identification encoding of the face, also referred to as an identification representation, can be output for further use by the expression generation system 200. …The expression generation system 200 can be an autoencoder and can be trained in a similar manner as described with respect to FIG. 1B. In addition to the expression information 102 provided to expression generation system 100, the expression generation system 200 is trained using the identity information 212 for each of the persons for which expression information 102 is provided.” [0063-0065] “The motion capture information may, in some embodiments, allow for rapid importation of locations of facial features on a real-life person. For example, the motion capture information may indicate locations of the person's facial features at discrete times. Each discrete time may be defined as a particular expression of the person.
Thus, the location of the facial features may be identified for each expression.” [0091]) Phan also teaches each the latent feature representation is associated with an identity of a virtual human face; using the identity engine, wherein the latent feature representation is associated with the first identity; (“The input data 220 provided to expression generation system 200 for person A includes a video of animation of person A. The input data 222 provided to expression generation system 200 for person B includes an image of person B. The corresponding video and image data are provided to an identity engine 200 to generate identity representations for each person. The identity representations are submitted as part of the input data provided to the expression generation system 200. The encoder may be a universal encoder for translating the input images and video into latent feature space representations. A resulting latent feature representation may be generated which is based on distributions of latent variables. The trained decoder can be a universal encoder that can then be used to decode a latent feature space representation in order to output an expression on the person represented in the latent space.” [0068] “FIGS. 6A-6D illustrates embodiments of a mesh generation system 600 for generating and outputting a mesh of a face and head of a virtual character. The meshes can be based on a 2D image of a face of a person in conjunction with outputs generated by the expression generation system and texture map generation system 400.” [0110]) Phan further teaches accessing a decoding engine, the decoding engine trained to reconstruct, via a latent variable space, using the decoding engine, wherein the virtual face model has the first identity. (“The autoencoder may also include a decoder engine 122 to generate reconstructed expression information based on the latent feature representation 120.” [0055] “The input data 220 provided to expression generation system 200 for person A includes a video of animation of person A. The input data 222 provided to expression generation system 200 for person B includes an image of person B. The corresponding video and image data are provided to an identity engine 200 to generate identity representations for each person. The identity representations are submitted as part of the input data provided to the expression generation system 200.” [0068] “FIGS. 6A-6D illustrates embodiments of a mesh generation system 600 for generating and outputting a mesh of a face and head of a virtual character. The meshes can be based on a 2D image of a face of a person in conjunction with outputs generated by the expression generation system and texture map generation system 400.” [0110] “decoding, by the autoencoder, the latent variable space of the one or more expressions);” [0241]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Phan into Cole, in order to increase in realism.
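For illustration only, the following is a minimal sketch of one step described in the Phan passage quoted above, in which distances between detected facial feature points are measured and used as part of an identification representation. The feature points, their number, and all names in the sketch are hypothetical and are not taken from Phan or from the claims.

```python
import numpy as np
from itertools import combinations

def identification_representation(landmarks):
    """Given (N, 2) facial feature point coordinates, return the vector of pairwise distances."""
    return np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                     for i, j in combinations(range(len(landmarks)), 2)])

# Five toy feature points (e.g., eye corners, nose tip, mouth corners) in normalized image coordinates.
face_points = np.array([[0.30, 0.40], [0.70, 0.40], [0.50, 0.55], [0.38, 0.75], [0.62, 0.75]])
print(identification_representation(face_points))  # ten pairwise distances
```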
The combination of Cole and Phan does not explicitly teach receiving a request to automatically generate a first identity of a first virtual face without receiving an input image of a face; accessing a parametric facial model comprising a plurality of blendshapes; This is what Chandran teaches (“the application 146 could use the decoder 156 to generate novel faces by sampling from identities represented by meshes in the data set that is used to train the face model 150, which are also referred to herein as “known identities,” or adding random noise to an identity code associated with a known identity. As another example, the application 146 could receive a new identity that is not one of the known identities and use the face model 150 to generate a face having the new identity and a target expression. As another example, the application 146 could perform blendweight retargeting in which the face model 150 is used to transfer facial expression(s) from an image or video to a new facial identity by determining blendweights associated with the facial expression(s) in the image or video, inputting the blendweights into the expression encoder 154, and inputting a representation of the new facial identity into the identity encoder 152. As a further example, the application 146 could perform 2D landmark-based capture and retargeting by determining 2D facial landmarks from a facial performance in a video, mapping the facial landmarks to expression codes that are then input, along with an identity code associated with a new identity, into the decoder 156 to generate faces having the new identity and the expressions in the facial performance.” [0043] “facial expressions 700, 702, 704, 706, and 708 that are associated with one facial identity are retargeted to the same facial expressions 710, 712, 714, 716, and 718 for a new facial identity in a natural-looking, nonlinear manner. As used herein, “retargeting” refers to transferring the facial expressions associated with one facial identity, which may be represented as blendweights, onto another facial identity. In some embodiments, retargeting is performed by inputting an identity code associated with the new identity and expression codes associated with the facial expressions 700, 702, 704, 706, and 708 into the decoder 156 to generate vertex displacements for deforming a reference mesh into meshes of faces having the new facial identity and the same expressions 710, 712, 714, 716, and 718. As described, the identity code for the new identity may be manually entered by a user, generated by adding random noise to the identity code associated with a known identity, generated by inputting a neutral face mesh associated with the new identity minus the reference mesh into the identity encoder 152, or in any other technically feasible manner. As described, the expression code may also be manually entered by a user, generated by inputting user-specified or automatically-determined blendweights into the expression encoder 154, or in any other technically feasible manner.” [0063] “the application 146 receives a representation of a facial expression. The representation of the facial expression may be in any technically feasible form. For example, the representation of the facial expression could be an expression code. In particular, a user could input a “one-hot” vector that specifies a blendweight of 1 for one blendshape and 0 for other blendshapes, or a vector that specifies blendweights for combining multiple blendshapes. 
As further examples, the representation of the facial expression could include target blendweights that are specified by a user (e.g., via sliders within a UI) and can be converted to an expression code using the expression encoder 154, target blendweights determined based on a frame of an animation of a face, etc.” [0084]) Chandran teaches inputting an identity code associated with the new identity and expression codes associated with the facial expressions into the decoder to generate vertex displacements for deforming a reference mesh into meshes of faces having the new facial identity, and the expression code may be generated from automatically-determined blendweights. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chandran into the combination of Cole and Phan, in order to generate realistic-looking faces.
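For illustration only, the following is a minimal sketch of the mechanism described in the Chandran passages quoted above: an identity code and an expression code are passed through a decoder to produce per-vertex displacements that deform a reference mesh, and a new identity can be formed by adding random noise to a known identity code. The toy linear decoder, the array shapes, and all names are hypothetical assumptions, not taken from Chandran, Cole, or Phan.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_VERTICES = 5          # toy reference mesh with five vertices
ID_DIM, EXPR_DIM = 8, 4   # assumed sizes of the identity and expression codes

reference_mesh = rng.standard_normal((NUM_VERTICES, 3))  # (V, 3) vertex positions
decoder_weights = 0.01 * rng.standard_normal((ID_DIM + EXPR_DIM, NUM_VERTICES * 3))  # toy linear "decoder"

def decode_face(identity_code, expression_code):
    """Deform the reference mesh by decoder-produced per-vertex displacements."""
    code = np.concatenate([identity_code, expression_code])
    displacements = (code @ decoder_weights).reshape(NUM_VERTICES, 3)
    return reference_mesh + displacements

# A "new" identity formed by adding random noise to a known identity code,
# decoded with a neutral (all-zero) expression code.
known_identity = rng.standard_normal(ID_DIM)
new_identity = known_identity + 0.1 * rng.standard_normal(ID_DIM)
new_face = decode_face(new_identity, np.zeros(EXPR_DIM))
print(new_face.shape)  # (5, 3)
```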
12. With reference to claim 6, Cole teaches generating the mesh of the virtual face model. (“a face modeler can use the plurality of facial modeling parameter values output by the translation model to generate a model of the face (e.g., a three-dimensional mesh model of the face). The face modeler can be any algorithm for creating a model or image of the face from the facial modeling parameter values.” [0057])
Cole does not explicitly teach at least one facial characteristic associated with the mesh. This is what Phan teaches (“The generation of the shapes and textures of the facial features can be done using a process known as photogrammetry. The photogrammetry data 404 can generate the 3D shapes and textures from the combination of the 2D capture data. The photogrammetry data can be representative of the facial features of the person. Example facial features may include a nose, cheeks, eyes, eyebrows, the forehead, ears, mouth, teeth, and so on. Thus, a facial feature may represent a portion of the real-life person. … each time slice of data of the motion capture data can be used to generate separate photogrammetric models of the positions of the facial features. This motion capture information may be analyzed to identify features to be input into the texture map generation system 400. … The output of the photogrammetric model(s) 404 can be used to generate a mesh of the person 406.” [0089-0092]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Phan into Cole, in order to increase in realism.
13. With reference to claim 7, Cole does not explicitly teach the at least one facial characteristic comprises at least one of skin texture, eye texture, hair mesh, or hair texture. This is what Phan teaches (“The generation of the shapes and textures of the facial features can be done using a process known as photogrammetry. The photogrammetry data 404 can generate the 3D shapes and textures from the combination of the 2D capture data. The photogrammetry data can be representative of the facial features of the person. Example facial features may include a nose, cheeks, eyes, eyebrows, the forehead, ears, mouth, teeth, and so on. Thus, a facial feature may represent a portion of the real-life person.” [0089-0090]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Phan into Cole, in order to increase in realism.
14. With reference to claim 8, Cole teaches the authoring parameters specific to the authoring engine. (“an image recognition model can be a machine-learned model that can receive an image (e.g., that depicts a face) and, in response, produce (e.g., at an output layer or at an intermediate layer) an embedding in a lower dimensional space. This embedding can be useful for various tasks including, primarily, determining a measure of how similar the image shown in the input image is to other images. For example, a face recognition model can be used to determine a measure of how similar a first face shown in a first image is to other faces shown in other images.” [0052] “the translation model 302 can be a machine-learned model that is trained to receive the embedding 208 generated by the image recognition model 202 and, in response, output a plurality of facial modeling parameter values 306. …. the translation model 302 can be a machine-learned model that is trained to receive the embedding 208 generated by the image recognition model 202 and, in response, output a plurality of facial modeling parameter values 306.” [0096-0098])
Cole does not explicitly teach the decoding engine is trained based on the latent space specific to the identity engine. This is what Phan teaches (“The autoencoder may also include a decoder engine 122 to generate reconstructed expression information based on the latent feature representation 120.” [0055] “The expression generation system 200 includes a trained encoder that encodes the expression information to a latent space representation. The input data 220 provided to expression generation system 200 for person A includes a video of animation of person A. The input data 222 provided to expression generation system 200 for person B includes an image of person B. The corresponding video and image data are provided to an identity engine 200 to generate identity representations for each person. The identity representations are submitted as part of the input data provided to the expression generation system 200. The encoder may be a universal encoder for translating the input images and video into latent feature space representations. A resulting latent feature representation may be generated which is based on distributions of latent variables. The trained decoder can be a universal encoder that can then be used to decode a latent feature space representation in order to output an expression on the person represented in the latent space.” [0068]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Phan into Cole, in order to increase in realism.
15. With reference to claim 9, Cole teaches the latent feature representation is a vector having a defined number of values. (“a face recognition model can be a deep learning model (e.g., a deep convolutional network) that learns a mapping from images that depict faces to a lower dimensional space. Once this space has been produced, tasks such as face recognition, verification, and clustering can be implemented using standard techniques with embeddings as feature vectors. One example of a face recognition model is described in F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A Unified Embedding for Face Recognition and Clustering. In Proceedings of the IEEE Computer Society Conf. on CVPR, 2015. The face recognition model described by this publication is provided as one example only. The present disclosure is not limited to the particular details of the particular face recognition model described by this publication. In some implementations in which the face recognition model described by this publication is used, the embedding can be obtained from a pooling layer (e.g., an average pooling layer) that is structurally positioned near the conclusion of a deep convolutional neural network portion of the face recognition model but prior to an L.sub.2 Normalization layer of the face recognition model. For example, the embedding obtained from the pooling layer can be a 1024-dimensional vector.” [0050-0051] “The translation model can be a machine-learned model that is trained to receive the embedding generated by the image recognition model and, in response, output a plurality of facial modeling parameter values.” [0055])
16. With reference to claim 10, Cole does not explicitly teach the vector is representative of an invariant identity of first identity. This is what Phan teaches (“The input data 220 provided to expression generation system 200 for person A includes a video of animation of person A. The input data 222 provided to expression generation system 200 for person B includes an image of person B. The corresponding video and image data are provided to an identity engine 200 to generate identity representations for each person. The identity representations are submitted as part of the input data provided to the expression generation system 200.” [0068] “wherein the identity information comprises an identity vector that is representative of an invariant identity of the respective first or second real-world person.” [0175]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Phan into Cole, in order to increase in realism.
17. With reference to claim 11, Cole does not explicitly teach the virtual face model is generated based on weights associated with a plurality of blendshapes that define a shape of the mesh model. This is what Phan teaches. Phan teaches the virtual face model is generated to define a shape of the mesh model. (“The capture data 402 can be used to generate photogrammetric models 404 of the person. Each image of the frames of video captured from the different cameras may be analyzed to identify features and to generate shapes and textures of the face and head. The data can also be generated using 3D scanning hardware and techniques. The generation of the shapes and textures of the facial features can be done using a process known as photogrammetry. The photogrammetry data 404 can generate the 3D shapes and textures from the combination of the 2D capture data.” [0089] “The generation of the mesh results in a collection of vertices, edges and faces that defines the shape of the head. Various processes can be used for generation of the polygon mesh based on the requirements of the specific game application and development environment. The system can additionally track the vertices of the mesh for each of the frames of data that is provided by the capture data. A tracked mesh can then be created based on the motion capture data 402 and the model data 404. The generation of the tracked mesh can also provide for the generation of UV coordinates of the mesh.” [0092] “FIGS. 6A-6D illustrates embodiments of a mesh generation system 600 for generating and outputting a mesh of a face and head of a virtual character.” [0110]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Phan into Cole, in order to increase in realism.
The combination of Cole and Phan does not explicitly teach based on weights associated with the plurality of blendshapes. This is what Chandran teaches (“facial expressions 700, 702, 704, 706, and 708 that are associated with one facial identity are retargeted to the same facial expressions 710, 712, 714, 716, and 718 for a new facial identity in a natural-looking, nonlinear manner. As used herein, “retargeting” refers to transferring the facial expressions associated with one facial identity, which may be represented as blendweights, onto another facial identity. In some embodiments, retargeting is performed by inputting an identity code associated with the new identity and expression codes associated with the facial expressions 700, 702, 704, 706, and 708 into the decoder 156 to generate vertex displacements for deforming a reference mesh into meshes of faces having the new facial identity and the same expressions 710, 712, 714, 716, and 718. As described, the identity code for the new identity may be manually entered by a user, generated by adding random noise to the identity code associated with a known identity, generated by inputting a neutral face mesh associated with the new identity minus the reference mesh into the identity encoder 152, or in any other technically feasible manner. As described, the expression code may also be manually entered by a user, generated by inputting user-specified or automatically-determined blendweights into the expression encoder 154, or in any other technically feasible manner.” [0063] “the application 146 receives a representation of a facial expression. The representation of the facial expression may be in any technically feasible form. For example, the representation of the facial expression could be an expression code. In particular, a user could input a “one-hot” vector that specifies a blendweight of 1 for one blendshape and 0 for other blendshapes, or a vector that specifies blendweights for combining multiple blendshapes. As further examples, the representation of the facial expression could include target blendweights that are specified by a user (e.g., via sliders within a UI) and can be converted to an expression code using the expression encoder 154, target blendweights determined based on a frame of an animation of a face, etc.” [0084]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chandran into the combination of Cole and Phan, in order to generate realistic-looking faces.
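For illustration only, the following is a minimal sketch of the conventional blendshape formulation referenced in the passages quoted above, in which weights applied to a set of blendshape offsets define the shape of a mesh; a “one-hot” weight vector places full weight on a single blendshape, while fractional weights combine several. The array shapes and names are hypothetical and are not taken from the cited references or the claims.

```python
import numpy as np

rng = np.random.default_rng(1)

neutral_mesh = rng.standard_normal((5, 3))            # toy neutral mesh, five vertices
blendshape_offsets = rng.standard_normal((3, 5, 3))   # three blendshape offsets from the neutral mesh

def apply_blendweights(weights):
    """mesh = neutral + sum_i weights[i] * blendshape_offsets[i]"""
    return neutral_mesh + np.tensordot(weights, blendshape_offsets, axes=1)

one_hot_weights = np.array([1.0, 0.0, 0.0])  # full weight on one blendshape, zero for the others
mixed_weights = np.array([0.5, 0.3, 0.2])    # weights combining multiple blendshapes

print(apply_blendweights(one_hot_weights).shape)  # (5, 3)
print(apply_blendweights(mixed_weights).shape)    # (5, 3)
```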
18. With reference to claim 12, Cole also teaches the authoring parameters (“the translation model 302 can be a machine-learned model that is trained to receive the embedding 208 generated by the image recognition model 202 and, in response, output a plurality of facial modeling parameter values 306. …. the translation model 302 can be a machine-learned model that is trained to receive the embedding 208 generated by the image recognition model 202 and, in response, output a plurality of facial modeling parameter values 306.” [0096-0098], Fig. 1A)
The combination of Cole and Phan does not explicitly teach define the weights associated with the plurality of blendshapes. This is what Chandran teaches (“facial expressions 700, 702, 704, 706, and 708 that are associated with one facial identity are retargeted to the same facial expressions 710, 712, 714, 716, and 718 for a new facial identity in a natural-looking, nonlinear manner. As used herein, “retargeting” refers to transferring the facial expressions associated with one facial identity, which may be represented as blendweights, onto another facial identity. In some embodiments, retargeting is performed by inputting an identity code associated with the new identity and expression codes associated with the facial expressions 700, 702, 704, 706, and 708 into the decoder 156 to generate vertex displacements for deforming a reference mesh into meshes of faces having the new facial identity and the same expressions 710, 712, 714, 716, and 718. As described, the identity code for the new identity may be manually entered by a user, generated by adding random noise to the identity code associated with a known identity, generated by inputting a neutral face mesh associated with the new identity minus the reference mesh into the identity encoder 152, or in any other technically feasible manner. As described, the expression code may also be manually entered by a user, generated by inputting user-specified or automatically-determined blendweights into the expression encoder 154, or in any other technically feasible manner.” [0063] “the application 146 receives a representation of a facial expression. The representation of the facial expression may be in any technically feasible form. For example, the representation of the facial expression could be an expression code. In particular, a user could input a “one-hot” vector that specifies a blendweight of 1 for one blendshape and 0 for other blendshapes, or a vector that specifies blendweights for combining multiple blendshapes. As further examples, the representation of the facial expression could include target blendweights that are specified by a user (e.g., via sliders within a UI) and can be converted to an expression code using the expression encoder 154, target blendweights determined based on a frame of an animation of a face, etc.” [0084]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Chandran into the combination of Cole and Phan, in order to generate realistic-looking faces.
19. With reference to claim 13, Cole teaches a machine learning model generated using a deep neural network. (“The image recognition model 202 can receive the image 206 and, in response, supply an embedding 208. In some examples, the image recognition model 202 can be a deep learning model (e.g., a deep convolutional network) that learns a mapping from images to a lower dimensional space.” [0092])
Cole does not explicitly teach the decoding engine. This is what Phan teaches (“The autoencoder may also include a decoder engine 122 to generate reconstructed expression information based on the latent feature representation 120.” [0055]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Phan into Cole, in order to increase in realism.
20. Claim 14 is similar in scope to the combination of claims 1 and 11, and thus is rejected under similar rationale. Cole additionally teaches Non-transitory computer storage media storing instructions that when executed by a system of one or more computers, cause the one or more computers to perform operations (“Another example aspect of the present disclosure is directed to one or more non-transitory computer-readable media that collectively store a machine-learned face reconstruction model and instructions. The machine-learned face reconstruction model is operable to receive an embedding for a face and, in response to receipt of the embedding, output a reconstructed representation of the face. When executed by one or more computing devices, the instructions cause the one or more computing devices to: ” [0026])
21. Claim 18 is similar in scope to the combination of claims 1 and 11, and thus is rejected under similar rationale. Cole additionally teaches A system comprising one or more computers and non-transitory computer storage media storing instructions that when executed by the one or more computers, cause the one or more computers to perform operations (“The computing system includes at least one processor. The computing system includes a machine-learned translation model that is operable to receive an embedding obtained from a machine-learned image recognition model and, in response to receipt of the embedding, output a plurality of facial modeling parameter values that are descriptive of a plurality of facial attributes of the face. The computing system includes at least one non-transitory computer-readable medium that stores instructions that, when executed by the at least one processor, cause the computing system to:” [0019])
22. Claims 3 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Cole et al. (US 2019/0095698 A1), Phan (US 2022/0398796 A1) and Chandran et al. (US 2021/0279956 A1), as applied to claims 1 and 14 above, and further in view of Gottlieb (US 11,341,699 B1).
23. With reference to claim 3, Cole does not explicitly teach the latent feature representation is pseudo-randomly generated based on a latent space associated with the identity engine. This is what Phan teaches. Phan teaches a latent space associated with the identity engine. (“The expression generation system 200 includes a trained encoder that encodes the expression information to a latent space representation. The input data 220 provided to expression generation system 200 for person A includes a video of animation of person A. The input data 222 provided to expression generation system 200 for person B includes an image of person B. The corresponding video and image data are provided to an identity engine 200 to generate identity representations for each person. The identity representations are submitted as part of the input data provided to the expression generation system 200. The encoder may be a universal encoder for translating the input images and video into latent feature space representations. A resulting latent feature representation may be generated which is based on distributions of latent variables. The trained decoder can be a universal encoder that can then be used to decode a latent feature space representation in order to output an expression on the person represented in the latent space.” [0068]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Phan into Cole, in order to increase in realism.
The combination of Cole, Phan and Chandran does not explicitly teach the latent feature representation is pseudo-randomly generated. This is what Gottlieb teaches (“The system may accordingly generate a synthetic image meeting the at least one image classification requirement directly by selecting a pseudo-randomly generated latent feature vector 105 that complies with the at least one image classification requirement. For example, target latent feature vectors 145 may be stored as default values that comply with certain classification requirements (e.g., by including a respective image feature, such a vehicle body style, make or model, color, etc.). If a user wishes specifies an image classification requirement of a vehicle with a sedan body style, the system may identify each target latent feature vector 145 stored on the system that includes an image classification 150 of the sedan body style. Each identified target latent feature vector 145 may be averaged together to produce a generic latent feature vector that includes an image classification of a sedan body style. The resultant generic latent feature vector may be passed to the trained generator 110 which produces a synthetic image 115 that conforms to the image classification requirement of a vehicle having a sedan body style. The user may add additional classification requirements and under a similar process a new generic latent feature vector may be determined based on the stored target latent feature vectors 145, and a new synthetic image 115 may be generated that includes the classification requirements requested by the user. According to some embodiments, the pseudo-randomly generated “generic” latent feature vectors may be determined according to a logistic regression model as described in more detail with respect to FIG. 6, which can transform a latent feature vector 105 to take on an image classification exhibited by a target latent feature vector 145.” col. 9, lines 33-63) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gottlieb into the combination of Cole, Phan and Chandran, in order to meet desired image characteristics.
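For illustration only, the following is a minimal sketch of the pseudo-random latent feature vector generation described in the Gottlieb passage quoted above: a latent vector may be sampled pseudo-randomly, or stored target latent vectors sharing a classification may be averaged into a “generic” latent vector before being passed to a trained generator. The dimensions, distribution, and names are hypothetical assumptions, not taken from Gottlieb or the claims.

```python
import numpy as np

rng = np.random.default_rng(42)
LATENT_DIM = 16  # assumed size of the latent feature vector

def pseudo_random_latent():
    """Sample a pseudo-random latent feature vector from a standard normal prior."""
    return rng.standard_normal(LATENT_DIM)

def generic_latent(target_vectors):
    """Average stored target latent vectors that share a desired classification."""
    return np.mean(np.stack(target_vectors), axis=0)

# Either a freshly sampled vector or an averaged "generic" vector could then be
# passed to a trained generator/decoder to produce a synthetic face.
targets = [pseudo_random_latent() for _ in range(4)]
z = generic_latent(targets)
print(z.shape)  # (16,)
```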
24. Claim 15 is similar in scope to claim 3, and thus is rejected under similar rationale.
25. Claims 4, 5, 16, 17 and 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Cole et al. (US 2019/0095698 A1), Phan (US 2022/0398796 A1), Chandran et al. (US 2021/0279956 A1) and Gottlieb (US 11,341,699 B1), as applied to claims 1, 3, 14, 15 and 18 above, and further in view of Abel (US 2022/0172431 A1).
26. With reference to claim 4, the combination of Cole, Phan, Chandran and Gottlieb does not explicitly teach the request further comprises requests to generate a plurality of virtual face models and latent feature representation of individual virtual faces is generated for each of the plurality of requested virtual face models. This is what Abel teaches (“the requested attributes for facial features to be included in the simulated face include a gender, and a plurality of sub-attributes desired for the simulated face, and one or more of the sub-attributes is associated with an attribute variation set via the content creation application. … the simulated face is one of a plurality of simulated faces requested to be generated and the method includes applying a variation amount simulated faces setting that defines how similar or dissimilar each one of the simulated faces is with respect to each other. … The method includes accessing a database of images of faces and processing the images through a machine learning process to identify and label features of each of the faces to train a facial rendering model. The method includes accessing the facial rendering model to request data for rendering a plurality of simulated faces. The request includes, attributes for facial features and attribute variations between the plurality of simulated faces. The method includes processing one or more of the plurality of simulated faces. The processing is configured to generate a three-dimensional (3-D) model based for each respective simulated face.” [0014-0016]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Abel into the combination of Cole, Phan, Chandran and Gottlieb, in order to produce very detailed and engaging gaming experiences.
27. With reference to claim 5, Cole does not explicitly teach each of the plurality of virtual face identities is pseudo-randomly generated and each of the virtual face identities is generated from the latent space associated with the identity engine, wherein each latent feature representation is separated from other latent feature representations by a defined threshold value. This is what Phan teaches. Phan teaches each of the plurality of virtual face identities is generated and each of the virtual face identities is generated from the latent space associated with the identity engine, wherein each latent feature representation is separated from other latent feature representations by a defined threshold value (“The expression generation system 200 includes a trained encoder that encodes the expression information to a latent space representation. The input data 220 provided to expression generation system 200 for person A includes a video of animation of person A. The input data 222 provided to expression generation system 200 for person B includes an image of person B. The corresponding video and image data are provided to an identity engine 200 to generate identity representations for each person. The identity representations are submitted as part of the input data provided to the expression generation system 200. The encoder may be a universal encoder for translating the input images and video into latent feature space representations. A resulting latent feature representation may be generated which is based on distributions of latent variables. The trained decoder can be a universal encoder that can then be used to decode a latent feature space representation in order to output an expression on the person represented in the latent space.” [0068] “The system may determine feature encodings for the expression(s) of each frame/image of the source data. The feature encodings may represent locations in the latent feature space (e.g., values for the latent variables).” [0078] “each time slice of data of the motion capture data can be used to generate separate photogrammetric models of the positions of the facial features. This motion capture information may be analyzed to identify features to be input into the texture map generation system 400.” [0091] “The input to the texture map generation system 400 can include one or more 2D images 112 of a person and identity information 212 for the person. The 2D images 212 of the person can be generated by the expression generation system 200. The identity information 212 can be generated by the identity engine 210. The 2D images 112 can be a full set of range of motion data that is generated based on one image of the person using expression generation system 200. The embodiment illustrated of the texture map generation system 400 is an example of a CNN. As described herein, the trained CNN can include an input layer, a plurality of hidden layers (e.g., convolutional layers, pooling layers, fully connected layers, etc.), and an output layer. At the input layer, the texture map generation system 400 can receive a 2D image 112 and the identity encoding 212 for the person. Each 2D image/frame of the range of motion information can be processed separately. The model can generate and output at the output layer one or more texture maps 420 for each input 2D image received. The model can determine the relationship between the 2D input images and the output textures. 
The model can extract locations of the facial features.” [0098] “the system may select values for these latent variables. This sample may then be provided to the decoder to generate an output expression, for example as a vector associated with the latent feature space. In this way, new expression animations may be generated by the system.” [0125]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Phan into Cole, in order to increase in realism.
The combination of Cole, Phan and Chandran does not explicitly teach each of the plurality is pseudo-randomly generated. This is what Gottlieb teaches (“The system may accordingly generate a synthetic image meeting the at least one image classification requirement directly by selecting a pseudo-randomly generated latent feature vector 105 that complies with the at least one image classification requirement. For example, target latent feature vectors 145 may be stored as default values that comply with certain classification requirements (e.g., by including a respective image feature, such a vehicle body style, make or model, color, etc.). If a user wishes specifies an image classification requirement of a vehicle with a sedan body style, the system may identify each target latent feature vector 145 stored on the system that includes an image classification 150 of the sedan body style. Each identified target latent feature vector 145 may be averaged together to produce a generic latent feature vector that includes an image classification of a sedan body style. The resultant generic latent feature vector may be passed to the trained generator 110 which produces a synthetic image 115 that conforms to the image classification requirement of a vehicle having a sedan body style. The user may add additional classification requirements and under a similar process a new generic latent feature vector may be determined based on the stored target latent feature vectors 145, and a new synthetic image 115 may be generated that includes the classification requirements requested by the user. According to some embodiments, the pseudo-randomly generated “generic” latent feature vectors may be determined according to a logistic regression model as described in more detail with respect to FIG. 6, which can transform a latent feature vector 105 to take on an image classification exhibited by a target latent feature vector 145.” col. 9, lines 33-63) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Gottlieb into the combination of Cole, Phan and Chandran, in order to meet desired image characteristics.
28. Claims 16 and 19 are similar in scope to claim 4, and they are rejected under similar rationale.
29. Claims 17 and 20 are similar in scope to claim 5, and they are rejected under similar rationale.
30. With reference to claim 21, the combination of Cole, Phan, Chandran and Gottlieb does not explicitly teach the request further comprises requests to generate a plurality of virtual face models and latent feature representation of individual virtual faces is generated for each of the plurality of requested virtual face models. This is what Abel teaches (“the requested attributes for facial features to be included in the simulated face include a gender, and a plurality of sub-attributes desired for the simulated face, and one or more of the sub-attributes is associated with an attribute variation set via the content creation application. … the simulated face is one of a plurality of simulated faces requested to be generated and the method includes applying a variation amount simulated faces setting that defines how similar or dissimilar each one of the simulated faces is with respect to each other. … The method includes accessing a database of images of faces and processing the images through a machine learning process to identify and label features of each of the faces to train a facial rendering model. The method includes accessing the facial rendering model to request data for rendering a plurality of simulated faces. The request includes, attributes for facial features and attribute variations between the plurality of simulated faces. The method includes processing one or more of the plurality of simulated faces. The processing is configured to generate a three-dimensional (3-D) model based for each respective simulated face.” [0014-0016] “a three-dimensional (3-D) model of the simulated face is generated. The 3-D model is a facial reconstruction in 3 dimensions using information from the simulated face. The 3-D model, in one embodiment, is a digital file that can be utilized as input to a content creation application 110, which can then utilize a 3-D model to make further adjustments to the 3-D model, refinements, and integration with one or more rigs of characters being developed for a video game.” [0042] “The objective enabled by this feature is to generate many simulated faces that can then be quickly turned into 3-D models in operation 504, for implementation as characters in a game.” [0075]) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Abel into the combination of Cole, Phan, Chandran and Gottlieb, in order to produce very detailed and engaging gaming experiences.
Conclusion
31. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Michelle Chin whose telephone number is (571)270-3697. The examiner can normally be reached on Monday-Friday 8:00 AM-4:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kent Chang can be reached on (571)272-7667. The fax phone number for the organization where this application or proceeding is assigned is (571)273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHELLE CHIN/
Primary Examiner, Art Unit 2614