DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 18 December 2025 has been entered.
Status of Claims
Applicant's amendments filed on 18 December 2025 have been entered. Claims 1-5, 8, 14-19, and 21 have been amended. No claims have been canceled or added. Claims 1-26 remain pending in this application, with claims 1 and 14 being independent.
Response to Arguments
103 Rejections
Applicant's arguments filed 18 December 2025 have been fully considered but they are not persuasive.
Applicant argues, regarding the newly amended portions of independent claim 1, “Li falls short of disclosing or suggesting the above-identified features of the claims because in Li, the texture is determined by inferring activation components from a base set of expression Blendshape coefficients to get a set of inferred expression coefficients for the uncommon expression. See Li, para. [0069]. An associated key expression activation mask can then be generated, which allows a key expression for different poses to be translated into FACS expressions. See Li, para. [0070]. As explained at paragraph [0044], the activation mask is relative to a neutral image and based on a desired expression. By contrast, the claims are directed to "determining a combination of the plurality of base texture maps based on the expression parameters wherein each of the plurality of base texture maps encode texture characteristics for a plurality of expressions." That is, whereas Li describes determining a combination of key expressions, the claims describe a combination of textures, and each of those textures encodes characteristics from multiple expressions. In fact, Li merely describes a process for inferring a texture based on a determined set of expressions, and describes those expressions in terms of Blendshapes, related to the face geometry, and activation components related to muscle activations. Thus, Li falls short of describing a combination of base textures.”
Examiner asserts that Li indeed discloses combining textures. Cited Paragraph [0066] recites: “a linear combination of the associated textures, weighted by the expression blendshape coefficients of the fitting, may be used to generate an output FACS texture,” which is plainly a combination of textures. Examiner further refers to Paragraph [0069] of Li, which recites: “the associated textures are first inferred using the set of key expressions K that were easily performed by an actual person. Those key expressions are used to generate a set of facial expressions that may be used to generate FACS texture maps. For a given k∈K, its expression blendshape coefficients are denoted by Wk computed by averaging the weights across the associated GAN training data,” disclosing the combining of expressions for texture maps, and to Paragraph [0070], which recites: “FIG. 6 is a set of activation masks for conversion of a key expression into a three-dimensional mesh and blendshape texture map combination,” which likewise combines said texture maps. Examiner therefore maintains that Li reads on the newly amended limitations.
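For clarity, the linear combination recited in Paragraph [0066] of Li may be restated symbolically as follows (an illustrative restatement using Li's notation, not a quotation of Li):

```latex
% T_e: texture map associated with blendshape e
% w_e: expression blendshape coefficient of the fitting for e
% \mathcal{E}: the set of blendshapes in the model
T_{\text{out}} = \sum_{e \in \mathcal{E}} w_e \, T_e
```

Under this formulation, the output FACS texture is a weighted combination over the full set of associated textures rather than a selection of any single texture, which is the sense in which Li combines textures.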
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-26 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (US Pub. 2020/0051303), hereinafter Li, in view of Sagar et al. (US Pub. 2021/0390751), hereinafter Sagar.
Regarding claim 1, Li discloses a method for generating an avatar having expressions that mimics expressions of a person (Fig. 1; Paragraph [0009]: FIG. 1 is a structural diagram of a system for generating real-time avatars using dynamic textures; Paragraph [0059]: neutral image is an image from which expressions will be generated. As indicated above, the process of creating the key expressions and translating them to FACS may preferably take place on a computer, like computing device 130, that is better suited to the operation of neural networks and to complex graphical renderings and mathematical calculations), the method comprises: obtaining expression parameters that represent an expression of the person (Fig. 4; Paragraphs [0045]-[0047]: FACS facial textures are created, they require relatively low resources to transmit and store. Therefore, a mobile device, like mobile device 140, may store the FACS facial textures. The FACS facial textures may be used, in conjunction with single image hair modelling 312 to create a real-time photorealistic avatar 314. In particular, the FACS facial textures may be manipulated in a non-processor intensive way by the mobile device 140 to generate realistic avatars in real time. If desired, the expressions to be modelled may be gathered by the same mobile device 140 (e.g. a camera or depth-sensing camera) to create an avatar that mirrors the individual's expressions being made in real-time. Examples of resulting expressions are shown in the real-time photorealistic avatar 314… FIG. 4 is a functional diagram 400 of a conditional generative adversarial network 410 used to generate key expressions. Given a neutral front input image I, an initial mesh is generated MI=(αI, βI, RI), where αI and βI are the respective identity and expression coefficients of the fitting to image I); obtaining a plurality of base texture maps for the person, wherein each of the plurality of base texture maps encode texture characteristics for a plurality of expressions (Fig. 3; Paragraph [0044]: key expressions may be used, for example, by a computing device 130, to generate a set of FACS facial textures 315. These FACS facial textures may be equal in number to the key expressions, but, in testing, 36 FACS facial textures and activation masks were created. The FACS facial textures 315 are generated using activation masks relative to the neutral image 211 and based upon the desired expression mesh. More than 36 may be used, but 36 was found to be adequate to the task. Also, FACS facial textures are only one example of a suitable way to represent the facial textures. Other methods and systems for representing the facial textures and meshes may be used as well; Paragraph [0076]: At step 720, the associated FACS expressions (e.g. FACS textures) are received by the mobile device. A set of thirty-six FACS textures for a desired set of expressions was used in the implementation by the inventors. From these thirty-six FACS textures, almost any synthesized expression can be quickly generated); determining a combination of the plurality of base texture maps based on the expression parameters (Fig. 5; Fig. 6; Paragraphs [0066]-[0069]: the key expressions are used to generate FACS textures for later transmission to a mobile device (or subsequent low computation use by the same system) to generate real-time animated avatars. 
To perform this translation from key expression to FACS, for each texture map Te having each blendshape e∈ε, where ε is the set of blendshapes in the model, each of which correspond to a FACS action (e.g. an expression), a linear combination of the associated textures, weighted by the expression blendshape coefficients of the fitting, may be used to generate an output FACS texture; Paragraph [0070]: FIG. 6 is a set of activation masks for conversion of a key expression into a three-dimensional mesh and blendshape texture map combination. The associated key expressions for poses 601, 603, and 605 are shown in column 602. The activation masks to translate those into FACS expressions are visible in column 604. Only the portions of the expression that should be activated are highlighted. The darker portion of the activation mask is not included in the FACS expression. Using this process, the FACS expressions are generated based upon the key expressions… the associated textures are first inferred using the set of key expressions K that were easily performed by an actual person. Those key expressions are used to generate a set of facial expressions that may be used to generate FACS texture maps. For a given k∈K, its expression blendshape coefficients are denoted by Wk computed by averaging the weights across the associated GAN training data); generating, in real time, a texture map of a face of the person using the combination of the plurality of base texture maps (Fig. 3; Paragraph [0042]: FIG. 3 is a functional diagram of a process for using generated key expression meshes and images to create blendshape texture maps for animation of an avatar in real-time. These functions may take place partially on a computing device (e.g. computing device 130) and partially on a mobile device (e.g. mobile device 140) due to limitations of the mobile device; Paragraph [0066]: the key expressions are used to generate FACS textures for later transmission to a mobile device (or subsequent low computation use by the same system) to generate real-time animated avatars. To perform this translation from key expression to FACS, for each texture map Te having each blendshape e∈ε, where ε is the set of blendshapes in the model, each of which correspond to a FACS action (e.g. an expression), a linear combination of the associated textures, weighted by the expression blendshape coefficients of the fitting, may be used to generate an output FACS texture. However, the translation from the key expression to the FACS expression is not precisely linear, so direct linear blending will not work. Instead, a UV activation mask is applied for each expression by taking a per-vertex deformation magnitude of each expression from a neutral pose. The result is a non-linear activation mask which acts essentially as a deformation instruction to translate between the key expression and the FACS expression in each vector; Paragraphs [0076]-[0077]: the associated FACS expressions (e.g. FACS textures) are received by the mobile device. A set of thirty-six FACS textures for a desired set of expressions was used in the implementation by the inventors. From these thirty-six FACS textures, almost any synthesized expression can be quickly generated…facial pose and expression data is received by the mobile device at 730. This data may be generated in real-time from a depth sensing camera or even from an optical camera that uses one or more methods (e.g. machine learning) to determine a pose and/or expression for a given user in real time. 
For example, a user may be using a “selfie camera” mode on a mobile device (e.g. mobile device 140) to capture images of him or herself. That camera may include depth data or may not. But in either case, pose and expression data may be generated as a result, and that pose and expression data may be translated into a FACS set of actions).
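For illustration only, the mapping above can be summarized as the following sketch of a weighted texture combination modulated by per-expression activation masks, in the manner of Li's Paragraphs [0066] and [0070]. All function names, array shapes, and the normalization step are hypothetical choices made for this sketch and are not taken from Li:

```python
# Minimal sketch: combine base texture maps using expression coefficients and
# per-expression activation masks (cf. Li, Paras. [0066], [0070]). Hypothetical
# names and shapes; not Li's actual implementation.
import numpy as np

def blend_textures(base_textures: np.ndarray,
                   expression_coeffs: np.ndarray,
                   activation_masks: np.ndarray) -> np.ndarray:
    """base_textures:     (N, H, W, 3) base texture maps
    expression_coeffs: (N,) blendshape coefficients from the expression fitting
    activation_masks:  (N, H, W, 1) masks derived from per-vertex deformation
                       magnitudes relative to a neutral pose, which is what makes
                       the blend non-linear rather than a direct linear mix."""
    # Per-texture weight field: scalar coefficient modulated by its mask.
    weights = expression_coeffs[:, None, None, None] * activation_masks
    blended = (weights * base_textures).sum(axis=0)
    # Normalize so fully activated regions keep their dynamic range.
    return blended / np.maximum(weights.sum(axis=0), 1e-8)

# Example with 36 base textures, matching the count used in Li's implementation.
rng = np.random.default_rng(0)
out = blend_textures(rng.random((36, 64, 64, 3)),
                     rng.random(36),
                     rng.random((36, 64, 64, 1)))
print(out.shape)  # (64, 64, 3)
```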
While Li teaches rendering the avatar and separately teaches using the texture map (Paragraph [0015]: FIG. 7 is a flowchart of a process for generating real-time facial animation on a computing device based upon a set of three-dimensional meshes and blendshape texture maps; Paragraphs [0085]-[0086]: FIG. 11 is a comparison 1100 of avatars generated using prior art methods and the methods described herein. The input neutral image 1101 for an individual face is shown. Image 1102 is an example of a “toothless smile” expression for a prior art method of generating three-dimensional avatars and associated textures. The corresponding “toothless smile” expression for the methods disclosed herein is shown in image 1104. As can be seen, the images are relatively similar. However, there does appear to be a bit more depth of character in image 1104…the open mouth (e.g. talking) image 1108 of the present method shows a more realistic mouth rendering), Li does not explicitly disclose rendering the avatar using the texture map.
However, Sagar teaches generation of avatars with specific facial expressions (Paragraph [0002]; Paragraph [0106]), further comprising rendering the avatar using the texture map (Paragraphs [0106]-[0108]: Skin textures from each training avatars are passed through a hierarchy of bilateral Gaussian filters, where each layer of the hierarchy is designed to extract a particular type of texture details, such as facial hairs, wrinkles, moles, freckles and skin pores. Once the layers are extracts, each layer can then be independently blended and composited back to form a new texture map. The advantage of this layering approach is that the skin details can be preserved during the blending process…Texture maps represent spatially varying features which can be used in a lighting model to render the final image. A plurality of texture maps may represent spatially varying graphical qualities of the subject which are used by a shading model to render. Examples of texture maps include albedo maps, diffuse maps, shading maps, bump maps or specular maps. In another embodiment, the rendering texture map can be generated from a deep learning model such as a deep appearance model). Sagar teaches that this will allow for enhanced realism or a desired custom look of an avatar (Paragraph [0126]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li with the above features as taught by Sagar, so as to allow for enhanced realism or a desired custom look of an avatar as presented by Sagar.
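For illustration only, the layered texture decomposition Sagar describes in Paragraphs [0106]-[0108] might be sketched as below: a hierarchy of bilateral filters separates a skin texture into detail layers that can be independently scaled and recomposited. The filter parameters, layer count, and function names are assumptions made for this sketch, not values disclosed by Sagar:

```python
# Hedged sketch of a bilateral-filter hierarchy for texture detail layers
# (cf. Sagar, Paras. [0106]-[0108]). Parameters are illustrative assumptions.
import cv2
import numpy as np

def decompose_texture(texture: np.ndarray,
                      params=((5, 25.0), (9, 50.0), (15, 75.0))):
    """Split a texture into detail layers plus a smooth base layer.
    Each bilateral pass smooths at a coarser scale; the difference between
    successive passes isolates one band of detail (pores, wrinkles, ...)."""
    layers, current = [], texture.astype(np.float32)
    for d, sigma in params:
        smoothed = cv2.bilateralFilter(current, d, sigma, sigma)
        layers.append(current - smoothed)  # detail removed at this scale
        current = smoothed
    layers.append(current)                 # residual base layer
    return layers

def composite(layers, detail_gains):
    """Recombine, scaling each detail layer independently before summing,
    so skin details can be preserved (or emphasized) during blending."""
    base = layers[-1]
    detail = sum(g * l for g, l in zip(detail_gains, layers[:-1]))
    return np.clip(base + detail, 0, 255).astype(np.uint8)
```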
Regarding claim 2, Li, in view of Sagar, teaches the method according to claim 1. Li discloses wherein the base texture maps are learnt by applying a machine learning process on texture maps corresponding to a frame capturing at least one expression of the plurality of expressions (Fig. 4; Paragraph [0005]: facial textures can be generated by several different methods, some simple and some complex. Typically, the more complex methods result in more accuracy. However, several of the inventors of this patent also invented methods reliant upon a single image to create realistic facial features. Those methods rely upon a trained neural network to predict the features of a face based upon a single input image. The training utilizes detailed textural and depth maps of many human faces to train the neural network as to what is likely to be the characteristics of a face (depth, coloring, texture, etc.) that correspond to individual front-facing images; Paragraphs [0074]-[0077]: FIG. 7 is a flowchart of a process for generating real-time facial animation on a computing device based upon a set of three-dimensional meshes and blendshape texture maps. This process has a start 705 and an end 795, but may take place many times, may iteratively take place, for example, as additional pose data arrives, or upon receipt of additional frames of video…facial pose and expression data is received by the mobile device at 730. This data may be generated in real-time from a depth sensing camera or even from an optical camera that uses one or more methods (e.g. machine learning) to determine a pose and/or expression for a given user in real time).
Regarding claim 3, Li, in view of Sagar, teaches the method according to claim 1. Sagar discloses wherein the base texture maps are learnt by performing mathematical calculations and without applying a machine learning process on texture maps corresponding to a frame capturing at least one expression of the plurality of expressions (Paragraph [0114]: To generate texture maps an end user provides to the system a series of blending weights 205 (wij, where i=1, 2, . . . , n and j=1, 2, . . . , l). The number of blending weights is the product of the number of digital characters and the number of layers that the texture maps are separated into. The weights are bounded to be between 0 and 1. The sum of the blending weights of all digital characters for the base layer (T11, T21, . . . Tm1) of the texture maps should be 1. This constraint is not required for the feature layers).
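The weight constraints quoted from Paragraph [0114] of Sagar can be stated compactly (an illustrative restatement, not a quotation): with n digital characters and l texture layers there are n·l blending weights, each bounded to [0, 1], and only the base layer is constrained to sum to 1 across characters:

```latex
% w_{ij}: blending weight for digital character i, texture layer j
0 \le w_{ij} \le 1, \quad i = 1,\dots,n, \;\; j = 1,\dots,l,
\qquad \sum_{i=1}^{n} w_{i1} = 1
```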
Regarding claim 4, Li, in view of Sagar, teaches the method according to claim 1. Li discloses wherein the combination of the plurality of base texture maps comprises the plurality of base texture maps (Fig. 7; Fig. 8; Paragraph [0081]: inner mouths are also separately generated. The mouths are all generic to any individual, but are convincing. A total library of many mouths (e.g. 300 in the implementation by the inventors) is created with various geometric configurations of the mouth. Then, a synthesized mouth is created using a per-pixel weighted median of a large group of the closest mouths in similarity to the desired FACS expression. In this way, the mouth is wholly fictitious, but is convincingly like a real mouth and avoids tearing and other artifacts that are introduced by prior art methods reliant upon the actual content of the images which do not include an open mouth (as is the case in the neutral pose image used here). The mouth is then separately added to the model).
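For illustration only, the per-pixel weighted median Li describes for mouth synthesis in Paragraph [0081] might look like the following sketch. The similarity weighting and array shapes are hypothetical; Li does not disclose implementation details at this level:

```python
# Hedged sketch of a per-pixel weighted median over a mouth library
# (cf. Li, Para. [0081]). Shapes and similarity weights are illustrative.
import numpy as np

def weighted_median_mouth(mouths: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """mouths:  (K, H, W) grayscale mouth images closest to the target expression
    weights: (K,) similarity of each library mouth to the desired FACS pose"""
    order = np.argsort(mouths, axis=0)               # sort each pixel's stack
    w = np.broadcast_to(weights[:, None, None], mouths.shape)
    sorted_w = np.take_along_axis(w, order, axis=0)
    cdf = np.cumsum(sorted_w, axis=0) / sorted_w.sum(axis=0, keepdims=True)
    idx = (cdf >= 0.5).argmax(axis=0)                # first index past half mass
    sorted_vals = np.take_along_axis(mouths, order, axis=0)
    return np.take_along_axis(sorted_vals, idx[None], axis=0)[0]
```

Unlike a weighted mean, a weighted median always returns a pixel value that actually occurs in the library, which may help avoid the blurring and tearing artifacts the cited passage attributes to prior art methods.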
Regarding claim 5, Li, in view of Sagar, teaches the method according to claim 1. Li discloses wherein the combination of base texture maps comprises a subset of the plurality of base texture maps (Fig. 6; Paragraph [0070]: FIG. 6 is a set of activation masks for conversion of a key expression into a three-dimensional mesh and blendshape texture map combination. The associated key expressions for poses 601, 603, and 605 are shown in column 602. The activation masks to translate those into FACS expressions are visible in column 604. Only the portions of the expression that should be activated are highlighted. The darker portion of the activation mask is not included in the FACS expression. Using this process, the FACS expressions are generated based upon the key expressions. In the implementation by the inventors, a total of 36 FACS expressions were used, but more or fewer may be used depending on the particular implementation).
Regarding claim 6, Li, in view of Sagar, teaches the method according to claim 1. Li discloses the method further comprising selecting the subset of the base texture maps from the plurality of base texture maps (Fig. 6; Paragraph [0070]: FIG. 6 is a set of activation masks for conversion of a key expression into a three-dimensional mesh and blendshape texture map combination. The associated key expressions for poses 601, 603, and 605 are shown in column 602. The activation masks to translate those into FACS expressions are visible in column 604. Only the portions of the expression that should be activated are highlighted. The darker portion of the activation mask is not included in the FACS expression. Using this process, the FACS expressions are generated based upon the key expressions. In the implementation by the inventors, a total of 36 FACS expressions were used, but more or fewer may be used depending on the particular implementation).
Regarding claim 7, Li, in view of Sagar, teaches the method according to claim 6. Li discloses wherein the selecting is based on weights of the base texture maps (Fig. 5; Paragraph [0066]: at 560, the key expressions are used to generate FACS textures for later transmission to a mobile device (or subsequent low computation use by the same system) to generate real-time animated avatars. To perform this translation from key expression to FACS, for each texture map Te having each blendshape e∈ε, where ε is the set of blendshapes in the model, each of which correspond to a FACS action (e.g. an expression), a linear combination of the associated textures, weighted by the expression blendshape coefficients of the fitting, may be used to generate an output FACS texture).
Regarding claim 8, Li, in view of Sagar, teaches the method according to claim 1. Li discloses wherein the generating of the regional texture map of the face of the person comprises calculating texture maps of different parts of the face of the person, and compiling the regional texture maps of different parts to the texture map of the face of the person (Fig. 8; Paragraphs [0046]-[0047]: FIG. 4 is a functional diagram 400 of a conditional generative adversarial network 410 used to generate key expressions. Given a neutral front input image I, an initial mesh is generated MI=(αI, βI, RI), where αI and βI are the respective identity and expression coefficients of the fitting to image I, and RI encodes the orientation parameters (e.g. rotation and translation of the mesh). From that, the face texture TI which is unwrapped from I to the UV-space (e.g. no shadows for depth in the texture itself, the shadows are introduced by the mesh itself later)…dataset with varying poses for faces (e.g. with up to 45 degrees of rotation in every direction from a neutral—face on—pose) is desirable to enable the GAN to create faces with corresponding ranges of pose).
Regarding claim 9, Li, in view of Sagar, teaches the method according to claim 8. Li discloses wherein the determining and the calculating are executed per each one of the different parts of the face of the person (Fig. 8; Paragraph [0082]: FIG. 8 is a set 800 of examples of results for a desired facial expression generated from a single image compared to an actual image of the same expression for the same model. In columns 802 and 804, the input neutral image is seen for different faces 812, 814, 816 and 818. In columns 802′ and 804′ an example deformed neutral and gaze image are shown for two different poses per faces 812, 814, 816, and 818. In columns 802′″ and 804′″, a generated synthesized expression is shown for two different poses per faces 812, 814, 816, and 818. Finally, for comparison, an actual image of the individual making the desired expression is included in columns 802″″ and 804″″).
Regarding claim 10, Li, in view of Sagar, teaches the method according to claim 1.
Li, in view of Sagar does not explicitly disclose wherein a number of base texture maps does not exceed 128.
However, Li teaches that any number may be used (Paragraph [0025]: source image 110 may come from a still camera or a video camera capturing an image. The source image 110 may be from a short term or long-term storage device holding data that represents images. For example, the source image 110 may come from a database of images, may be the Internet, or may be any number of other sources of image data). Examiner notes that it would have been obvious to try this number, as it amounts to choosing from a finite number of identified, predictable solutions with a reasonable expectation of success. Examiner further notes that: "The evaluation of the choices made by a skilled scientist, when such choices lead to the desired result, is a challenge to judicial understanding of how technical advance is achieved in the particular field of science or technology." Abbott Labs. v. Sandoz, Inc., 544 F.3d 1341, 1352, 89 USPQ2d 1161, 1171 (Fed. Cir. 2008). The Federal Circuit cautioned that an obviousness inquiry based on an obvious-to-try rationale must always be undertaken in the context of the subject matter in question, "including the characteristics of the science or technology, its state of advance, the nature of the known choices, the specificity or generality of the prior art, and the predictability of results in the area of interest." Thus, Examiner finds Li to indeed read on the claimed limitation.
Regarding claim 11, Li, in view of Sagar, teaches the method according to claim 1.
Li, in view of Sagar does not explicitly disclose wherein a number of base texture maps does not exceed 51.
However, Li teaches that any number may be used (Paragraph [0025]: source image 110 may come from a still camera or a video camera capturing an image. The source image 110 may be from a short term or long-term storage device holding data that represents images. For example, the source image 110 may come from a database of images, may be the Internet, or may be any number of other sources of image data). Examiner notes that it would have been obvious to try this number, as it amounts to choosing from a finite number of identified, predictable solutions with a reasonable expectation of success. Examiner further notes that: "The evaluation of the choices made by a skilled scientist, when such choices lead to the desired result, is a challenge to judicial understanding of how technical advance is achieved in the particular field of science or technology." Abbott Labs. v. Sandoz, Inc., 544 F.3d 1341, 1352, 89 USPQ2d 1161, 1171 (Fed. Cir. 2008). The Federal Circuit cautioned that an obviousness inquiry based on an obvious-to-try rationale must always be undertaken in the context of the subject matter in question, "including the characteristics of the science or technology, its state of advance, the nature of the known choices, the specificity or generality of the prior art, and the predictability of results in the area of interest." Thus, Examiner finds Li to indeed read on the claimed limitation.
Regarding claim 12, Li, in view of Sagar, teaches the method according to claim 1. Li discloses wherein the rendering the avatar also uses a model of the person (Fig. 1; Paragraph [0024]: training data 105 is preferably a set of two-dimensional images of faces as well as fully modelled versions of the same faces including associated facial textures. The two-dimensional images and fully modelled and textured faces (typically captured using high-resolution camera rigs and infrared mapping) enable the generative portion of the generative adversarial network (“GAN”) to “learn” what typical face textures result from corresponding two-dimensional images. It also allows the discriminator portion of the generative adversarial network work with the generative to “knock out” or exclude faces that are inadequate or otherwise do not make the grade. If the training is good, over time, the GAN becomes better at creating realistic facial textures for each model (e.g. each expression) and the discriminator becomes more “fooled” by the real or fake determination for the resulting face).
Regarding claim 13, Li, in view of Sagar, teaches the method according to claim 12. Sagar discloses wherein the model of the person is a template model (Paragraph [0131]: Application of digital makeup requires a high level of accuracy in the identification of pixel correspondences between the texture maps with and without makeup. Image registration algorithms, for example, optical flow or template matching, can be used to improve the accuracy of the point-to-point correspondence among texture maps of digital characters).
Regarding claim 14, the limitations of this claim substantially correspond to the limitations of claim 1 (except for the non-transitory computer readable medium, which is disclosed by Li, Paragraph [0033]: stored software programs may include an application or “app” to cause the computing device to perform portions or all of the processes and functions described herein. The words “memory” and “storage”, as used herein, explicitly exclude transitory media including propagating waveforms and transitory signals); thus they are rejected on similar grounds.
Regarding claim 15, the limitations of this claim substantially correspond to the limitations of claim 2; thus they are rejected on similar grounds.
Regarding claim 16, the limitations of this claim substantially correspond to the limitations of claim 3; thus they are rejected on similar grounds.
Regarding claim 17, the limitations of this claim substantially correspond to the limitations of claim 4; thus they are rejected on similar grounds.
Regarding claim 18, the limitations of this claim substantially correspond to the limitations of claim 5; thus they are rejected on similar grounds.
Regarding claim 19, the limitations of this claim substantially correspond to the limitations of claim 6; thus they are rejected on similar grounds.
Regarding claim 20, the limitations of this claim substantially correspond to the limitations of claim 7; thus they are rejected on similar grounds.
Regarding claim 21, the limitations of this claim substantially correspond to the limitations of claim 8; thus they are rejected on similar grounds.
Regarding claim 22, the limitations of this claim substantially correspond to the limitations of claim 9; thus they are rejected on similar grounds.
Regarding claim 23, the limitations of this claim substantially correspond to the limitations of claim 10; thus they are rejected on similar grounds.
Regarding claim 24, the limitations of this claim substantially correspond to the limitations of claim 11; thus they are rejected on similar grounds.
Regarding claim 25, the limitations of this claim substantially correspond to the limitations of claim 12; thus they are rejected on similar grounds.
Regarding claim 26, the limitations of this claim substantially correspond to the limitations of claim 13; thus they are rejected on similar grounds.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW D SALVUCCI whose telephone number is (571) 270-5748. The examiner can normally be reached M-F, 7:30-4:00 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, XIAO WU can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MATTHEW SALVUCCI/Primary Examiner, Art Unit 2613