DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
1. This action is in response to the amendment filed on 10/7/2025. Claims 1-2, 6-10, 13, and 20 have been amended. Claims 3-5 have been cancelled. Claims 1-2 and 6-20 remain rejected in the application.
Response to Arguments
2. Applicant’s arguments regarding the rejection of claim 1 under 35 U.S.C. 103, namely that the prior art does not teach the limitation(s): “wherein the base model comprises a plurality of facial expression models and a plurality of sets of predefined weights, each facial expression model being configured to define a facial expression of the virtual avatar, each predefined weight of each of the plurality of sets of predefined weights being applicable to configure one of the plurality of facial expression models, and each set of predefined weights being applicable to the plurality of facial expression models for determining a baseline facial expression”, have been considered but are moot in view of the new ground of rejection. The claimed limitations are now taught by the combination of Bhat, Beith, and Chung.
3. Regarding the arguments directed to claims 2 and 6-20, these claims depend on independent claim 1. Applicant does not argue anything other than independent claim 1. The limitations in those claims, in conjunction with their combination, have previously been established and explained.
Claim Rejections - 35 USC § 103
4. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
5. Claims 1-2, 6, 13, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bhat et al. (US-10198845-B1, hereinafter "Bhat") in view of Beith et al. (US-2024/0078731-A1, hereinafter "Beith"), and further in view of Chung et al. (KR-2020-0019282-A, hereinafter "Chung").
6. As per claim 1, Bhat discloses: A computer-implemented method for controlling a virtual avatar on an electronic device, the method comprising: (Bhat, column 1, lines 15-19, “This invention relates to animating a 3D avatar based upon captured images. More particularly, this invention relates to animating facial expressions for a 3D avatar based on captured images of a user's face and their facial expressions.” and column 3, lines 59-63, “Users may use personal devices 180 ... to perform processes for capturing images (or video) of a user, identifying landmarks, and animating expressions for a 3D model in accordance with various embodiments of the invention.”)
providing a base model that defines a virtual avatar [[associated with a user profile]] corresponding to a user, (Bhat, column 1, lines 15-19, “This invention relates to animating a 3D avatar based upon captured images. More particularly, this invention relates to animating facial expressions for a 3D avatar based on captured images of a user's face and their facial expressions.” and column 10, lines 53-57, “In a number of embodiments, when no expression state can be identified, a base template can be used as an initial neutral frame until enough images are gathered to generate a better neutral frame for the user.”) wherein the base model comprises a plurality of facial expression models and a plurality of sets of predefined weights, each facial expression model being configured to define a facial expression of the virtual avatar, each predefined weight [[of each of the plurality of sets of predefined weights]] being applicable to configure one of the plurality of facial expression models, and each set of predefined weights being applicable to the plurality of facial expression models for determining a baseline facial expression; (Bhat, column 7, lines 46-55, “In some embodiments, morph target weighting generators use a combination of the predefined weights and the weights calculated by the numerical solver to determine a final set of weights for the morph targets. By classifying a user's expression and using predefined weights for various emotions, expressions of different 3D models (or avatars) can be standardized, allowing a designer to create morph targets of a model based on predictable weightings for various expressions, such as (but not limited to) a happy face, a surprised face, a sad face, etc.” and column 4, lines 49-58, “Mapping applications in accordance with several embodiments of the invention are used to animate expressions for 3D models using model parameters and/or weights that are calculated for morph targets of a 3D model. In some embodiments, the morph targets of a 3D model include a set of base shapes and a set of corrective shapes. Base shapes in accordance with several embodiments of the invention include linear base shapes that represent various action units for a facial model, such as those defined by the facial action coding system (FACS).” and column 6, lines 32-34, “Weights for the different morph targets are used to animate different facial expressions on a 3D model.”)
receiving input data from at least one of a plurality of multimedia input sources, the input data comprising images of a face of the user; (Bhat, column 4, lines 25-30, “Various components of a data processing element that executes one or more processes to provide an animated 3D model of a head in accordance with various embodiments of the invention are illustrated in FIG. 2. Data processing element 200 includes processor 205, image capture device 210, network interface 215, and memory 220.” and column 4, lines 42-44, “Image capture devices can include (but are not limited to) cameras and other sensors that can capture image data of a scene.”)
processing the input data; (Bhat, column 5, lines 16-23, “Image processing engines in accordance with many embodiments of the invention process images captured by an image capture device to perform a variety of functions, including (but not limited to) landmark identification, camera parameter identification, and image preprocessing. In this example, image processing engine 305 includes landmark engine 307, which can be used to detect and track landmarks from captured images.”)
determining a baseline facial expression of the virtual avatar (Bhat, column 5, line 59-column 6, line 5, “The neutral frame engine 310 in accordance with many embodiments of the invention internally manages a neutral frame estimate (in 3D) that approximates the shape of the user's face in a neutral expression. In some embodiments, neutral models of a user's face are used to define a neutral state, where other facial expressions can be measured against the neutral state to classify the other facial expressions. Neutral frame engines in accordance with some embodiments of the invention can use any of a number of approaches to classify a neutral expression, including (but not limited to) using an identity solver from 2D and/or 3D landmarks, using statistics based on ratios of 3D face distances, and maintaining an adaptive neutral geometry in 3D by temporally accumulating data across multiple frames.”) and a dynamic facial expression of the virtual avatar using the processed input data; (Bhat, column 8, lines 22-26, “Process 700 identifies (710) landmarks from the received images. The identification of certain features or landmarks of a face can be useful in a variety of applications including (but not limited to) face detection, face recognition, and computer animation.” and column 8, lines 56-57, “Process 700 classifies (725) the landmarks as a facial expression.” and column 9, lines 4-5, “Process 700 identifies (730) predefined expression weights for the classified expression.” and column 9, lines 12-13, “Process 700 calculates (735) a final set of weights for the morph targets to animate a 3D model.”)
generating an output facial expression of the virtual avatar based on the determined baseline facial expression of the virtual avatar and the determined dynamic facial expression of the virtual avatar; (Bhat, column 9, lines 12-13, “Process 700 calculates (735) a final set of weights for the morph targets to animate a 3D model.” and column 9, lines 27-35, “In some embodiments, function curves are used to control the rate of blending weights to morph between different expressions. Function curves in accordance with a number of embodiments of the invention can be used to ensure smooth blending of the morph target weights between detected expressions. In certain embodiments, it can be desirable to provide a fast ramp in of a current detected expression, and a fast blend out of the previous expression (or neutral).”)
updating the base model using the output facial expression of the virtual avatar so as to update at least one property of the virtual avatar; and (Bhat, column 1, lines 33-51, “Systems and methods for generating animations for a 3D model ... wherein the 3D model is animated based on the calculated set of final morph target weights for morph targets of the 3D model.” and column 7, lines 46-55, “In some embodiments, morph target weighting generators use a combination of the predefined weights and the weights calculated by the numerical solver to determine a final set of weights for the morph targets. By classifying a user's expression and using predefined weights for various emotions, expressions of different 3D models (or avatars) can be standardized, allowing a designer to create morph targets of a model based on predictable weightings for various expressions, such as (but not limited to) a happy face, a surprised face, a sad face, etc.”)
rendering the updated base model to display the virtual avatar on a display screen. (Bhat, Figs. 1, 13-19; column 3, lines 59-63, “Users may use personal devices 180 ... to perform processes for capturing images (or video) of a user, identifying landmarks, and animating expressions for a 3D model in accordance with various embodiments of the invention.” and column 3, line 66-column 4, line 1, “However, the personal device 180 may be a desktop computer, a laptop computer, a smart television, ...” and column 4, lines 8-10, “However, mobile device 120 may be a mobile phone, Personal Digital Assistant (PDA), a tablet, a smartphone, ...”)
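For illustration only, the weight handling quoted from Bhat in the mapping above (column 7, lines 46-55 and column 9, lines 27-35) can be pictured with the following minimal sketch. It is not taken from Bhat: the linear mixing rule, the `mix` parameter, the smoothstep curve, and all function names are assumptions chosen for the example, since the quoted passages do not specify how the predefined and solver weights are combined or which function curve is used.

```python
def combine_weights(predefined, solved, mix=0.5):
    """Mix predefined and solver weights per morph target.

    Assumption: a simple linear mix; the quoted passages only state that
    a "combination" of the two weight sets is used.
    """
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    names = sorted(set(predefined) | set(solved))
    return {
        name: mix * predefined.get(name, 0.0) + (1.0 - mix) * solved.get(name, 0.0)
        for name in names
    }


def smoothstep(t):
    """Hypothetical function curve: clamps t to [0, 1] and eases in and out."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)


def blend_between(previous, current, t):
    """Blend morph target weights from a previous expression to the current one."""
    s = smoothstep(t)
    names = sorted(set(previous) | set(current))
    return {
        name: (1.0 - s) * previous.get(name, 0.0) + s * current.get(name, 0.0)
        for name in names
    }


# Example: predefined "happy" weights mixed with solver output, then ramped in.
final = combine_weights({"smile": 1.0}, {"smile": 0.5, "brow_raise": 0.2})
halfway = blend_between({"smile": 0.0}, final, 0.5)
```

A steeper curve could be substituted for `smoothstep` to approximate the "fast ramp in" that the quoted passage describes as desirable in certain embodiments.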
7. Bhat doesn't explicitly disclose but Beith discloses: [[providing a base model that defines a virtual avatar]] associated with a user profile [[corresponding to a user,]] (Beith, Fig. 9; page 2, ¶ [0018], “FIG. 9 is a block diagram of a particular illustrative aspect of components that can be included in a system configured to generate adjusted face data corresponding to an avatar facial expression in conjunction with a user profile, in accordance with some examples of the present disclosure.”)
8. Beith is analogous art with respect to Bhat because they are from the same field of endeavor, namely generating emotive facial avatars. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to incorporate the association of a virtual avatar with a user profile, as taught by Beith, into the teaching of Bhat. The suggestion for doing so would be to provide a user with the ability to customize a unique avatar on their user account and save settings for future use. Therefore, it would have been obvious to combine Beith with Bhat.
9. Bhat in view of Beith doesn't explicitly disclose but Chung discloses: [[wherein the base model comprises a plurality of facial expression models and a plurality of sets of predefined weights, each facial expression model being configured to define a facial expression of the virtual avatar, each predefined weight]] of each of the plurality of sets of predefined weights [[being applicable to configure one of the plurality of facial expression models, and each set of predefined weights being applicable to the plurality of facial expression models for determining a baseline facial expression;]] (Chung, Fig. 1, 6, and 8; page 6, [0019], “Figure 1 illustrates a plurality of blend shape vectors employed in a blend shape technique and a control weight multiplied by each of the plurality of blend shape vectors. Referring to Figure 1, F1 to F5 are blend shape vectors (or expression vectors) each representing a reference expression (or representative expression). Additionally, w1 to w5 are control weights that are multiplied by each blend shape vector. The formula in Fig. 1 shows that various expressions (five faces shown in Fig. 1) can be expressed in addition to the reference expression (or representative expression) by multiplying a control weight to each of a plurality of blend shape vectors and adding them together.” and page 8, [0047]-[0049], “Looking more specifically, the emotional rate (rate) of happiness in various facial images of person A has values greater than or equal to 0 and less than or equal to 1, which are 0, 0.2, 0.5, 0.7, and 1. For each emotional ratio, adjustment weights for multiple standard blend shape vectors are mapped. If a certain value of the emotional ratio, for example 0.4, is left empty, the emotional ratio of this empty part and the adjustment weights that map to the emotional ratio of the empty part can be generated by interpolation. Meanwhile, the 'range (min, max) for the location of each feature point' shown in Fig. 8 will be described later. At this time, in the storage unit (120), in addition to 'happiness', a table of facial expression-related information for various emotions, such as sadness, joy, surprise, or boredom, for person A may be stored in the form of FIG. 8. In addition, the storage unit (120) may store a table of facial expression-related information for each emotion, such as happiness, sadness, joy, or surprise, for people other than A.” and page 8, [0052], “Here, standard blend shape vectors are used to generate facial expressions, but the adjustment weights multiplied to each standard blend shape vector are based on the actual facial expressions of each user. That is, according to one embodiment, facial expressions tailored to each individual can be implemented through adjustment weights that reflect each individual's actual facial expressions without the process of optimizing the standard blend shape vector for each individual. Therefore, it is possible to provide technology that can express each individual's diverse facial expressions with sufficient accuracy and detail even on devices with relatively few resources, that is, low specifications.”)
Chung, Figure 1: This figure demonstrates how a plurality of weights and facial expressions can be combined to create a new facial expression.
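As an illustrative aid only, the weighted sum described for Chung's Figure 1 (control weights w1 to w5 multiplied by blend shape vectors F1 to F5 and added together) and the interpolation of adjustment weights for a missing emotional ratio (Chung, [0047]-[0049]) might be sketched as follows. The function names, the list-based vector representation, and the choice of linear interpolation are assumptions for the example; Chung as quoted states only that the missing weights "can be generated by interpolation".

```python
def blend(blend_shapes, weights):
    """Return sum_i weights[i] * blend_shapes[i], component by component."""
    if not blend_shapes or len(blend_shapes) != len(weights):
        raise ValueError("need one weight per blend shape vector")
    size = len(blend_shapes[0])
    result = [0.0] * size
    for shape, weight in zip(blend_shapes, weights):
        if len(shape) != size:
            raise ValueError("blend shape vectors must have equal length")
        for i, value in enumerate(shape):
            result[i] += weight * value
    return result


def interpolate_weights(table, rate):
    """Look up adjustment weights for an emotional ratio.

    `table` maps stored ratios (e.g. 0, 0.2, 0.5, 0.7, 1) to weight lists.
    Assumption: linear interpolation between the two nearest stored ratios.
    """
    rates = sorted(table)
    if rate <= rates[0]:
        return list(table[rates[0]])
    if rate >= rates[-1]:
        return list(table[rates[-1]])
    for low, high in zip(rates, rates[1:]):
        if low <= rate <= high:
            t = (rate - low) / (high - low)
            return [(1.0 - t) * a + t * b for a, b in zip(table[low], table[high])]


# Example: hypothetical two-vector table with the 0.4 ratio left empty.
happiness_table = {0.2: [0.0, 1.0], 0.5: [0.3, 0.4]}
weights_at_04 = interpolate_weights(happiness_table, 0.4)
expression = blend([[1.0, 0.0], [0.0, 2.0]], weights_at_04)
```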
10. Chung is analogous art with respect to Bhat in view of Beith because they are from the same field of endeavor, namely generating facial expressions. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to incorporate each predefined weight of each of the plurality of sets of predefined weights being applicable to configure one of the plurality of facial expression models, as taught by Chung, into the teaching of Bhat in view of Beith. The suggestion for doing so would be to allow each weight to contribute to an overall facial expression. Each pre-defined facial expression can be given a weighted value reflecting the degree to which that expression affects the final expression. Combining these weighted expressions, as disclosed in Chung’s Figure 1 for example, generates a final facial expression. This provides the ability to account for subtle adjustments in facial expressions, such as adding a little bit of “happiness” to a “surprised” emotion. Therefore, it would have been obvious to combine Chung with Bhat in view of Beith.
11. As per claim 2, Bhat in view of Beith, and further in view of Chung discloses: A computer-implemented method as claimed in claim 1, wherein the plurality of input sources comprises an imaging source configured to provide images of the user's face. (Bhat, column 4, lines 25-30, “Various components of a data processing element that executes one or more processes to provide an animated 3D model of a head in accordance with various embodiments of the invention are illustrated in FIG. 2. Data processing element 200 includes processor 205, image capture device 210, network interface 215, and memory 220.” and column 4, lines 42-44, “Image capture devices can include (but are not limited to) cameras and other sensors that can capture image data of a scene.”)
12. As per claim 6, Bhat in view of Beith, and further in view of Chung discloses: A computer-implemented method as claimed in claim 1, wherein the plurality of facial expression models comprise a plurality of blend shapes, each blend shape defining a different portion of a face mesh. (Bhat, column 6, lines 56-61, “In accordance with many embodiments, the Facial Action Coding System (FACS) morph targets may be constructed by an artist using 3D modelling tools. In several embodiments, the FACS blend shapes may be synthesized using muscle-based physical simulation systems.” and column 4, lines 49-61, “Mapping applications in accordance with several embodiments of the invention are used to animate expressions for 3D models using model parameters and/or weights that are calculated for morph targets of a 3D model. In some embodiments, the morph targets of a 3D model include a set of base shapes and a set of corrective shapes. Base shapes in accordance with several embodiments of the invention include linear base shapes that represent various action units for a facial model, such as those defined by the facial action coding system (FACS). Corrective shapes in accordance with certain embodiments of the invention are a non-linear function of two or more base shapes used to represent a combined state of multiple action units.”)
13. As per claim 13, Bhat in view of Beith, and further in view of Chung discloses: A computer-implemented method as claimed in claim 2, wherein if the imaging source stops providing images for at least a period of time, the method comprises:
determining an idle facial expression; and (Bhat, column 10, lines 53-57, “In a number of embodiments, when no expression state can be identified, a base template can be used as an initial neutral frame until enough images are gathered to generate a better neutral frame for the user.” and column 9, lines 63-66, “The framework in accordance with a number of embodiments of the invention builds an internal 3D representation of the users face in neutral expression (or neutral frame) by combining facial landmarks across multiple frames of video.”)
updating the base model by adding the idle facial expression to the base model. (Bhat, column 9, lines 63-66, “The framework in accordance with a number of embodiments of the invention builds an internal 3D representation of the users face in neutral expression (or neutral frame) by combining facial landmarks across multiple frames of video.” and column 9, line 67-column 10, line 2, “In several embodiments, neutral frames are computed by a numerical solver and are continually updated over time with additional frames of video.” and column 10, lines 58-60, “In many embodiments, neutral frames are continuously updated as more images of a user are captured in order to reach a better neutral frame.”)
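For illustration only, the neutral-frame behavior quoted from Bhat above (a base template used as the initial neutral frame, then continuously updated as more frames arrive) could be sketched as a running average of 3D landmarks. The class name, the treatment of the base template as the first sample, and the use of an incremental mean are assumptions; the quoted passages state only that neutral frames are computed by a numerical solver and updated over time.

```python
class NeutralFrameEstimate:
    """Running estimate of neutral-expression 3D landmarks (hypothetical sketch)."""

    def __init__(self, base_template):
        # The base template serves as the initial neutral frame.
        self.points = [tuple(float(c) for c in p) for p in base_template]
        self.samples = 1  # the template counts as the first sample

    def update(self, landmarks):
        """Fold one more frame of landmarks into the incremental mean."""
        if len(landmarks) != len(self.points):
            raise ValueError("landmark count must match the template")
        self.samples += 1
        n = self.samples
        self.points = [
            tuple(old + (new - old) / n for old, new in zip(p_old, p_new))
            for p_old, p_new in zip(self.points, landmarks)
        ]
        return self.points


# Example: one landmark, refined by two observed frames.
neutral = NeutralFrameEstimate([(0.0, 0.0, 0.0)])
neutral.update([(2.0, 2.0, 2.0)])
neutral.update([(4.0, 4.0, 4.0)])
```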
14. As per claim 16, Bhat in view of Beith, and further in view of Chung discloses: A computer-implemented method as claimed in claim 2, wherein processing the input data comprises applying facial tracking to the images captured by the imaging source to construct a 3D mesh. (Bhat, column 5, lines 16-26, “Image processing engines in accordance with many embodiments of the invention process images captured by an image capture device to perform a variety of functions, including (but not limited to) landmark identification, camera parameter identification, and image preprocessing. In this example, image processing engine 305 includes landmark engine 307, which can be used to detect and track landmarks from captured images. The motion of one or more landmarks and/or 3D shapes in visible video can be tracked, and the expressions of the 3D model video can be recomputed based on the tracked landmarks.” and column 8, lines 22-34, “Process 700 identifies (710) landmarks from the received images. The identification of certain features or landmarks of a face can be useful in a variety of applications including (but not limited to) face detection, face recognition, and computer animation. In many embodiments, the identification of landmarks includes the identification of 3D points of a user's face, which may aid in animation and/or modification of a face in a 3D model. In accordance with some embodiments of an invention, a Mnemonic Descent Method (MDM) is used for facial landmark tracking. The goal of tracking the facial landmarks is to predict a set of points on an image of a face that locate salient features (such as eyes, lip corners, jawline, etc.).”)
15. As per claim 17, Bhat in view of Beith, and further in view of Chung discloses: A computer-implemented method as claimed in claim 1, wherein the plurality of multimedia input sources further comprises one or more of:
an audio input configured to capture audio from a user; (Beith, page 9, ¶ [0105], “However, the facial expressions for each of the avatars may be based on audio input from the users as described above.”)
a user input device or user interface device; (Bhat, column 4, lines 25-30, “Various components of a data processing element that executes one or more processes to provide an animated 3D model of a head in accordance with various embodiments of the invention are illustrated in FIG. 2. Data processing element 200 includes processor 205, image capture device 210, network interface 215, and memory 220.” and column 4, lines 42-44, “Image capture devices can include (but are not limited to) cameras and other sensors that can capture image data of a scene.”)
a user electronic device or a network connection to an electronic device; (Bhat, column 3, lines 59-63, “Users may use personal devices 180 and 120 that connect to the network 160 to perform processes for capturing images (or video) of a user, identifying landmarks, and animating expressions for a 3D model in accordance with various embodiments of the invention.” and column 3, lines 34-45, “A system that provides animation of a 3D model of a head from received images in accordance with some embodiments of the invention is shown in FIG. 1. ... For purposes of this discussion, cloud services are one or more applications that are executed by one or more server systems to provide data and/or executable applications to devices over a network.”)
a game or an application executed on an electronic device; (Bhat, column 7, lines 65-67, “In accordance with several of these embodiments, an application executed by the user device controls the image capture device to capture the image data.”)
and/or an AI, or game AI.
16. Beith is analogous art with respect to Bhat in view of Chung because they are from the same field of endeavor, namely generating emotive facial avatars. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to incorporate an audio input configured to capture audio from a user, as taught by Beith, into the teaching of Bhat in view of Chung. The suggestion for doing so would be to provide an additional input from the user so that, when the user speaks, the user's voice can be used to further refine the facial animation. In addition, this gives other users watching the animation context as to what is being said, rather than showing only a moving mouth. Therefore, it would have been obvious to combine Beith with Bhat in view of Chung.
17. As per claim 18, Bhat in view of Beith, and further in view of Chung discloses: A computer-implemented method as claimed in claim 1, wherein the plurality of multimedia input sources comprises a memory, (Bhat, column 4, lines 25-30, “Various components of a data processing element that executes one or more processes to provide an animated 3D model of a head in accordance with various embodiments of the invention are illustrated in FIG. 2. Data processing element 200 includes processor 205, image capture device 210, network interface 215, and memory 220.”) the memory comprising data related to the virtual avatar, or to at least one previous version of the virtual avatar, associated with the user profile;
the method further comprising storing in the memory the updated base model and/or data defining the updated base model; (Bhat, column 4, lines 49-61, “Mapping applications in accordance with several embodiments of the invention are used to animate expressions for 3D models using model parameters and/or weights that are calculated for morph targets of a 3D model. In some embodiments, the morph targets of a 3D model include a set of base shapes and a set of corrective shapes. Base shapes in accordance with several embodiments of the invention include linear base shapes that represent various action units for a facial model, such as those defined by the facial action coding system (FACS). Corrective shapes in accordance with certain embodiments of the invention are a non-linear function of two or more base shapes used to represent a combined state of multiple action units.”)
and/or at least a portion of the input data, or processed input data. (Bhat, column 7, line 60-column 8, line 2, “Process 700 receives (705) images of a user's face. In certain embodiments, images of a user are captured with an image capture device (e.g., a camera) of the user's device. In accordance with several of these embodiments, an application executed by the user device controls the image capture device to capture the image data. In accordance with some embodiments, images are read from memory.”)
18. As per claim 19, Bhat in view of Beith, and further in view of Chung discloses: A computer-implemented method as claimed in claim 1, wherein the plurality of input sources further comprises an audio input configured to capture audio from the user; and (Beith, page 9, ¶ [0105], “However, the facial expressions for each of the avatars may be based on audio input from the users as described above.”)
wherein processing the input data comprises determining the volume of the audio captured by the audio input; (Beith, page 9, ¶ [0106], “Context and volume of the voice, the emotional response, or both, exhibited in the audio data 204 are examples of information that can be used to determine the magnitude of the expression portrayed by the avatar 154. For example, a loud laugh of the user 108 can result in the avatar 154 displaying a large, open mouth, and other facial aspects related to a boisterous laugh may also be increased.”)
and/or wherein the plurality of input sources further comprises a user interface device, and the method comprises:
receiving a user input from the user interface device; (Bhat, column 4, lines 25-30, “Various components of a data processing element that executes one or more processes to provide an animated 3D model of a head in accordance with various embodiments of the invention are illustrated in FIG. 2. Data processing element 200 includes processor 205, image capture device 210, network interface 215, and memory 220.” and column 4, lines 42-44, “Image capture devices can include (but are not limited to) cameras and other sensors that can capture image data of a scene.”)
and/or wherein the input data comprises gameplay data from a game the user is playing on the electronic device;
and/or wherein the input data comprises gameplay data from a game the user is playing on another electronic device which is in communication with the electronic device. (Beith, page 21, ¶ [0233], “The assistant operations can include initiating or joining an online activity with one or more other participants, such as an online game or virtual conference, in which the user is represented by the avatar 154. For example, the wireless speaker and voice activated device 3902 may send the representation 152 of the avatar 154, the audio output 2340, or both, to another device (e.g., a gaming server) that can include the avatar 154 in a virtual setting that is shared by the other participants.” and page 25, ¶ [0271], “In a particular implementation, each of the display 4828, the input device 4830, the one or more speakers 2302, the one or more microphones 202, the one or more cameras 206, the one or more motion sensors 210, the antenna 4852, and the power supply 4844 may be coupled to a component of the system-on-chip device 4822, such as an interface or a controller.”)
19. Beith is analogous art with respect to Bhat in view of Chung because they are from the same field of endeavor, namely generating emotive facial avatars. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to incorporate an audio input that determines the volume of the audio, as taught by Beith, into the teaching of Bhat in view of Chung. Additionally, it would have been obvious to use gameplay data from a game as input data and to communicate between two or more electronic devices (such as game devices), as also taught by Beith. Regarding the audio volume, the suggestion for doing so would be to provide an audio input to the electronic device that could determine how strongly an emotion should be expressed on an avatar; for example, a loud volume might equate to a more expressive avatar. Additionally, communicating gameplay data between electronic devices would allow users to interact with each other in a gameplay environment and share emotive avatars to show how they are feeling while playing the game. Therefore, it would have been obvious to combine Beith with Bhat in view of Chung.
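For illustration only, the relationship between audio volume and expression magnitude discussed above (Beith, ¶ [0106]) might be sketched as follows. Nothing in this sketch is taken from Beith: the RMS volume measure, the threshold values, the linear mapping, and all names are assumptions chosen to make the idea concrete.

```python
import math


def rms_volume(samples):
    """Root-mean-square level of normalized audio samples in [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def expression_scale(volume, quiet=0.05, loud=0.5, min_scale=0.5, max_scale=1.5):
    """Map volume linearly onto a scale factor (hypothetical thresholds)."""
    if loud <= quiet:
        raise ValueError("loud must exceed quiet")
    t = min(1.0, max(0.0, (volume - quiet) / (loud - quiet)))
    return min_scale + t * (max_scale - min_scale)


def scale_weights(weights, scale):
    """Scale expression weights, clamping each result to [0, 1]."""
    return {name: min(1.0, max(0.0, w * scale)) for name, w in weights.items()}


# Example: a loud laugh exaggerates the mouth-open weight.
volume = rms_volume([0.5, -0.5, 0.5, -0.5])
louder_face = scale_weights({"jaw_open": 0.6}, expression_scale(volume))
```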
20. As per claim 20, Bhat in view of Beith, and further in view of Chung discloses: An electronic device configured to carry out the method of claim 1, wherein the electronic device is a smartphone and the smartphone comprises at least one of the plurality of input sources. (Bhat, column 3, line 66-column 4, line 1, “However, the personal device 180 may be a desktop computer, a laptop computer, a smart television, ...” and column 4, lines 8-10, “However, mobile device 120 may be a mobile phone, Personal Digital Assistant (PDA), a tablet, a smartphone, ...” and column 4, lines 25-30, “Various components of a data processing element that executes one or more processes to provide an animated 3D model of a head in accordance with various embodiments of the invention are illustrated in FIG. 2. Data processing element 200 includes processor 205, image capture device 210, network interface 215, and memory 220.” and column 4, lines 42-44, “Image capture devices can include (but are not limited to) cameras and other sensors that can capture image data of a scene.”)
21. Claims 7-8, 10-11, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Bhat et al. (US-10198845-B1, hereinafter "Bhat") in view of Beith et al. (US-2024/0078731-A1, hereinafter "Beith"), further in view of Chung et al. (KR-2020-0019282-A, hereinafter "Chung"), and further in view of Tomite et al. (JP-6207210-B2, hereinafter "Tomite").
22. As per claim 7, Bhat in view of Beith, and further in view of Chung discloses: A computer-implemented method as claimed in claim 1, wherein determining the baseline facial expression comprises:
determining a set of predefined weights among the plurality of sets of predefined weights using the processed input data; and (Bhat, column 9, lines 4-5, “Process 700 identifies (730) predefined expression weights for the classified expression.” and column 6, lines 30-32, “Weighting engines in accordance with a number of embodiments calculate the weights based on results from image processing engines and/or neutral frame engines.” and column 6, line 62-column 7, line 10, “In the example of FIG. 5, numerical solver 510 includes facial geometry solver 512 and corrective interpolator 514. Facial geometry solvers can use identified landmarks to determine the facial geometry of a user and to determine their relative positions in order to identify weights for morph targets. In many embodiments, numerical solvers operate in a multi-stage operation that uses landmarks from images to first identify a neutral frame (or an identity-basis) for a user and to then calculate a differential for subsequent images based on the neutral frame to classify expressions from the user's face. In several embodiments, numerical solvers use a neutral state shape in the optimization and update the neural state shape as the system sees more images of the users face in a video sequence. Neutral shapes can be used in conjunction with subsequent images of a user's face to more accurately classify a user's expression.”)
generating the baseline facial expression by [[multiplying each weight]] of the set of predefined weights with its corresponding facial expression model to generate a weighted baseline facial expression model; and (Bhat, column 9, lines 12-13, “Process 700 calculates (735) a final set of weights for the morph targets to animate a 3D model.” and column 9, lines 27-35, “In some embodiments, function curves are used to control the rate of blending weights to morph between different expressions. Function curves in accordance with a number of embodiments of the invention can be used to ensure smooth blending of the morph target weights between detected expressions. In certain embodiments, it can be desirable to provide a fast ramp in of a current detected expression, and a fast blend out of the previous expression (or neutral).”)
combining all of weighted baseline facial expression models. (Bhat, column 7, lines 24-42, “In several embodiments, solved facial geometries and/or morph target weights from a numerical solver are used by expression classifiers to classify the facial geometry into an expression state. In some embodiments, this is implemented with an expression state machine. A conceptual diagram of an expression state machine for classifying expressions of a user from captured images of the user in accordance with an embodiment of the invention is illustrated in FIG. 6. Expression states in accordance with some embodiments of the invention include the universal expressions (happy, sad, angry, etc). In certain embodiments, expression states include two special expression states: “neutral” and “unknown”, which can be very helpful in capturing the facial state when a person is talking without a strong facial expression, or when the user is making a face that cannot be classified into a traditional expression. The expression classification can be achieved by analyzing the morph target weights directly, or in combination with image and landmark inputs using machine learning techniques.”)
23. Bhat in view of Beith, and further in view of Chung, does not explicitly disclose, but Tomite discloses: [[generating the baseline facial expression by]] multiplying each weight [[of the set of predefined weights with its corresponding facial expression model to generate a weighted baseline facial expression model; and]] (Tomite, page 8, ¶ [0054], “Me · GM representing expressive general face model can be represented by weighted linear sum obtained by multiplying n target shapes T k by coefficients of weight w k. In the target shape T k, coordinate values of predefined vertex groups are stored.”)
24. Tomite is analogous art with respect to Bhat in view of Beith, and further in view of Chung, because they are from the same field of endeavor, namely generating emotive facial models. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to include multiplying each weight, as taught by Tomite, in the teaching of Bhat in view of Beith, and further in view of Chung. The suggestion for doing so is that it would provide a method of factoring in the weights that determine the degree of each facial expression when creating a neutral, idle, and/or baseline avatar. Therefore, it would have been obvious to combine Tomite with Bhat in view of Beith, and further in view of Chung.
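For illustration only, the weighted linear sum described in Tomite ¶ [0054] (multiplying n target shapes T k by weight coefficients w k and summing) can be sketched as follows. The function name, the list-of-vertices data layout, and the example shapes and weights are assumptions made for the example and are not taken from any cited reference.

```python
from typing import List, Sequence

Vertex = Sequence[float]   # (x, y, z) coordinates of one vertex
Shape = Sequence[Vertex]   # one target shape: the coordinates of its vertex group


def weighted_linear_sum(shapes: Sequence[Shape], weights: Sequence[float]) -> List[List[float]]:
    """Multiply each target shape by its weight and sum the results per vertex."""
    if not shapes or len(shapes) != len(weights):
        raise ValueError("exactly one weight is required per target shape")
    result = [[0.0, 0.0, 0.0] for _ in range(len(shapes[0]))]
    for shape, w in zip(shapes, weights):
        for i, vertex in enumerate(shape):
            for axis in range(3):
                result[i][axis] += w * vertex[axis]
    return result


# Two hypothetical target shapes with two vertices each, and one set of weights.
smile = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
frown = [(0.0, 1.0, 0.0), (0.0, 0.0, 4.0)]
baseline = weighted_linear_sum([smile, frown], [0.5, 0.25])
```

In this sketch each weighted shape corresponds to a "weighted facial expression model" in the claim language, and the per-vertex summation corresponds to combining the weighted models.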
25. As per claim 8, Bhat in view of Beith, further in view of Chung, and further in view of Tomite discloses: A computer-implemented method as claimed in claim 1, wherein determining the dynamic facial expression comprises:
determining a set of dynamic weights using the images of the face of the user, each dynamic weight being applicable to configure one of the plurality of facial expression models; and (Bhat, column 8, lines 22-26, “Process 700 identifies (710) landmarks from the received images. The identification of certain features or landmarks of a face can be useful in a variety of applications including (but not limited to) face detection, face recognition, and computer animation.” and column 8, lines 56-57, “Process 700 classifies (725) the landmarks as a facial expression.” and column 9, lines 4-5, “Process 700 identifies (730) predefined expression weights for the classified expression.” and column 9, lines 12-13, “Process 700 calculates (735) a final set of weights for the morph targets to animate a 3D model.”)
generating the dynamic facial expression by multiplying each weight of the set of dynamic weights with its corresponding facial expression model to generate a weighted dynamic facial expression model; and (Tomite, page 8, ¶ [0054], “Me · GM representing expressive general face model can be represented by weighted linear sum obtained by multiplying n target shapes T k by coefficients of weight w k. In the target shape T k, coordinate values of predefined vertex groups are stored.” and page 8, ¶ [0081]-[0082], “That is, as shown in Expression (2), an expression-added general-purpose face model Me (Me) with an individual difference canceled by multiplying the facial expression individual face model (Characterized Expression GM) by the inverse matrix ... of the specific individual's skeleton model ... The variation model calculation unit 503 calculates a variation model with the target shape weight w as a parameter for the expression-added general purpose face model Me · GM generated by the framework model application unit 502.”)
combining all of weighted dynamic facial expression models. (Bhat, column 9, lines 12-13, “Process 700 calculates (735) a final set of weights for the morph targets to animate a 3D model.”)
26. Tomite is analogous art with respect to Bhat in view of Beith, and further in view of Chung, because they are from the same field of endeavor, namely generating emotive facial models. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to include multiplying each weight, as taught by Tomite, in the teaching of Bhat in view of Beith, and further in view of Chung. The suggestion for doing so is that it would provide a method of factoring in the weights that determine the degree of each facial expression when creating a dynamic avatar. Therefore, it would have been obvious to combine Tomite with Bhat in view of Beith, and further in view of Chung.
27. As per claim 10, Bhat in view of Beith, further in view of Chung, and further in view of Tomite discloses: A computer-implemented method as claimed in claim 8, wherein the set of dynamic weights and the first and second output weights are determined by an artificial neural network (ANN), wherein the ANN is configured to:
receive at least a portion of the input data and/or the processed input data, and in response to the portion of the input data and/or the processed input data received, output desired data or instructions. (Bhat, column 8, line 56-column 9, line 3, “Process 700 classifies (725) the landmarks as a facial expression. Classifying the landmarks in accordance with many embodiments of the invention include classifying the landmarks as an indication of a particular emotion, such as (but not limited to) happiness, sadness, surprise, and anger. In several embodiments, classification is performed based on differentials calculated based on differentials between a user's neutral frame and the landmarks from a current image of the user's face. Classification as emotions and/or other facial expressions can be performed using a variety of machine learning techniques, including, but not limited to, convolutional neural networks, support vector machines, and decision trees. In several embodiments, expressions can be obtained by classifying 3D morph target weights produced by a numerical solver.”)
28. As per claim 11, Bhat in view of Beith, further in view of Chung, and further in view of Tomite discloses: A computer-implemented method as claimed in claim 8, wherein determining the first and second output weights comprises:
providing a plurality of pairs of first output weight and second output weight, each of the plurality of pairs of first output weight and second output weight being associated with one of a plurality of predefined emotions; (Bhat, column 2, lines 19-25, “In another embodiment again, the instructions further direct the set of processors to calculate a second set of final morph target weights based on a third set of one or more images, wherein the transition between the first set of final morph target weights and the second set of final morph target weights is based on a linear function to control the rate of morphing between different expressions.”)
determining an emotion using the processed input data; and (Bhat, column 3, lines 16-21, “Systems and processes in accordance with many embodiments of the invention provide a process for identifying landmarks from captured images of a user's face to calculate weights for morph targets of a 3D model in order to animate the expressions of the 3D model.”)
determining a pair of first output weight and second output weight from the plurality of pairs of first output weight and second output weight by mapping the determined emotion to the plurality of predefined emotions. (Bhat, column 2, lines 19-25, “In another embodiment again, the instructions further direct the set of processors to calculate a second set of final morph target weights based on a third set of one or more images, wherein the transition between the first set of final morph target weights and the second set of final morph target weights is based on a linear function to control the rate of morphing between different expressions.” and column 6, lines 37-43, “Mapping engines in accordance with a number of embodiments of the invention can animate expressions of a 3D model by morphing between the morph targets of a 3D model based on the weights calculated by weighting engines. In some embodiments, the morphing is a linear combination of the morph targets with their corresponding weights.” and column 7, lines 49-57, “By classifying a user's expression and using predefined weights for various emotions, expressions of different 3D models (or avatars) can be standardized, allowing a designer to create morph targets of a model based on predictable weightings for various expressions, such as (but not limited to) a happy face, a surprised face, a sad face, etc. Each expression may include multiple weights for various different morph targets in a 3D model.”)
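For illustration only, the claim 11 steps (providing pairs of output weights associated with predefined emotions, determining an emotion, and mapping it to a pair) can be sketched as a table lookup. The table contents, the function name, and the fallback to a "neutral" entry for an unrecognized emotion are assumptions made for the example; none of them is taken from Bhat or the application.

```python
from typing import Dict, Tuple

# Hypothetical table: each predefined emotion maps to a
# (first_output_weight, second_output_weight) pair. Values are illustrative only.
OUTPUT_WEIGHT_PAIRS: Dict[str, Tuple[float, float]] = {
    "happy": (0.25, 0.75),
    "sad": (0.5, 0.5),
    "neutral": (1.0, 0.0),
}


def select_output_weights(emotion: str) -> Tuple[float, float]:
    """Map a determined emotion to its pair; unrecognized emotions fall back to 'neutral'."""
    return OUTPUT_WEIGHT_PAIRS.get(emotion.lower(), OUTPUT_WEIGHT_PAIRS["neutral"])


first_weight, second_weight = select_output_weights("Happy")
```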
29. As per claim 14, Bhat in view of Beith, further in view of Chung, and further in view of Tomite discloses: A computer-implemented method as claimed in claim 13, wherein determining the idle facial expression comprises:
determining a set of idle weights, each idle weight being applicable to configure one of the plurality of facial expression models; and (Bhat, column 6, lines 30-32, “Weighting engines in accordance with a number of embodiments calculate the weights based on results from image processing engines and/or neutral frame engines.” and column 7, lines 24-42, “In several embodiments, solved facial geometries and/or morph target weights from a numerical solver are used by expression classifiers to classify the facial geometry into an expression state. In some embodiments, this is implemented with an expression state machine. A conceptual diagram of an expression state machine for classifying expressions of a user from captured images of the user in accordance with an embodiment of the invention is illustrated in FIG. 6. Expression states in accordance with some embodiments of the invention include the universal expressions (happy, sad, angry, etc). In certain embodiments, expression states include two special expression states: “neutral” and “unknown”, which can be very helpful in capturing the facial state when a person is talking without a strong facial expression, or when the user is making a face that cannot be classified into a traditional expression. The expression classification can be achieved by analyzing the morph target weights directly, or in combination with image and landmark inputs using machine learning techniques.”)
generating an idle facial expression by multiplying each weight of the set of idle weights with its corresponding facial expression model to generate a weighted idle facial expression model and then combining all of weighted idle facial expression models. (Tomite, page 8, ¶ [0054], “Me · GM representing expressive general face model can be represented by weighted linear sum obtained by multiplying n target shapes T k by coefficients of weight w k. In the target shape T k, coordinate values of predefined vertex groups are stored.”)
See claim 7 rejection for reason to combine: multiplying each weight.
30. As per claim 15, Bhat in view of Beith, further in view of Chung, and further in view of Tomite discloses: A computer-implemented method as claimed in claim 14, wherein the set of idle weights is one of the plurality of sets of predefined weights. (Bhat, column 6, lines 30-32, “Weighting engines in accordance with a number of embodiments calculate the weights based on results from image processing engines and/or neutral frame engines.” and column 7, lines 24-42, “In several embodiments, solved facial geometries and/or morph target weights from a numerical solver are used by expression classifiers to classify the facial geometry into an expression state. In some embodiments, this is implemented with an expression state machine. A conceptual diagram of an expression state machine for classifying expressions of a user from captured images of the user in accordance with an embodiment of the invention is illustrated in FIG. 6. Expression states in accordance with some embodiments of the invention include the universal expressions (happy, sad, angry, etc). In certain embodiments, expression states include two special expression states: “neutral” and “unknown”, which can be very helpful in capturing the facial state when a person is talking without a strong facial expression, or when the user is making a face that cannot be classified into a traditional expression. The expression classification can be achieved by analyzing the morph target weights directly, or in combination with image and landmark inputs using machine learning techniques.”)
31. Claims 9 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Bhat et al. (US-10198845-B1, hereinafter "Bhat") in view of Beith et al. (US-2024/0078731-A1, hereinafter "Beith"), further in view of Chung et al. (KR-2020-0019282-A, hereinafter "Chung"), further in view of Tomite et al. (JP-6207210-B2, hereinafter "Tomite"), and further in view of Hong et al. (TW-I736054-B, hereinafter "Hong").
32. As per claim 9, Bhat in view of Beith, further in view of Chung, and further in view of Tomite discloses: A computer-implemented method as claimed in claim 1, wherein generating the output facial expression of the virtual avatar comprises:
determining a first output weight and a second output weight; (Bhat, column 2, lines 19-25, “In another embodiment again, the instructions further direct the set of processors to calculate a second set of final morph target weights based on a third set of one or more images, wherein the transition between the first set of final morph target weights and the second set of final morph target weights is based on a linear function to control the rate of morphing between different expressions.”)
[[generating a set of average output weights by:]]
multiplying each weight of a set of predefined weights with a first output weight to generate a modified first output weight; (Tomite, page 8, ¶ [0054], “Me · GM representing expressive general face model can be represented by weighted linear sum obtained by multiplying n target shapes T k by coefficients of weight w k. In the target shape T k, coordinate values of predefined vertex groups are stored.” and page 8, ¶ [0081]-[0082], “That is, as shown in Expression (2), an expression-added general-purpose face model Me (Me) with an individual difference canceled by multiplying the facial expression individual face model (Characterized Expression GM) by the inverse matrix ... of the specific individual's skeleton model ... The variation model calculation unit 503 calculates a variation model with the target shape weight w as a parameter for the expression-added general purpose face model Me · GM generated by the framework model application unit 502.”)
multiplying each weight of the set of dynamic weights with a second output weight to generate a modified second output weight; (Tomite, page 8, ¶ [0054], “Me · GM representing expressive general face model can be represented by weighted linear sum obtained by multiplying n target shapes T k by coefficients of weight w k. In the target shape T k, coordinate values of predefined vertex groups are stored.” and page 8, ¶ [0081]-[0082], “That is, as shown in Expression (2), an expression-added general-purpose face model Me (Me) with an individual difference canceled by multiplying the facial expression individual face model (Characterized Expression GM) by the inverse matrix ... of the specific individual's skeleton model ... The variation model calculation unit 503 calculates a variation model with the target shape weight w as a parameter for the expression-added general purpose face model Me · GM generated by the framework model application unit 502.”)
[[adding each modified first output weight and a corresponding modified second output weight to generate an average output weight; and]]
generating the output facial expression by multiplying each weight [[of the set of average output weights with its corresponding facial expression model to generate a weighted average facial expression model and then combining all of weighted average facial expression models.]] (Bhat, column 1, lines 15-19, “This invention relates to animating a 3D avatar based upon captured images. More particularly, this invention relates to animating facial expressions for a 3D avatar based on captured images of a user's face and their facial expressions.” and Tomite, page 8, ¶ [0054], “Me · GM representing expressive general face model can be represented by weighted linear sum obtained by multiplying n target shapes T k by coefficients of weight w k. In the target shape T k, coordinate values of predefined vertex groups are stored.” and page 8, ¶ [0081]-[0082], “That is, as shown in Expression (2), an expression-added general-purpose face model Me (Me) with an individual difference canceled by multiplying the facial expression individual face model (Characterized Expression GM) by the inverse matrix ... of the specific individual's skeleton model ... The variation model calculation unit 503 calculates a variation model with the target shape weight w as a parameter for the expression-added general purpose face model Me · GM generated by the framework model application unit 502.”)
See claim 8 rejection for reason to combine: multiplying each weight.
33. Bhat in view of Beith, further in view of Chung, and further in view of Tomite does not explicitly disclose, but Hong discloses:
generating a set of average output weights by:
adding each modified first output weight and a corresponding modified second output weight to generate an average output weight; and (Hong, page 11, ¶ [0035], “In one embodiment, the processor 150 may combine the first emotion configuration and the second emotion configuration to generate one or more emotion combinations. The method of combining the two emotion configurations may be: determining the sum or weighted average of the parameters of the two emotion configurations, or directly using some of the parameters of the two emotion configurations as the parameters of one emotion configuration.”)
[[generating the output facial expression by multiplying each weight]] of the set of average output weights with its corresponding facial expression model to generate a weighted average facial expression model and then combining all of weighted average facial expression models. (Hong, page 11, ¶ [0035]-[0036], “In one embodiment, the processor 150 may combine the first emotion configuration and the second emotion configuration to generate one or more emotion combinations. The method of combining the two emotion configurations may be: determining the sum or weighted average of the parameters of the two emotion configurations, or directly using some of the parameters of the two emotion configurations as the parameters of one emotion configuration. In one embodiment, the emotion configuration specifically corresponds to only one facial expression, and in the period of the third period, the processor 150 can compare geometric parameters and/or textures corresponding to the first emotion configuration and the second emotion configuration. The parameters are averaged or the weighted relationship of the parameters of the two emotion configurations is given to adjust the facial expression of the avatar, and the result of the average or the result of the weight calculation will become a combination of emotions.”)
34. Hong is analogous art with respect to Bhat in view of Beith, further in view of Chung, and further in view of Tomite because they are from the same field of endeavor, namely generating emotive facial avatars. At the time the application was filed, it would have been obvious to a person of ordinary skill in the art to include averaging output weights and using those averaged weights to generate an output facial expression, as taught by Hong, in the teaching of Bhat in view of Beith, further in view of Chung, and further in view of Tomite. The suggestion for doing so is that it would provide a means of using weights from two different expressions to produce an “average” expression that contains aspects of both expressions. Therefore, it would have been obvious to combine Hong with Bhat in view of Beith, further in view of Chung, and further in view of Tomite.
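For illustration only, the claim 9 arithmetic for generating the set of average output weights (scaling each predefined weight by the first output weight, scaling each dynamic weight by the second output weight, and adding the corresponding results) can be sketched as follows. The function name and the example values are assumptions made for the example; the sketch is one reading of the claim language and is not asserted to be the method of Hong, Tomite, or Bhat.

```python
from typing import List, Sequence


def average_output_weights(
    predefined: Sequence[float],
    dynamic: Sequence[float],
    first_output_weight: float,
    second_output_weight: float,
) -> List[float]:
    """Scale each weight set by its output weight and add the corresponding entries."""
    if len(predefined) != len(dynamic):
        raise ValueError("both weight sets must cover the same facial expression models")
    return [
        first_output_weight * p + second_output_weight * d
        for p, d in zip(predefined, dynamic)
    ]


# Hypothetical weight sets for two facial expression models.
blended = average_output_weights([1.0, 0.0], [0.0, 1.0], 0.25, 0.75)
```

When the two output weights sum to one, as in this example, the result is a weighted average of the two weight sets. Each resulting weight would then be multiplied with its corresponding facial expression model, and the weighted models combined, as recited in the final step of the claim.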
35. As per claim 12, Bhat in view of Beith, further in view of Chung, further in view of Tomite, and further in view of Hong discloses: A computer-implemented method as claimed in claim 9, wherein the first output weight and the second output weight are set by the user. (Bhat, column 14, line 65-column 15, line 4, “The modified file formats allow users of this mapper framework to design a wide range of characters—human and non-human—with the set of morph targets in the modified file format. The key flexibility is for the designers of the characters to sculpt both base shapes and corrective shapes which gives them complete control over the design and movement of the character.” and column 9, lines 23-26, “The strength of the user's expression and the predefined weights can be adjusted in accordance with certain embodiments of the invention.”)
Conclusion
36. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. These references are as follows: Yumer (US-2017/0256098-A1) and Park (US-2023/0143019-A1), both of which disclose the use of a plurality of weights to generate facial expressions.
37. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
38. Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW CLOTHIER whose telephone number is (571)272-4667. The examiner can normally be reached Mon-Fri 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at (571)272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MATTHEW CLOTHIER/Examiner, Art Unit 2614
/KENT W CHANG/Supervisory Patent Examiner, Art Unit 2614