DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 05/02/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered and attached by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3 and 9-15 are rejected under 35 U.S.C. 103 as being unpatentable over Wedig (U.S. Patent Pub. No. 2019/0362529) in view of Ma (U.S. Patent Pub. No. 2018/0033190).
Regarding Claim 1, Wedig teaches a method comprising:
identifying baseline blendshapes from a captured facial image of a facial expression of a user (¶42 the wearable system 200 can use the outward-facing imaging system 464 or the inward-facing imaging system 462 to acquire images of a pose of the user. The images may be still images, frames of a video, or a video; ¶101 Some example characteristics of the avatar can include the size, appearance, position, orientation, movement, pose, expression, etc.)
for each of a plurality of facial expressions, identifying blendshape weights (¶140 In addition to skeletal systems, “blendshapes” can also be used in rigging to produce mesh deformations. A blendshape (sometimes also called a “morph target” or just a “shape”) is a deformation applied to a set of vertices in the mesh where each vertex in the set is moved a specified amount in a specified direction based upon a weight) from a captured facial image of the facial expression of the user, and generating a blendshape model by applying the blendshape weights to the baseline blendshapes (¶140 Each vertex in the set may have its own custom motion for a specific blendshape, and moving the vertices in the set simultaneously will generate the desired shape. The custom motion for each vertex in a blendshape can be specified by a “delta,” which is a vector representing the amount and direction of XYZ motion applied to that vertex. Blendshapes can be used to produce, for example, facial deformations to move the eyes, lips, brows, nose, dimples, etc., just to name a few possibilities;)
for each facial expression, rendering an avatar from the blendshape model (¶104 The movement processing system 684 can be configured to animate the avatar, such as, e.g., by changing the avatar's pose, by moving the avatar around in a user's environment, or by animating the avatar's facial expressions, etc. As will further be described herein, the virtual avatar can be animated using rigging techniques) and simulating avatar training images from the avatar in correspondence with facial images capturable by a (¶164 The parameters that represent transformations of the joints of the skinning system (e.g., a weight map, rotations, and translations) can be extracted from training data that represents a series or sequence of target poses (e.g., facial expressions, body gestures, movements (e.g., sitting, walking, running, etc.)); ¶165 Training data comprises meshes that share a topology with a base neutral mesh used for animation of an avatar, but which have been deformed into the target poses used for skinning the avatar. Training data can represent a pose or a blendshape for an avatar.) head-mountable display (HMD); and (Fig. 2; ¶39 The display 220 can comprise a head mounted display (HMD) that is worn on the head of the user)
training a machine learning model based on the avatar training images for each facial expression (¶109 The object recognitions can additionally or alternatively be performed by a variety of machine learning algorithms. Once trained, the machine learning algorithm can be stored by the HMD; ¶149 The pose interpolation node can be implemented in a variety of ways, including with radial basis functions (RBFs). RBFs can perform a machine-learned mathematical approximation of a function. RBFs can be trained using a set of inputs and their associated expected outputs. The training data could be, for example, multiple sets of joint transforms (which define particular poses) and the corresponding blendshapes (or linear skins) to be applied in response to those poses.)
Wedig does not explicitly disclose identifying baseline blendshapes from a captured facial image of a neutral facial expression of a user.
Ma is in the same field of art of image analysis. Further, Ma teaches identifying baseline blendshapes from a captured facial image of a neutral facial expression of a user (¶80 Referring to FIG. 3, as a first step 301, an approximate neutral pose of the subject is manually selected from the tracked performance.)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wedig by determining the pose of a neutral face, as taught by Ma; one of ordinary skill in the art would have been motivated to combine the references so that the rigs produced for each actor behave consistently in the hands of an animator (Ma ¶13).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
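For illustration only, the blendshape operation quoted from Wedig ¶140 (each vertex moved by a per-vertex "delta" scaled by a weight) can be sketched as follows. This is a minimal sketch, not the implementation of Wedig, Ma, or the applicant. The function name apply_blendshapes, the tuple-based mesh layout, and the linear summation of weighted deltas are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def apply_blendshapes(
    neutral: Sequence[Vec3],
    deltas: Sequence[Sequence[Vec3]],
    weights: Sequence[float],
) -> List[Vec3]:
    """Deform a neutral mesh by a weighted sum of per-vertex deltas.

    Hypothetical layout: neutral[i] is vertex i of the neutral mesh,
    and deltas[k][i] is the XYZ offset that blendshape k applies to
    vertex i at full weight.
    """
    if len(deltas) != len(weights):
        raise ValueError("one weight is required per blendshape")
    deformed: List[Vec3] = []
    for i, (x, y, z) in enumerate(neutral):
        for shape, w in zip(deltas, weights):
            dx, dy, dz = shape[i]
            # Move the vertex along its delta, scaled by the weight.
            x += w * dx
            y += w * dy
            z += w * dz
        deformed.append((x, y, z))
    return deformed


# Two vertices, two blendshapes (values chosen only for the example).
neutral_mesh = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shape_deltas = [
    [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0)],  # shape 0 raises vertex 0
    [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)],  # shape 1 pushes vertex 1 in Z
]
posed_mesh = apply_blendshapes(neutral_mesh, shape_deltas, [0.5, 1.0])
```

Under these assumptions, a weight of 0 leaves a blendshape inactive and a weight of 1 applies its full delta to each affected vertex.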
Regarding Claim 2, Wedig in view of Ma discloses the method of claim 1, further comprising: applying the machine learning model to the facial images captured by the HMD of a wearer exhibiting a facial expression to predict the blendshape weights for the facial expression of the wearer (Wedig, ¶109 The object recognitions can additionally or alternatively be performed by a variety of machine learning algorithms. Once trained, the machine learning algorithm can be stored by the HMD; ¶149 The pose interpolation node can be implemented in a variety of ways, including with radial basis functions (RBFs). RBFs can perform a machine-learned mathematical approximation of a function. RBFs can be trained using a set of inputs and their associated expected outputs. The training data could be, for example, multiple sets of joint transforms (which define particular poses) and the corresponding blendshapes (or linear skins) to be applied in response to those poses.)
Regarding Claim 3, Wedig in view of Ma discloses the method of claim 2, further comprising: retargeting the predicted blendshape weights for the facial expression of the wearer of the HMD onto an avatar corresponding to the wearer to render the avatar with the facial expression of the wearer; and displaying the rendered avatar corresponding to the wearer (Wedig, ¶104 The 3D model processing system 680 can be configured to animate and cause the display 220 to render a virtual avatar 670. The 3D model processing system 680 can include a virtual character processing system 682 and a movement processing system 684. The virtual character processing system 682 can be configured to generate and update a 3D model of a user (for creating and animating the virtual avatar). The movement processing system 684 can be configured to animate the avatar, such as, e.g., by changing the avatar's pose, by moving the avatar around in a user's environment, or by animating the avatar's facial expressions, etc.)
Regarding Claim 9, Wedig in view of Ma discloses the method of claim 1, wherein the avatar for each facial expression is rendered using first avatar rendering parameters, the method further comprising, for each facial expression:
rendering an additional avatar from the blendshape model using second avatar rendering parameters and simulating additional avatar training images from the additional avatar in correspondence with the facial images capturable by the HMD, wherein the machine learning model is further trained based on the additional avatar training images for each facial expression (Wedig teaches the processing of rendering an avatar and training based on the avatar images. Ma teaches performing this iteratively ¶37 repeating the optimization process for a predetermined number of iterations to yield a final set of weighted blendshapes; and using the final set of weighted blendshapes to render the animated blendshape.)
The reasons for combining Wedig and Ma are similar to those stated in the rejection of claim 1. In addition, this same reasoning is pertinent and applicable to the rejection of claim 10 below.
Regarding Claim 10, Wedig in view of Ma discloses the method of claim 1, further comprising: discarding each facial expression for which the blendshape weights are outliers compared to the blendshape weights for other of the facial expressions, such that the blendshape model is not generated, the avatar is not rendered, and the avatar training images are not simulated (Ma, ¶138 The GUI and command line provide the option to overwrite the status of a certain marker to be INVALIDATED for all the frames. Just as MISSED markers, the INVALIDATED markers are not considered when computing the 3D marker positions during the later stage of the pipeline.)
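For illustration only, the discarding step recited in claim 10 (dropping facial expressions whose blendshape weights are outliers relative to those of the other expressions) can be sketched as follows. The claim language quoted above does not specify an outlier test, so the distance-to-median measure, the z-score threshold max_z, and the function name filter_outlier_expressions are all assumptions made for the example. This is not the method of either reference or of the applicant.

```python
import math
import statistics
from typing import Dict, List


def filter_outlier_expressions(
    weights_by_expression: Dict[str, List[float]],
    max_z: float = 1.5,
) -> Dict[str, List[float]]:
    """Keep expressions whose weight vectors are not outliers.

    Hypothetical test: measure each weight vector's Euclidean distance
    from the component-wise median, then drop any expression whose
    distance lies more than max_z standard deviations above the mean.
    """
    vectors = list(weights_by_expression.values())
    if len(vectors) < 3:
        return dict(weights_by_expression)  # too few samples to judge
    dim = len(vectors[0])
    center = [statistics.median(v[j] for v in vectors) for j in range(dim)]
    dist = {
        name: math.dist(w, center)
        for name, w in weights_by_expression.items()
    }
    mean = statistics.fmean(dist.values())
    spread = statistics.pstdev(dist.values())
    if spread == 0:
        return dict(weights_by_expression)
    return {
        name: w
        for name, w in weights_by_expression.items()
        if (dist[name] - mean) / spread <= max_z
    }


# Example weights (invented for the sketch); "f" is far from the rest.
captured = {
    "a": [0.10, 0.20], "b": [0.20, 0.10], "c": [0.15, 0.15],
    "d": [0.10, 0.10], "e": [0.20, 0.20], "f": [0.90, 0.95],
}
kept = filter_outlier_expressions(captured)
```

In this sketch a discarded expression is simply absent from the returned mapping, so no blendshape model, avatar, or training image would be produced for it downstream.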
Regarding Claim 11, Wedig teaches a non-transitory computer-readable data storage medium storing program code executable by a processor to perform processing comprising (¶230 a hardware processor in communication with the non-transitory computer storage, the hardware processor programmed:)
capturing facial images of a wearer of a head-mountable display (HMD) using corresponding cameras of the HMD (Fig. 2; ¶39 The display 220 can comprise a head mounted display (HMD) that is worn on the head of the user; ¶42 the wearable system 200 can use the outward-facing imaging system 464 or the inward-facing imaging system 462 to acquire images of a pose of the user. The images may be still images, frames of a video, or a video; ¶101 Some example characteristics of the avatar can include the size, appearance, position, orientation, movement, pose, expression, etc.);
applying a machine learning model (¶109 The object recognitions can additionally or alternatively be performed by a variety of machine learning algorithms. Once trained, the machine learning algorithm can be stored by the HMD; ¶149 The pose interpolation node can be implemented in a variety of ways, including with radial basis functions (RBFs). RBFs can perform a machine-learned mathematical approximation of a function. RBFs can be trained using a set of inputs and their associated expected outputs. The training data could be, for example, multiple sets of joint transforms (which define particular poses) and the corresponding blendshapes (or linear skins) to be applied in response to those poses) to the captured facial images to predict blendshape weights for a facial expression of the wearer of the HMD exhibited within the captured facial images (¶140 In addition to skeletal systems, “blendshapes” can also be used in rigging to produce mesh deformations. A blendshape (sometimes also called a “morph target” or just a “shape”) is a deformation applied to a set of vertices in the mesh where each vertex in the set is moved a specified amount in a specified direction based upon a weight;)
retargeting the predicted blendshape weights for the facial expression of the wearer of the HMD onto an avatar corresponding to the wearer to render the avatar with the facial expression of the wearer; and directly or indirectly displaying the rendered avatar corresponding to the wearer, (Wedig, ¶104 The 3D model processing system 680 can be configured to animate and cause the display 220 to render a virtual avatar 670. The 3D model processing system 680 can include a virtual character processing system 682 and a movement processing system 684. The virtual character processing system 682 can be configured to generate and update a 3D model of a user (for creating and animating the virtual avatar). The movement processing system 684 can be configured to animate the avatar, such as, e.g., by changing the avatar's pose, by moving the avatar around in a user's environment, or by animating the avatar's facial expressions, etc.)
wherein the machine learning model is trained on simulated avatar training images of training avatars rendered from blendshape models corresponding to facial expressions and generated by applying blendshape weights identified from captured training facial images of the facial expressions to baseline blendshapes identified from a captured training facial image of a facial expression (¶109 The object recognitions can additionally or alternatively be performed by a variety of machine learning algorithms. Once trained, the machine learning algorithm can be stored by the HMD; ¶149 The pose interpolation node can be implemented in a variety of ways, including with radial basis functions (RBFs). RBFs can perform a machine-learned mathematical approximation of a function. RBFs can be trained using a set of inputs and their associated expected outputs. The training data could be, for example, multiple sets of joint transforms (which define particular poses) and the corresponding blendshapes (or linear skins) to be applied in response to those poses.)
Wedig does not explicitly disclose a captured facial image of a neutral facial expression.
Ma is in the same field of art of image analysis. Further, Ma teaches a captured facial image of a neutral facial expression (¶80 Referring to FIG. 3, as a first step 301, an approximate neutral pose of the subject is manually selected from the tracked performance)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wedig by determining the pose of a neutral face, as taught by Ma; one of ordinary skill in the art would have been motivated to combine the references so that the rigs produced for each actor behave consistently in the hands of an animator (Ma ¶13).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Regarding Claim 12, Wedig in view of Ma discloses the non-transitory computer-readable data storage medium, wherein the captured facial images of the wearer comprise captured left and right eye images of facial portions of the wearer respectively including left and right eyes of the wearer, and a captured mouth image of a lower facial portion of the wearer including a mouth of the wearer (Wedig, ¶41 The wearable system 200 can also include an inward-facing imaging system 462 (shown in FIG. 4) which can track the eye movements of the user. The inward-facing imaging system may track either one eye's movements or both eyes' movements… For example, at least one camera may be used to image each eye. The images acquired by the cameras may be used to determine pupil size or eye pose for each eye separately, thereby allowing presentation of image information to each eye to be dynamically tailored to that eye; ¶132 skin motions due to an avatar's facial contortions (which may represent expressions such as smiling, frowning, laughing, speaking, blinking, etc.) can be represented by a series of facial joints controlled by a facial rig) (Ma also teaches that the target facial expression is at least one of a smile, a laugh, a frown, a growl, a yell, closed eyes, open eyes, heightened eyebrows, lowered eyebrows, pursed lips, a mouth shape of a vowel, and a mouth shape of a consonant.)
Regarding Claim 13, Wedig in view of Ma discloses the non-transitory computer-readable data storage medium of claim 12, wherein for each training avatar, the simulated avatar training images comprise simulated avatar left and right eye images in correspondence with the captured left and right eye images of the facial portions of the wearer respectively including the left and right eyes of the wearer, and a simulated avatar mouth image in correspondence with the captured mouth image of the lower portion of the wearer including the mouth of the wearer (Wedig, ¶41 teaches pose of eyes see claim 12; ¶132 teaches pose of mouth see claim 12; ¶165 Training data comprises meshes that share a topology with a base neutral mesh used for animation of an avatar, but which have been deformed into the target poses used for skinning the avatar. Training data can represent a pose or a blendshape for an avatar.)
Regarding Claim 14, Wedig teaches a head-mountable display (HMD) comprising (Fig. 2; ¶39 The display 220 can comprise a head mounted display (HMD) that is worn on the head of the user:)
cameras to capture facial images of a wearer of the HMD (¶42 the wearable system 200 can use the outward-facing imaging system 464 or the inward-facing imaging system 462 to acquire images of a pose of the user. The images may be still images, frames of a video, or a video;)
a processor; and (¶44 The local processing and data module 260 may comprise a hardware processor)
a memory storing program code executable by the processor (¶272 Code modules or any type of data may be stored on any type of non-transitory computer-readable medium, such as physical computer storage including hard drives, solid state memory, random access memory (RAM), read only memory (ROM), optical disc, volatile or non-volatile storage, combinations of the same and/or the like) to apply a machine learning model to the captured facial images to predict blendshape weights for a facial expression of the wearer of the HMD exhibited within the captured facial images (¶109 The object recognitions can additionally or alternatively be performed by a variety of machine learning algorithms. Once trained, the machine learning algorithm can be stored by the HMD; ¶149 The pose interpolation node can be implemented in a variety of ways, including with radial basis functions (RBFs). RBFs can perform a machine-learned mathematical approximation of a function. RBFs can be trained using a set of inputs and their associated expected outputs. The training data could be, for example, multiple sets of joint transforms (which define particular poses) and the corresponding blendshapes (or linear skins) to be applied in response to those poses,)
wherein the machine learning model is trained on simulated avatar training images of training avatars rendered from blendshape models corresponding to facial expressions and generated by applying blendshape weights identified from captured training facial images of the facial expressions to baseline blendshapes identified from a captured training facial image of a facial expression (¶109 The object recognitions can additionally or alternatively be performed by a variety of machine learning algorithms. Once trained, the machine learning algorithm can be stored by the HMD; ¶149 The pose interpolation node can be implemented in a variety of ways, including with radial basis functions (RBFs). RBFs can perform a machine-learned mathematical approximation of a function. RBFs can be trained using a set of inputs and their associated expected outputs. The training data could be, for example, multiple sets of joint transforms (which define particular poses) and the corresponding blendshapes (or linear skins) to be applied in response to those poses.)
Wedig does not explicitly disclose a captured training facial image of a neutral facial expression.
Ma is in the same field of art of image analysis. Further, Ma teaches a captured training facial image of a neutral facial expression (¶80 Referring to FIG. 3, as a first step 301, an approximate neutral pose of the subject is manually selected from the tracked performance)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Wedig by determining the pose of a neutral face, as taught by Ma; one of ordinary skill in the art would have been motivated to combine the references so that the rigs produced for each actor behave consistently in the hands of an animator (Ma ¶13).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.
Regarding Claim 15, Wedig in view of Ma discloses the HMD of claim 14, wherein the program code is executable by the processor to further:
retarget the predicted blendshape weights for the facial expression of the wearer of the HMD onto an avatar corresponding to the wearer to render the avatar with the facial expression of the wearer; and display the rendered avatar corresponding to the wearer on a display of the HMD, or transmit the rendered avatar corresponding to the wearer to a computing device to indirectly display the rendered avatar on a display of the computing device (Wedig, ¶104 The 3D model processing system 680 can be configured to animate and cause the display 220 to render a virtual avatar 670. The 3D model processing system 680 can include a virtual character processing system 682 and a movement processing system 684. The virtual character processing system 682 can be configured to generate and update a 3D model of a user (for creating and animating the virtual avatar). The movement processing system 684 can be configured to animate the avatar, such as, e.g., by changing the avatar's pose, by moving the avatar around in a user's environment, or by animating the avatar's facial expressions, etc.)
Allowable Subject Matter
Claims 4-8 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Regarding claim 4, no prior art teaches adding random noise to the baseline blendshapes to generate additional baseline blendshapes; and for each facial expression, generating an additional blendshape model by applying the blendshape weights to the additional baseline blendshapes, rendering an additional avatar from the additional blendshape model, and simulating additional avatar training images from the additional avatar in correspondence with the facial images capturable by the HMD, wherein the machine learning model is further trained based on the additional avatar training images for each facial expression.
Regarding claim 6, no prior art teaches adding random noise to the blendshape weights to generate additional blendshape weights, generating an additional blendshape model by applying the additional blendshape weights to the baseline blendshapes, rendering an additional avatar from the additional blendshape model, and simulating additional avatar training images from the additional avatar in correspondence with the facial images capturable by the HMD, wherein the machine learning model is further trained based on the additional avatar training images for each facial expression.
Regarding claim 8, no prior art teaches adding random noise to the baseline blendshapes to generate additional baseline blendshapes; for each facial expression, adding random noise to the blendshape weights to generate additional blendshape weights, generating an additional blendshape model by applying the additional blendshape weights to the additional baseline blendshapes, rendering an additional avatar from the additional blendshape model, and simulating additional avatar training images from the additional avatar in correspondence with the facial images capturable by the HMD, wherein the machine learning model is further trained based on the additional avatar training images for each facial expression.
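For illustration only, the random-noise limitations of claims 4, 6, and 8 (perturbing the baseline blendshapes, the blendshape weights, or both, to obtain additional training data) can be sketched as follows. The claims as quoted do not specify a noise distribution, so the zero-mean Gaussian noise, the sigma parameter, the clamping of weights to [0, 1], and the function names add_noise and augment_weights are all assumptions made for the example. This is not the applicant's implementation.

```python
import random
from typing import List, Sequence


def add_noise(
    values: Sequence[float], sigma: float, rng: random.Random
) -> List[float]:
    """Return a copy of values with zero-mean Gaussian noise added.

    Applicable, under the assumptions stated above, to flattened
    baseline-blendshape coordinates or to a blendshape weight vector.
    """
    return [v + rng.gauss(0.0, sigma) for v in values]


def augment_weights(
    weights: Sequence[float], sigma: float, rng: random.Random
) -> List[float]:
    """Perturb blendshape weights, then clamp them to [0, 1].

    The clamp is an assumption of this sketch, chosen so that the
    perturbed weights stay in a conventional blendshape range.
    """
    return [min(1.0, max(0.0, w)) for w in add_noise(weights, sigma, rng)]


rng = random.Random(1)  # fixed seed so the sketch is repeatable
extra_weights = augment_weights([0.0, 0.5, 1.0], sigma=0.3, rng=rng)
extra_baseline = add_noise([0.0, 1.0, 2.0], sigma=0.01, rng=rng)
```

Each perturbed set would then feed the same generate, render, and simulate steps recited for the unperturbed data, yielding additional avatar training images per facial expression.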
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DUSTIN BILODEAU whose telephone number is (571) 272-1032. The examiner can normally be reached 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DUSTIN BILODEAU/Examiner, Art Unit 2664
/JENNIFER MEHMOOD/Supervisory Patent Examiner, Art Unit 2664