DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
Applicant’s election without traverse of Invention I, claims 1-24 and 27-28, in the reply filed on 12/29/2025 is acknowledged.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-6, 8-9, 13-15, 17, 22, 24, and 27-28 is/are rejected under 35 U.S.C. 102(a)(1)/102(a)(2) as being anticipated by Mallet et al.1 (“Mallet”).
Regarding claim 1, Mallet teaches a method for generating training data in a form of a plurality of frames of facial animation, each of the plurality of frames represented as a three-dimensional (3D) mesh comprising a plurality of vertices, the training data usable to train an actor-specific actor-to-mesh conversion model which, when trained, receives a performance of the actor captured by a head-mounted camera (HMC) set-up and infers a corresponding actor-specific 3D mesh of the performance of the actor (note that the “training data” is not recited as actually used to train any model but rather it must be merely “usable” to train any type of “actor-specific actor-to-mesh conversion model which, when trained, receives a performance of the actor captured by a head-mounted camera (HMC) set-up and infers a corresponding actor-specific 3D mesh of the performance of the actor” such that if data is generated “in a form of a plurality of frames of facial animation, each of the plurality of frames represented as a three-dimensional (3D) mesh comprising a plurality of vertices” then such data could be used to train an actor-to-mesh conversion model as this would be usable with the mesh aspect of such a model; see Mallet, column 4, lines 28-44 and figure 1 teaching “from images of facial expressions collected by a camera, an array of cameras or other types of collection devices (e.g., an optical scanner), or through other techniques (e.g., electronically drawn, etc.), position information may be computed from the expressions and processed to produce an animation model (e.g. an adjustable three dimensional numerical mesh) that mathematically defines the shape of the underlying asset. 
Once produced, the model can be used for various applications” such that this produces “nearly identical representations of the collected facial expressions (used to produce the model)” such that here data is generated in the form of frames of facial animation with each of the frames represented as a 3D mesh which comprises a plurality of vertices and such data is usable in any application that could make use of such a mesh which would include an actor-specific actor-to-mesh conversion model as recited as such a mesh could be used for example for the mesh aspect of such a conversion model as it could provide a plausible representation of a captured expression; see also column 5, lines 3-20 and figure 1 teaching the images are acquired from “an array of cameras (e.g., represented by the head-mounted pair of cameras 104, 106) that can be used to capture images (e.g., from two distinct perspectives) of the actor's face and provide data that represents the imagery to the computer system” such that this also teaches that the 3D mesh model is usable in such a model as it provides an accurate representation of an actor performance captured by a helmet mounted camera thus providing a ground truth representation of a corresponding captured expression such that this means the mesh is usable as training data for such a conversion model that could utilize a ground truth 3D mesh representation of a captured object), the method comprising:
receiving, as input, an actor range of motion (ROM) performance captured by a HMC set-up, the HMC-captured ROM performance comprising a number of frames of high resolution image data, each frame captured by a plurality of cameras to provide a corresponding plurality of images for each frame (see Mallet, column 4, lines 30-67 through column 5, lines 1-20 and figure 1, teaching “animation system 100 is presented that captures facial expressions of an actor and uses the captured information to efficiently produce an animation model that is capable of representing the captured expressions (along with other expressions)” and “animation system 100 is capable of capturing imagery (e.g., facial expressions of a performing actor) and creating one or more animation models for a variety of applications” where these images of different expressions, comprising an HMC-captured ROM performance, are acquired from “an array of cameras (e.g., represented by the head-mounted pair of cameras 104, 106) that can be used to capture images (e.g., from two distinct perspectives) of the actor's face and provide data that represents the imagery to the computer system”);
receiving or generating an approximate actor-specific ROM of a 3D mesh topology comprising a plurality of vertices, the approximate actor-specific ROM comprising a number of frames of the 3D mesh topology, each frame specifying the 3D positions of the plurality of vertices (note that “approximate…ROM of a 3D mesh topology” is interpreted as any ROM of a 3D mesh that can be considered close to the actual in some manner and for example if further processing is done on a ROM then this would imply that it is approximate as it can be made more accurate or be closer to the actual; further note that “actor-specific” is interpreted to mean that the ROM of a 3D mesh topology is specific to the actor in some way such as through being assigned to an actor, or used in connection with an actor, or which is based on the actor, or which describes the actor in any manner; see Mallet, column 5, lines 42-67 through column 6, lines 1-52 and figure 2 teaching “shapes (referred to as blendshapes) have geometries that may be adjusted (e.g., weighted) so the model to is able to represent a particular facial expression (included in the actor's performance) from a range of expressions” and “animation system 100 produces the animation model 108 (both shown in FIG. 1) from input information that includes the captured imagery (e.g., frames from video streams, etc.) 
of the actor's facial performance along with a deformable mesh of vertices that represents a base expression of the actor's face (e.g., a neutral facial expression) and a set of blendshapes for changing the facial expression being represented” where this “deformable mesh of vertices that represents a base expression of the actor’s face…and a set of blendshapes for changing the facial expression being represented” is a received approximate actor-specific ROM of a 3D mesh topology where these blendshapes specify 3D positions of vertices of the mesh such that when the 3D mesh is utilized for matching to the captured imagery then the 3D mesh frames that are adjusted to match each frame of the captured imagery are the number of frames of the 3D mesh topology);
performing a blendshape decomposition of the approximate actor-specific ROM to yield a blendshape basis or a plurality of blendshapes (see Mallet, column 6, lines 23-67 and figure 2 teaching the actor-specific ROM supplied is subjected to a blendshape decomposition where “input information that includes the captured imagery (e.g., frames from video streams, etc.) of the actor's facial performance along with a deformable mesh of vertices that represents a base expression of the actor's face (e.g., a neutral facial expression) and a set of blendshapes for changing the facial expression being represented” such that this provides the actor-specific ROM and as in column 8, lines 3-28 “For each image, frame of video, etc., 2D marker constrains and 3D bundle constraints, including the constraints for the tracked curves, are used to execute an optimization calculation. In general, the energy E can be considered the squared L2 error of a stack of constraint vectors. Initially, an optimization calculation is executed to solve for blendshape weights w that substantially match the tracked markers and curves. Given a neutral mesh b0 and the blendshape basis as a matrix B, the calculation attempts to fit the deformed mesh X(w)=b0+Bw to the input features” such that here in order to match the mesh to the captured images a “blendshape basis” is provided where the mesh and matrix describing the blendshape parameters are considered decomposed into this blendshape basis as such decomposition allows to compare the blendshape portions to the tracked imagery from the HMC);
performing a blendshape optimization to obtain a blendshape-optimized 3D mesh, the blendshape optimization comprising determining, for each frame of the HMC-captured ROM performance, a vector of blendshape weights and a plurality of transformation parameters which, when applied to the blendshape basis to reconstruct the 3D mesh topology, minimize a blendshape optimization loss function which attributes loss to differences between the reconstructed 3D mesh topology and the frame of the HMC-captured ROM performance (see Mallet, column 8, lines 3-50 teaching “For each image, frame of video, etc., 2D marker constrains and 3D bundle constraints, including the constraints for the tracked curves, are used to execute an optimization calculation” where “the energy E can be considered the squared L2 error of a stack of constraint vectors. Initially, an optimization calculation is executed to solve for blendshape weights w that substantially match the tracked markers and curves. Given a neutral mesh b0 and the blendshape basis as a matrix B, the calculation attempts to fit the deformed mesh X(w)=b0+Bw to the input features” and “two energies are fitting the 2D markers and fitting the 2D curves respectively, with the appropriate weights ω. Since the constraints are linear to the blendshape weights, the calculation is solved, for example, using quadratic programming” and “By executing the calculation, a deformed shape, X̃, is produced in the blendshape subspace” such that this deformed shape after the optimization is a blendshape-optimized 3D mesh where the optimization determines for each frame a vector of blendshape weights and the transformation parameters which minimize the optimization loss function utilizing differences between the reconstructed 3D mesh topology and the frame of the HMC performance being compared/matched);
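For illustration only (not part of the claim mapping): the fit of the deformed mesh X(w)=b0+Bw to tracked input features, as quoted above, can be sketched numerically. Mallet is quoted as solving the constrained problem with quadratic programming; the sketch below solves the same squared-L2 energy without constraints, and all names and dimensions (n_verts, n_shapes, targets) are invented for the example.

```python
import numpy as np

# Sketch of the blendshape fit X(w) = b0 + B @ w. Unconstrained least squares
# stands in for Mallet's quadratic-programming solve of the same L2 energy.
rng = np.random.default_rng(0)
n_verts, n_shapes = 12, 4
b0 = rng.normal(size=3 * n_verts)             # neutral mesh, stacked x/y/z
B = rng.normal(size=(3 * n_verts, n_shapes))  # blendshape basis as a matrix

w_true = np.array([0.2, 0.0, 0.7, 0.1])
targets = b0 + B @ w_true                     # stand-in for tracked markers/curves

# E(w) = ||(b0 + B w) - targets||^2 is linear in w, so least squares applies
w, *_ = np.linalg.lstsq(B, targets - b0, rcond=None)
X = b0 + B @ w                                # blendshape-optimized mesh, one frame
```

Because the energy is quadratic and the constraints here are noise-free, the recovered weight vector matches the weights used to generate the targets.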
performing a mesh-deformation refinement on the blendshape-optimized 3D mesh to obtain a mesh-deformation-optimized 3D mesh, the mesh-deformation refinement comprising determining, for each frame of the HMC-captured ROM performance, 3D locations of a plurality of handle vertices which, when applied to the blendshape-optimized 3D mesh using a mesh-deformation technique, minimize a mesh-deformation refinement loss function which attributes loss to differences between the deformed 3D mesh topology and the HMC-captured ROM performance (see Mallet, column 8, lines 26-67 through column 9, lines 1-5 teaching that following the above blendshape optimization there is a secondary optimization process that performs a mesh-deformation refinement on the blendshape-optimized 3D mesh found above where “improved the match of the constraints may be achieved by implementing one or more techniques. For example, improvement may be provided by solving for one or more corrective shapes, ΔX, without inducing noise or over-fitting. In general, such an optimization may include five energy terms. In addition to the two previous terms represented in equation (3), 3D point constraints for the mesh boundary, 3D curves (e.g., for the eyelids), and a cotangent weighted Laplacian constraint as regularization to prevent arbitrary changes to the geometry may be introduced” where for example “equation 4” corresponds to such refinement and for example “the mesh provides a reasonable match to facial expression of the actor, the mesh does not substantially track the lips and eye lids of the actor. In FIG. 2(c), defined contour information is included in calculation for producing the animation model. As represented in the figure, the mesh overlaid upon the actor's facial expression provides a closer match in the eye lids. 
Compared to the underlying expression, the mouth region of the mesh appears to slightly deviate from the actor's mouth” and “Comparing the mesh and the underlying image, incorporating the corrective shapes into the animation model adjusts the mouth region of the mesh (e.g., including the inner lip shape and corner of the mouth) to provide a closer match to the facial expression” such that here using the 3D locations of a plurality of handle vertices corresponding to the vertices of the problematic areas such as “eyelids” and “inner lip” and “corner of the mouth” the system is able to refine the initial mesh to get a more accurate reconstruction);
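For illustration only (not part of the claim mapping): the corrective-shape refinement described above can be sketched as solving for per-vertex offsets ΔX that pull a few handle vertices onto target 3D locations while a Laplacian regularization term prevents arbitrary changes to the geometry. A uniform path-graph Laplacian stands in for Mallet's cotangent-weighted mesh Laplacian, and all names are invented for the example.

```python
import numpy as np

# Sketch: refine a blendshape-optimized mesh X by solving
#   min_dX ||S(X + dX) - targets||^2 + lam * ||L dX||^2
# where S selects handle vertices and L is a (uniform) graph Laplacian.
n = 6
X = np.zeros((n, 3))                # blendshape-optimized positions (a flat line)
X[:, 0] = np.arange(n)

# Uniform Laplacian of a path graph (stand-in for a cotangent-weighted one)
L = np.zeros((n, n))
for i in range(n):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
    L[i, i] = len(nbrs)
    for j in nbrs:
        L[i, j] = -1.0

handles = [0, 5]                    # handle vertex indices
targets = np.array([[0.0, 1.0, 0.0], [5.0, 1.0, 0.0]])  # desired 3D locations

S = np.zeros((len(handles), n))
for r, i in enumerate(handles):
    S[r, i] = 1.0

lam = 0.1
A = S.T @ S + lam * L.T @ L         # normal equations of the quadratic energy
rhs = S.T @ (targets - X[handles])
dX = np.linalg.solve(A, rhs)        # corrective per-vertex offsets
X_refined = X + dX                  # mesh-deformation-optimized mesh
```

In this toy case the optimal correction is a rigid vertical shift, so the handle vertices land exactly on their targets while the interior vertices follow smoothly.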
generating the training data based on the mesh-deformation-optimized 3D mesh (see Mallet, column 4, lines 28-52 teaching “from images of facial expressions collected by a camera, an array of cameras or other types of collection devices (e.g., an optical scanner), or through other techniques (e.g., electronically drawn, etc.), position information may be computed from the expressions and processed to produce an animation model (e.g. an adjustable three dimensional numerical mesh) that mathematically defines the shape of the underlying asset. Once produced, the model can be used for various applications” where this “animation model” which is an “adjustable three dimensional numerical mesh” is output to any application that can utilize such a mesh and again as explained above this mesh data is training data as recited as it is usable for training a system that utilizes a mesh given its format).
Regarding claim 2, Mallet teaches all that is required as applied to claim 1 above and further teaches wherein the blendshape optimization loss function comprises a likelihood term that attributes: relatively high loss to vectors of blendshape weights which, when applied to the blendshape basis to reconstruct the 3D mesh topology, result in reconstructed 3D meshes that are relatively less feasible based on the approximate actor-specific ROM; and relatively low loss to vectors of blendshape weights which, when applied to the blendshape basis to reconstruct the 3D mesh topology, result in reconstructed 3D meshes that are relatively more feasible based on the approximate actor-specific ROM (see Mallet, column 8, lines 3-25 where “Given a neutral mesh b0 and the blendshape basis as a matrix B, the calculation attempts to fit the deformed mesh X(w)=b0+Bw to the input features” in accordance with equation 3 where “As provided by equation (3), two energies are fitting the 2D markers and fitting the 2D curves respectively, with the appropriate weights ω. Since the constraints are linear to the blendshape weights, the calculation is solved, for example, using quadratic programming” where the energy function and “energy E” which is “the squared L2 error of a stack of constraint vectors” is used in the “optimization calculation” to “solve for blendshape weights w that substantially match the tracked markers and curves” where E(w) serves as a likelihood term such that when a vector of blendshape weights produces a mesh that deviates significantly from the tracked ROM markers, the energy function calculates a larger error value (relatively high loss) where this means that the mesh fails to align with the captured ROM and is thus a less feasible representation of the performance, and when weights produce a mesh that closely aligns with the tracked ROM, E is minimized and thus attributed a lower loss which indicates this is a more feasible representation of the performance).
Regarding claim 3, Mallet teaches all that is required as applied to claim 2 above and further teaches wherein, for each vector of blendshape weights, the likelihood term is based on a negative log-likelihood of locations of a subset of vertices reconstructed using the vector of blendshape weights relative to locations of vertices of the approximate actor-specific ROM (see Mallet, column 8, lines 3-25 where “Given a neutral mesh b0 and the blendshape basis as a matrix B, the calculation attempts to fit the deformed mesh X(w)=b0+Bw to the input features” in accordance with equation 3 where “As provided by equation (3), two energies are fitting the 2D markers and fitting the 2D curves respectively, with the appropriate weights ω. Since the constraints are linear to the blendshape weights, the calculation is solved, for example, using quadratic programming” where the energy function and “energy E” which is “the squared L2 error of a stack of constraint vectors” is used in the “optimization calculation” to “solve for blendshape weights w that substantially match the tracked markers and curves” where E(w) serves as a likelihood term as explained above and this may be considered to be based on a negative log-likelihood of locations of a subset of vertices reconstructed using the vector of blendshape weights relative to locations of vertices of the approximate actor-specific ROM because the energy term is based on the spatial positions of iterated mesh vertices relative to the input data and when using a quadratic solver as taught this minimizes the squared distance between points such that this is mathematically equivalent to finding a negative log-likelihood of such locations).
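The asserted equivalence between minimizing a squared L2 error and finding a negative log-likelihood can be made explicit; it holds under an isotropic Gaussian noise model (a standard identity, added here for illustration; σ is an assumed noise scale, not taken from Mallet):

```latex
-\log p(\mathbf{v} \mid \mathbf{w})
  = -\log \prod_{i} \mathcal{N}\!\bigl(\mathbf{v}_i ;\; \mathbf{x}_i(\mathbf{w}),\; \sigma^2 I\bigr)
  = \frac{1}{2\sigma^2} \sum_{i} \bigl\lVert \mathbf{v}_i - \mathbf{x}_i(\mathbf{w}) \bigr\rVert^2 + \text{const},
```

where the \(\mathbf{x}_i(\mathbf{w})\) are vertex locations reconstructed from \(X(w)=b_0+Bw\) and the \(\mathbf{v}_i\) are the corresponding locations from the input data. The squared-error energy and the negative log-likelihood then differ only by a positive scale and an additive constant, so they share the same minimizer.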
Regarding claim 4, Mallet teaches all that is required as applied to claim 1 above and further teaches wherein the blendshape optimization comprises, for each of a plurality of frames of the HMC-captured ROM performance, starting the blendshape optimization process using a vector of blendshape weights and a plurality of transformation parameters previously optimized for a preceding frame of the HMC-captured ROM performance (see Mallet, column 9, lines 16-34 teaching the blendshape weights and transformation parameters dictating how the mesh is affected by the blendshapes generated as “produced animation models” such that a vector of blendshape weights and transformation parameters were previously optimized and can be updated such that for frames which utilize the updated model this means they start their optimization using a vector and transformation parameters that were optimized for a preceding frame as the next frame will be optimized to the blendshapes and mesh through using these updated parameters that were previously optimized prior to the edit).
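For illustration only (not part of the claim mapping): the warm-start behavior read onto claim 4 can be sketched as initializing each frame's weight solve with the weights optimized for the preceding frame. Plain gradient descent stands in for the quadratic-programming solve, and all names are invented for the example.

```python
import numpy as np

# Warm-start sketch: each frame's solve for blendshape weights w begins from
# the preceding frame's optimized w, exploiting temporal coherence.
rng = np.random.default_rng(1)
b0 = rng.normal(size=30)
B = rng.normal(size=(30, 3))

def fit(target, w_init, iters=500, lr=0.005):
    """Minimize E(w) = ||b0 + B w - target||^2 by gradient descent."""
    w = w_init.copy()
    for _ in range(iters):
        grad = 2.0 * B.T @ ((b0 + B @ w) - target)  # dE/dw
        w -= lr * grad
    return w

# A temporally coherent "performance": targets drift smoothly in weight space
frames = [b0 + B @ np.array([0.1 * k, 0.5, 0.2]) for k in range(4)]

w = np.zeros(3)                  # neutral initialization for the first frame only
solutions = []
for target in frames:
    w = fit(target, w_init=w)    # warm start from the preceding frame's result
    solutions.append(w)
```

Starting near the previous frame's optimum typically reduces the iterations needed per frame when expressions change gradually, which is the practical point of the warm start.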
Regarding claim 5, Mallet teaches all that is required as applied to claim 1 above and further teaches wherein performing the mesh-deformation refinement comprises determining, for each frame of the HMC-captured ROM performance, 3D locations of the plurality of handle vertices which, when applied to the blendshape-optimized 3D mesh using the mesh-deformation technique for successive pluralities of N frames of the HMC-captured ROM performance, minimize the mesh-deformation refinement loss function (note that “handle vertices” are interpreted as vertices related to a functional “handle” element where a handle provides some aspect of control over the element it is associated with such that handle vertices provide some aspect of control; see Mallet, column 7, lines 15-63 teaching “contour information is provided for facial features represented in the image. For example, contour 206 provides an outline of upper and lower lids of the actor's right eye, and, contour 208 outlines the silhouette of the inner lips of the actor, which emanates from the corner of the mouth and extends to the center of the actor's mouth” and “contours may be manually defined by an animator, editor, etc. or may be automatically defined by one or more processes” and “one or more quantities that represent a correspondence between a contour and one or more mesh edges may be defined and used by the computer system for model production” where for example “As show in FIG. 3b , a silhouette contour 308 (also referred to as an occluding contour) is defined as an edge that represents a boundary between the visual and hidden portion of the actor's mouth. Since the geometry of the actor's lips typically changes with time, such curves and silhouette contours change from image to image (e.g., from frame to frame in a video) and are defined for each image. After a silhouette contour has been defined and a curve selected, a correspondence between the two can be defined. 
In one arrangement, a correspondence may be defined by projecting the vertices of the silhouette contour 308 and aligning end-points of the contour with end-points of the selected curve 306” such that here these silhouette contours which are determined and tracked frame to frame are 3D locations of handle vertices through correspondence “defined by projecting the vertices of the silhouette contour 308 and aligning end-points of the contour with end-points of the selected curve 306” and as in column 8, lines 26-50 the corrective shapes optimization includes “3D point constraints for the mesh boundary, 3D curves (e.g., for the eyelids), and a cotangent weighted Laplacian constraint as regularization to prevent arbitrary changes to the geometry” where these constraints correspond to the tracked locations that must be matched such that these boundary vertices function as handle vertices that control deformation of the mesh and when applied for each successive frame these 3D locations are tracked in order to minimize the mesh-deformation refinement loss function as in equation 4).
Regarding claim 6, Mallet teaches all that is required as applied to claim 5 above and further teaches wherein the mesh-deformation refinement loss function attributes loss to differences between the deformed 3D mesh topology and the HMC-captured ROM performance over each successive plurality of N frames (see Mallet, column 8, lines 3-50 as explained above where the mesh-deformation refinement loss function as in equation 4 attributes loss to differences between the deformed 3D mesh topology as in the initial deformation and the captured performance being tracked of each successive frame as this loss is minimized for each frame for each handle vertices location to refine the mesh where “For each image, frame of video, etc., 2D marker constrains and 3D bundle constraints, including the constraints for the tracked curves, are used to execute an optimization calculation. In general, the energy E can be considered the squared L2 error of a stack of constraint vectors” and “improvement may be provided by solving for one or more corrective shapes, ΔX, without inducing noise or over-fitting. In general, such an optimization may include five energy terms. In addition to the two previous terms represented in equation (3), 3D point constraints for the mesh boundary, 3D curves (e.g., for the eyelids), and a cotangent weighted Laplacian constraint as regularization”).
Regarding claim 8, Mallet teaches all that is required as applied to claim 1 above and further teaches wherein performing the mesh-deformation refinement comprises, for each frame of the HMC-captured ROM performance, starting with 3D locations of the plurality of handle vertices from the blendshape-optimized 3D mesh (see Mallet, column 8, lines 3-50 teaching “For each image, frame of video, etc., 2D marker constrains and 3D bundle constraints, including the constraints for the tracked curves, are used to execute an optimization calculation. In general, the energy E can be considered the squared L2 error of a stack of constraint vectors. Initially, an optimization calculation is executed to solve for blendshape weights w that substantially match the tracked markers and curves. Given a neutral mesh b0 and the blendshape basis as a matrix B, the calculation attempts to fit the deformed mesh X(w)=b0+Bw to the input features” and “improved the match of the constraints may be achieved by implementing one or more techniques. For example, improvement may be provided by solving for one or more corrective shapes, ΔX, without inducing noise or over-fitting. In general, such an optimization may include five energy terms. In addition to the two previous terms represented in equation (3), 3D point constraints for the mesh boundary, 3D curves (e.g., for the eyelids), and a cotangent weighted Laplacian constraint as regularization” such that as explained above the handle vertices correspond to the vertices of the silhouette corrective shape contours and thus the “3D point constraints for the mesh boundary, 3D curves (e.g., for the eyelids), and a cotangent weighted Laplacian constraint” correspond to the handle vertices which are controlled to be deformed to refine the vertices of the corrective shapes).
Regarding claim 9, Mallet teaches all that is required as applied to claim 1 above and further teaches wherein the mesh deformation technique comprises at least one of: a Laplacian mesh deformation, a bi-Laplacian mesh deformation, and a combination of the Laplacian mesh deformation and the bi-Laplacian mesh deformation (see Mallet, column 8, lines 3-50 teaching Laplacian mesh deformation where “For each image, frame of video, etc., 2D marker constrains and 3D bundle constraints, including the constraints for the tracked curves, are used to execute an optimization calculation. In general, the energy E can be considered the squared L2 error of a stack of constraint vectors. Initially, an optimization calculation is executed to solve for blendshape weights w that substantially match the tracked markers and curves. Given a neutral mesh b0 and the blendshape basis as a matrix B, the calculation attempts to fit the deformed mesh X(w)=b0+Bw to the input features” and “improved the match of the constraints may be achieved by implementing one or more techniques. For example, improvement may be provided by solving for one or more corrective shapes, ΔX, without inducing noise or over-fitting. In general, such an optimization may include five energy terms. In addition to the two previous terms represented in equation (3), 3D point constraints for the mesh boundary, 3D curves (e.g., for the eyelids), and a cotangent weighted Laplacian constraint as regularization”).
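For illustration only (not drawn from Mallet, whose cotangent-weighted Laplacian is replaced here by a uniform path-graph Laplacian, with all names invented): the two regularizers named in claim 9 differ in what they penalize. The Laplacian penalizes deviation from the local neighbor average (membrane-like smoothness), while its square, the bi-Laplacian, penalizes bending (thin-plate-like smoothness) and leaves linear ramps unpenalized in the interior.

```python
import numpy as np

# Contrast Laplacian (L) and bi-Laplacian (L @ L) smoothness penalties
# on a tiny path-graph "mesh".
n = 5
L = np.zeros((n, n))
for i in range(n):
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
    L[i, i] = len(nbrs)
    for j in nbrs:
        L[i, j] = -1.0

bi_L = L @ L                        # bi-Laplacian: square of the Laplacian

spike = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # a spiky per-vertex displacement
ramp = np.linspace(0.0, 1.0, n)              # a linear, bend-free displacement

# The Laplacian energy penalizes the spike far more than the linear ramp...
assert spike @ L @ spike > ramp @ L @ ramp
# ...and the interior of a linear ramp is exactly bend-free under bi-L.
assert abs((bi_L @ ramp)[2]) < 1e-12
```

In a refinement loss, choosing L, bi-L, or a weighted combination trades off how strongly local detail versus large-scale bending of the corrective deformation is suppressed.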
Regarding claim 13, Mallet teaches all that is required as applied to claim 1 above and further teaches wherein generating the training data based on the mesh-deformation-optimized 3D mesh comprises: receiving user input; modifying one or more frames of the mesh-deformation-optimized 3D mesh based on the user input to thereby provide an iteration output 3D mesh; generating the training data based on the iteration output 3D mesh (see Mallet, column 9, lines 16-34, teaching “along with providing closely matching representations to an actor face, the produced animation models provide for other functionality such as editability. As illustrated in the figure, three images 400, 402, 404 are presented of an actor's rig that tracks the performed facial expression of an actor. Image 400 presents an initial facial expression of the actor in which both eyes are in open positions along with the actor's mouth. By allowing blendshapes associated with the animation model to be adjusted, the presented facial expression may be edited. For example, as shown in image 402, adjustments may be executed for placing the eye lids in a closed position and slightly raising the upper lip. In image 404, other adjustments may be added such as raising both eyebrows to allow and animator, editor, etc. to produce a slightly different emotion. Once edited, the adjustments may be used for updating the animation model such that the newly adjusted facial expressions may be reconstructed by applying the appropriate weight or weights to the blendshapes associated with the animation model” such that here user input is received to edit the mesh which has already been optimized by the iterative process that produced it and this modifies the one or more frames that have already been put through the iterative process which provides an iteration output mesh in the form of the updated animation model which is the generated training data).
Regarding claim 14, Mallet teaches all that is required as applied to claim 13 above and further teaches wherein the user input is indicative of a modification to one or more initial frames of the mesh-deformation-optimized 3D mesh and wherein modifying the one or more frames of the mesh-deformation-optimized 3D mesh based on the user input comprises (see Mallet, column 9, lines 16-36 and figure 4 teaching “along with providing closely matching representations to an actor face, the produced animation models provide for other functionality such as editability. As illustrated in the figure, three images 400, 402, 404 are presented of an actor's rig that tracks the performed facial expression of an actor. Image 400 presents an initial facial expression of the actor in which both eyes are in open positions along with the actor's mouth. By allowing blendshapes associated with the animation model to be adjusted, the presented facial expression may be edited. For example, as shown in image 402, adjustments may be executed for placing the eye lids in a closed position and slightly raising the upper lip. In image 404, other adjustments may be added such as raising both eyebrows to allow and animator, editor, etc. to produce a slightly different emotion. 
Once edited, the adjustments may be used for updating the animation model such that the newly adjusted facial expressions may be reconstructed by applying the appropriate weight or weights to the blendshapes associated with the animation model” such that here the user input is indicative of a modification to one or more frames that the user would like to edit): propagating the modification from the one or more initial frames to one or more further frames of the mesh-deformation-optimized 3D mesh to provide the iteration output 3D mesh (see Mallet, column 9, lines 16-36 teaching “along with providing closely matching representations to an actor face, the produced animation models provide for other functionality such as editability. As illustrated in the figure, three images 400, 402, 404 are presented of an actor's rig that tracks the performed facial expression of an actor. Image 400 presents an initial facial expression of the actor in which both eyes are in open positions along with the actor's mouth. By allowing blendshapes associated with the animation model to be adjusted, the presented facial expression may be edited. For example, as shown in image 402, adjustments may be executed for placing the eye lids in a closed position and slightly raising the upper lip. In image 404, other adjustments may be added such as raising both eyebrows to allow and animator, editor, etc. to produce a slightly different emotion. 
Once edited, the adjustments may be used for updating the animation model such that the newly adjusted facial expressions may be reconstructed by applying the appropriate weight or weights to the blendshapes associated with the animation model” such that here, once edited, “the adjustments may be used for updating the animation model such that the newly adjusted facial expressions may be reconstructed by applying the appropriate weight or weights to the blendshapes associated with the animation model” which means that the model and blendshape parameters themselves have been adjusted so that further frames of “newly adjusted facial expressions may be reconstructed” which propagates through the adjustment to the blendshape and the application of the blendshape to the target as in column 8, lines 3-50 where such blendshapes would now be used in the optimization along with the tracked input anytime the blendshape is triggered, and for example based on the adjustment of a frame such as 402 of figure 4, the frame 402 adjustments to the blendshape update the model such that based on the optimization process when matching the next frame of performance of the actor using the updated model, the further frames utilize the same adjusted blendshapes but weighted based on the new tracked positions of corresponding features and contours such that this provides the further frames of an iteration output 3D mesh).
Regarding claim 15, Mallet teaches all that is required as applied to claim 14 above and further teaches wherein propagating the modification from the one or more initial frames to the one or more further frames comprises implementing a weighted pose-space deformation (WPSD) process (note that a weighted pose-space deformation is considered to be any deformation process that is performed in relation to some pose-space and which utilizes weights in some manner where if a deformation occurs based on some aspect of pose-space such that for example if an adjustment to a pose-space causes a deformation in pose-space or causes deformation in another space then this is a pose-space deformation; see Mallet, column 9, lines 16-34 teaching as above “allowing blendshapes associated with the animation model to be adjusted” where “adjustments may be executed for placing the eye lids in a closed position and slightly raising the upper lip” and “once edited the adjustments may be used for updating the animation model such that the newly adjusted facial expressions may be reconstructed by applying the appropriate weight or weights to the blendshapes associated with the animation model” such that this propagates the adjustment to the model and further frames through the updated blendshape as when the optimization process is run the new corrective shape will be applied based on the pose/expression of the captured performance which uses the weights from the optimization of the blendshape to the capture such that this change in pose-space causes the deformation of the mesh to be optimized using the propagated adjustments to the blendshapes of the model).
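For context on the conventional technique referenced in claim 15, a weighted pose-space deformation in the usual sense interpolates stored per-vertex corrective displacements over pose space using a radial basis kernel and per-vertex weights. The following sketch is purely illustrative under assumed variable names and shapes; it is not code from Mallet and does not represent the claimed method:

```python
import numpy as np

def wpsd_displacement(pose, example_poses, example_deltas, vertex_weights, sigma=1.0):
    """Weighted pose-space deformation sketch: blend stored per-vertex
    corrective displacements from example poses using a normalized RBF
    kernel over pose space, scaled by per-vertex influence weights.

    pose:           (p,) current pose parameters
    example_poses:  (k, p) poses at which corrections were sculpted
    example_deltas: (k, n, 3) per-vertex corrective displacements
    vertex_weights: (n,) per-vertex influence weights in [0, 1]
    """
    d2 = np.sum((example_poses - pose) ** 2, axis=1)   # squared pose-space distances, (k,)
    k = np.exp(-d2 / (2 * sigma ** 2))                  # RBF kernel values
    k = k / k.sum()                                     # normalize kernel weights
    blended = np.tensordot(k, example_deltas, axes=1)   # blended corrections, (n, 3)
    return vertex_weights[:, None] * blended            # apply per-vertex weights
```

Under this conventional reading, the pose determines which sculpted corrections are blended in, and the per-vertex weights localize the resulting deformation.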
Regarding claim 17, Mallet teaches all that is required as applied to claim 1 above and further teaches wherein the blendshape optimization loss function comprises a depth term that, for each frame of the HMC-captured ROM performance, attributes loss to differences between depths determined on a basis of the reconstructed 3D mesh topology and depths determined on a basis of the HMC-captured ROM performance (see Mallet, column 7, lines 4-67 through column 8, lines 1-50 teaching the blendshape optimization loss function as in equations 3 and 4, in view of equation 2 where the depth term relates to “markers that can be viewed by both cameras…for estimating the three-dimensional positions” using a “bundle adjustment technique” where as in the optimizations “For each image, frame of video, etc., 2D marker constraints and 3D bundle constraints, including the constraints for the tracked curves, are used to execute an optimization calculation. In general, the energy E can be considered the squared L2 error of a stack of constraint vectors. Initially, an optimization calculation is executed to solve for blendshape weights w that substantially match the tracked markers and curves. Given a neutral mesh b0 and the blendshape basis as a matrix B, the calculation attempts to fit the deformed mesh X(w)=b0+Bw to the input features” such that here this fitting attributes loss to differences between depths of the reconstructed mesh features with the tracked 3D markers which attempts to minimize this loss).
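For reference, fitting the deformed mesh X(w) = b0 + Bw to tracked input features, as quoted above, is in its simplest form a regularized linear least-squares problem. The sketch below is illustrative only (hypothetical variable names and a simple Tikhonov regularizer); it is not code from Mallet and omits the marker, curve, and bundle constraint terms of the reference's full energy:

```python
import numpy as np

def fit_blendshape_weights(b0, B, targets, reg=1e-3):
    """Solve min_w ||(b0 + B @ w) - targets||^2 + reg * ||w||^2
    via the normal equations.

    b0:      (3N,) stacked neutral-mesh vertex positions
    B:       (3N, m) blendshape basis matrix
    targets: (3N,) stacked tracked feature positions
    reg:     small regularizer to keep the solve well conditioned
    """
    A = B.T @ B + reg * np.eye(B.shape[1])   # normal-equation matrix
    rhs = B.T @ (targets - b0)               # project residual onto the basis
    return np.linalg.solve(A, rhs)           # blendshape weights w
```

The squared L2 residual of the fitted mesh against the tracked features plays the role of the loss being minimized.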
Regarding claim 22, Mallet teaches all that is required as applied to claim 1 above and further teaches wherein the mesh-deformation refinement loss function comprises a depth term that, for each frame of the HMC-captured ROM performance, attributes loss to differences between depths determined on a basis of the 3D locations of the plurality of handle vertices applied to the blendshape-optimized 3D mesh using the mesh-deformation technique and depths determined on a basis of the HMC-captured ROM performance (see Mallet, column 7, lines 4-14 teaching utilizing depth information such as “three dimensional positions” and “estimating the positions of these markers along with a constraint to fit a bundle” where as in column 8, lines 3-50 this depth information is part of the optimization process and mesh refinement optimization such that it attributes loss to differences between depths determined on a basis of the 3D locations of the handle vertices applied to refine the blendshapes and depths determined based on the performance, as these 3D tracked positions are compared to the applied blendshapes to determine the optimal blendshape parameters).
Regarding claim 24, Mallet teaches all that is required as applied to claim 1 above and further teaches wherein the mesh-deformation refinement loss function comprises a displacement term which, for each frame of the HMC-captured ROM performance, comprises a per-vertex parameter which expresses a degree of confidence in the vertex positions of the blendshape-optimized 3D mesh (see Mallet, column 8, lines 3-50 teaching that “An improved match of the constraints may be achieved by implementing one or more techniques. For example, improvement may be provided by solving for one or more corrective shapes, ΔX, without inducing noise or over-fitting. In general, such an optimization may include five energy terms. In addition to the two previous terms represented in equation (3), 3D point constraints for the mesh boundary, 3D curves (e.g., for the eyelids), and a cotangent weighted Laplacian constraint as regularization to prevent arbitrary changes to the geometry may be introduced” where here in this refinement stage the corrective shapes are solved for which relate to per-vertex parameters tied to the corrective shapes and thus the optimization of this parameter with regard to its loss expresses the degree of confidence in the vertex positions where the system will attempt to lower this loss as this expresses a confidence that the vertex is not in the proper position and must be optimized).
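For context, solving for a corrective shape ΔX under sparse point constraints with a Laplacian regularizer, as in the passage quoted above, can be posed as a linear system. The sketch below is illustrative only: it assumes hypothetical names, uses a toy graph Laplacian rather than the cotangent-weighted Laplacian of the reference, and collapses the reference's five energy terms into two:

```python
import numpy as np

def solve_corrective_shape(X, constraints_idx, constraints_pos, L, lam=1.0):
    """Solve for per-vertex corrective displacements dX that move the
    constrained vertices toward their targets while a Laplacian
    regularizer prevents arbitrary changes to the rest of the geometry:

        min_dX ||S (X + dX) - targets||^2 + lam * ||L dX||^2

    X:               (n, 3) blendshape-fitted vertex positions
    constraints_idx: (c,) indices of constrained vertices
    constraints_pos: (c, 3) target positions for those vertices
    L:               (n, n) mesh Laplacian matrix
    """
    n = X.shape[0]
    S = np.zeros((len(constraints_idx), n))              # selection matrix
    S[np.arange(len(constraints_idx)), constraints_idx] = 1.0
    A = S.T @ S + lam * (L.T @ L)                        # normal-equation matrix
    rhs = S.T @ (constraints_pos - X[constraints_idx])   # constraint residuals
    return np.linalg.solve(A, rhs)                       # (n, 3) corrective displacements
```

Because the Laplacian penalizes only relative (differential) changes, unconstrained vertices follow the constrained ones smoothly rather than staying pinned.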
Regarding claim 27, Mallet teaches an apparatus comprising a processor configured (e.g. by suitable programming) to perform the method of claim 1 (see Mallet, column 12, lines 29-61 teaching “computing device 900…that can be used to implement the techniques described for producing an animation model” which “includes a processor”) where this apparatus performs the method of claim 1 as explained in the rejection of claim 1.
Regarding claim 28, Mallet teaches a computer program product comprising a non-transitory medium which carries a set of computer-readable instructions which, when executed by a data processor, cause the data processor to execute the method of claim 1 (see Mallet, column 12, lines 29-61 teaching “computing device 900…that can be used to implement the techniques described for producing an animation model” which “includes a processor” and “processor 902 can process instructions for execution within the computing device 900, including instructions stored in the memory 904”) where the execution in relation to the apparatus is for a method as in claim 1 as explained in the rejection of claim 1 above.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 7, 12 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mallet in view of Li et al2 (“Li”).
Regarding claim 7, Mallet teaches all that is required as applied to claim 5 above but fails to teach wherein determining, for each frame of the HMC-captured ROM performance, 3D locations of the plurality of handle vertices comprises, for each successive plurality of N frames of the HMC-captured ROM performance, using an estimate of 3D locations of the plurality of handle vertices from a frame of the HMC-captured ROM performance that precedes the current plurality of N frames of the HMC-captured ROM performance to determine at least part of the mesh-deformation refinement loss function. Rather while Mallet determines the 3D locations over a successive plurality of frames, there is no teaching of using an estimate of 3D locations of the plurality of handle vertices from a frame of the HMC-captured ROM performance that precedes the current plurality of N frames of the HMC-captured ROM performance, as instead Mallet determines the deformation in each frame from a neutral pose of the mesh. Thus Mallet stands as a base device upon which the claimed invention can be seen as an improvement through utilization of such 3D location estimates using information from preceding frames to guide the deformation which could allow for more realistic or smoother deformations over the sequence as the deformations that optimize the loss functions would not need to be as great when the mesh is not being deformed from a neutral expression each time.
In the same field of endeavor relating to optimizing and refining blendshape mesh models representing a user associated with ROM captures of the user, Li teaches that it is known to determine, for a captured ROM performance, 3D locations of the plurality of handle vertices where for each successive plurality of N frames of the HMC-captured ROM performance, using an estimate of 3D locations of the plurality of handle vertices from a frame of the HMC-captured ROM performance that precedes the current plurality of N frames of the HMC-captured ROM performance to determine at least part of the mesh-deformation refinement loss function (see Li, paragraphs 0041-0059 teaching “process 300 may be iteratively performed on a frame-by-frame basis, and may iteratively improve accuracy of the tracking output as each frame is processed. After each iteration, a tracked 3D model may be produced for each frame” and “To track the actor's face, rigid motion tracking may be performed at 306 in order to rigidly align the tracked 3D model of a previous frame (e.g., an immediately previous frame) to the current input frame” such that here 3D locations of a plurality of handle vertices are estimated from a previously captured frame such that they are used as an estimate of the starting point for the blendshape fitting 308 and optimization 310 where “the fitting from step 308 may be refined using a deformation algorithm (e.g., a Laplacian deformation algorithm) with the same 3D point constraints (the one or more depth maps) and 2D facial feature constraints on the input scans that were used during step 308 in order to establish reliable correspondences between the tracked model and the input scan” such that here the refinement that is performed uses this information to determine the refinement loss function as it utilizes the tracked model from the previous frame as the starting point for the optimizations that are performed where this tracked model comprises the 3D locations of the 
handle vertices which are further deformed in the refinement step such as by the Laplacian fitting). Thus Li teaches a known technique applicable to the base device of Mallet.
Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Mallet to incorporate the known techniques of Li above as doing so would be no more than application of a known technique to a base device which is ready for improvement where the results of such a combination would be predictable and would result in an improved system. The predictable result of the combination of Li’s technique with Mallet would be that handle vertex positions corresponding to the boundary point constraints in Mallet would be determined for a preceding frame and would be used to inform the loss function for the current frame or group of frames as the fitting of the blendshape and optimization of the blendshape mesh would utilize the tracked model compared to the input frame to determine deformations over the sequence instead of comparing the input frame to a neutral model each time. This would result in an improved system as this could improve the temporal coherence of the deformations and in the situations in which the input frame deformation is large compared to the neutral frame then this could result in faster optimization as the initial solutions would be closer to the current input.
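The predictable result described above, initializing each frame's solve from the preceding frame's solution rather than from the neutral mesh, can be sketched as follows. This is a hypothetical illustration of the general warm-start technique under assumed names; it is not an implementation of either Mallet's or Li's actual pipeline:

```python
import numpy as np

def track_sequence(b0, B, frames, reg=1.0):
    """Fit blendshape weights frame by frame, regularizing each solve
    toward the previous frame's solution (warm start) instead of toward
    the neutral pose, for temporal coherence:

        min_w ||(b0 + B @ w) - targets||^2 + reg * ||w - w_prev||^2

    b0:     (d,) stacked neutral-mesh coordinates
    B:      (d, m) blendshape basis matrix
    frames: iterable of (d,) tracked target coordinates per frame
    """
    A = B.T @ B + reg * np.eye(B.shape[1])
    w_prev = np.zeros(B.shape[1])               # frame 0 starts from neutral
    tracked = []
    for targets in frames:
        # warm start: the regularizer pulls toward the previous solution
        rhs = B.T @ (targets - b0) + reg * w_prev
        w = np.linalg.solve(A, rhs)
        tracked.append(b0 + B @ w)              # reconstructed mesh for this frame
        w_prev = w
    return tracked
```

Because each solve starts near the previous frame's solution, the per-frame corrections stay small and the sequence of meshes evolves smoothly, which is the temporal-coherence benefit relied upon in the combination rationale.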
Regarding claim 12, Mallet teaches all that is required as applied to claim 1 above but fails to teach wherein generating the training data based on the mesh-deformation-optimized 3D mesh comprises performing at least one additional iteration of the steps of: performing the blendshape decomposition; performing the blendshape optimization; performing the mesh-deformation refinement; and generating the training data; using the mesh-deformation-optimized 3D mesh from the preceding iteration of these steps as an input in place of the approximate actor-specific ROM. Rather Mallet does not teach use of previously optimized output meshes from a preceding iteration of a generation step in place of the actor-specific ROM, as instead Mallet determines the deformation in each frame from a neutral pose of the mesh. Thus Mallet stands as a base device upon which the claimed invention can be seen as an improvement through utilization of such information from preceding frames to guide the deformation which could allow for more realistic or smoother deformations over the sequence as the deformations that optimize the loss functions would not need to be as great when the mesh is not being deformed from a neutral expression each time.
In the same field of endeavor relating to optimizing and refining blendshape mesh models representing a user associated with ROM captures of the user, Li teaches that it is known that in generating mesh data in relation to a captured ROM performance which performs blendshape decomposition, blendshape optimization, mesh-deformation refinement, and mesh output, that the generating of the output data based on the mesh-deformation-optimized 3D mesh comprises at least one additional iteration of the steps of performing the blendshape decomposition; performing the blendshape optimization; performing the mesh-deformation refinement; and generating the training data; and using the mesh-deformation-optimized 3D mesh from the preceding iteration of these steps as an input in place of the approximate actor-specific ROM (see Li, paragraphs 0041-0059 teaching “process 300 may be iteratively performed on a frame-by-frame basis, and may iteratively improve accuracy of the tracking output as each frame is processed. After each iteration, a tracked 3D model may be produced for each frame” and “To track the actor's face, rigid motion tracking may be performed at 306 in order to rigidly align the tracked 3D model of a previous frame (e.g., an immediately previous frame) to the current input frame” such that here iterations of the steps are performed as the input scan changes and “the fitting from step 308 may be refined using a deformation algorithm (e.g., a Laplacian deformation algorithm) with the same 3D point constraints (the one or more depth maps) and 2D facial feature constraints on the input scans that were used during step 308 in order to establish reliable correspondences between the tracked model and the input scan” such that here the steps are iterated through using the new scan and the previously generated model is used in place of the actor-specific ROM for the rigid alignment which then sets the basis for the blendshape optimization and mesh deformation refinement). 
Thus Li teaches a known technique applicable to the base device of Mallet.
Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Mallet to incorporate the known techniques of Li above as doing so would be no more than application of a known technique to a base device which is ready for improvement where the results of such a combination would be predictable and would result in an improved system. The predictable result of the combination of Li’s technique with Mallet would be that instead of using the neutral expression model to determine deformation and mesh optimization as in Mallet, the tracked output of each frame would be used to establish an alignment to a new input scan which would be used instead of the actor-specific ROM. This would result in an improved system as this could improve the temporal coherence of the deformations and in the situations in which the input frame deformation is large compared to the neutral frame then this could result in faster optimization as the initial solutions would be closer to the current input.
Regarding claim 16, Mallet teaches all that is required as applied to claim 13 above but fails to teach wherein generating the training data based on the iteration output 3D mesh comprises performing at least one additional iteration of the steps of: performing the blendshape decomposition; performing the blendshape optimization; performing the mesh-deformation refinement; and generating the training data; using the iteration output 3D mesh from the preceding iteration of these steps as an input in place of the approximate actor-specific ROM. Rather, while Mallet does teach iterating the steps and providing adjustments that feed into further iterations, the output 3D mesh from the preceding iteration of these steps is not used as an input in place of the approximate actor-specific ROM; instead, while the model and adjustments are propagated through, in each case the actor-specific ROM is used and the mesh is deformed from the input neutral mesh rather than from the prior output. Thus Mallet stands as a base device upon which the claimed invention can be seen as an improvement through utilization of such information from preceding frames to guide the deformation which could allow for more realistic or smoother deformations over the sequence as the deformations that optimize the loss functions would not need to be as great when the mesh is not being deformed from a neutral expression each time.
In the same field of endeavor relating to optimizing and refining blendshape mesh models representing a user associated with ROM captures of the user, Li teaches that it is known that in generating mesh data in relation to a captured ROM performance which performs blendshape decomposition, blendshape optimization, mesh-deformation refinement, and mesh output, that the generating of the output data based on the mesh-deformation-optimized 3D mesh comprises at least one additional iteration of the steps of performing the blendshape decomposition; performing the blendshape optimization; performing the mesh-deformation refinement; and generating the training data; and using the mesh-deformation-optimized 3D mesh from the preceding iteration of these steps as an input in place of the approximate actor-specific ROM (see Li, paragraphs 0041-0059 teaching “process 300 may be iteratively performed on a frame-by-frame basis, and may iteratively improve accuracy of the tracking output as each frame is processed. After each iteration, a tracked 3D model may be produced for each frame” and “To track the actor's face, rigid motion tracking may be performed at 306 in order to rigidly align the tracked 3D model of a previous frame (e.g., an immediately previous frame) to the current input frame” such that here iterations of the steps are performed as the input scan changes and “the fitting from step 308 may be refined using a deformation algorithm (e.g., a Laplacian deformation algorithm) with the same 3D point constraints (the one or more depth maps) and 2D facial feature constraints on the input scans that were used during step 308 in order to establish reliable correspondences between the tracked model and the input scan” such that here the steps are iterated through using the new scan and the previously generated model is used in place of the actor-specific ROM for the rigid alignment which then sets the basis for the blendshape optimization and mesh deformation refinement). 
Thus Li teaches a known technique applicable to the base device of Mallet.
Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Mallet to incorporate the known techniques of Li above as doing so would be no more than application of a known technique to a base device which is ready for improvement where the results of such a combination would be predictable and would result in an improved system. The predictable result of the combination of Li’s technique with Mallet would be that instead of using the neutral expression model to determine deformation and mesh optimization as in Mallet, the tracked output of each frame would be used to establish an alignment to a new input scan which would be used instead of the actor-specific ROM. This process would be compatible with accepting the user edits to change the facial expression as these changes would update the model and then the optimization would utilize the iteration output mesh as the input in place of the actor-specific ROM. This would result in an improved system as this could improve the temporal coherence of the deformations and in the situations in which the input frame deformation is large compared to the neutral frame then this could result in faster optimization as the initial solutions would be closer to the current input.
Claim(s) 18 and 23 is/are rejected under 35 U.S.C. 103 as being unpatentable over Mallet in view of Grabli et al3 (“Grabli”).
Regarding claim 18, Mallet teaches all that is required as applied to claim 1 above but fails to teach wherein the blendshape optimization loss function comprises an optical flow term that, for each frame of the HMC-captured ROM performance, attributes loss to differences between: optical loss determined on a basis of HMC-captured ROM performance for the current frame and at least one preceding frame; and displacement of the vertices of the reconstructed 3D mesh topology between the current frame and the at least one preceding frame. Rather Mallet does not use any optical flow term in the blendshape optimization loss function which uses such current and preceding frames and displacement of vertices between such frames. Thus Mallet stands as a base device upon which the claimed invention can be seen as an improvement through such use of an optical flow term that, when used in an optimization loss function, would allow the optimization to take into account temporal and image based deformations leading to more coherent and accurate reconstructions.
In the same field of endeavor relating to capturing a ROM performance of a user and optimizing parameters of a blendshape model and mesh representing the user to match the performance, Grabli teaches that it is known to perform a blendshape optimization to match a captured performance to a blendshape based mesh model (see Grabli, paragraphs 0030-0034 teaching that the system works on “deformable models” that “produces a facial expression mesh M by combining linearly a set of m three-dimensional blendshapes” where blendshape weights determine the deformation of vertices of the mesh and such models are those which the captured ROM performance frames or “plates” are matched to as in paragraphs 0042-50 teaching transferring the captured performance and expressions to the mesh using a blendshape optimization loss function where “Method 500 can match facial expressions of an actor captured during a performance (e.g., step 320) to facial expressions of a computer-generated model of the actor. Method 500 can be performed on each and every plate in a sequence of video so that the facial expressions of a computer-generated model of the actor matches the facial expressions of the actor throughout the entire video sequence” and “method 500 can start with various inputs including a plate from the performance capture session (block 502) and an initial facial mesh (block 504) representing a neutral geometry of a deformable model generated, for example, as described above with respect to FIG. 3, step 310. The initial facial mesh (i.e., initial deformable model) can include the rigid adjustment (rotation and translation), the blend shape weights and the per-vertex deltas for the deformable model that define the neutral geometry. 
A differentiable renderer (block 506) can render the initial facial mesh and then method 500 can solve the deformation from the plate (block 510) by trying to minimize the differences between the initial deformable model (i.e., neutral expression) and the actor's actual facial expression in the plate using a recipe (i.e., a sequence of deformation solvers as discussed below) based on various inputs as described below over a series of n iterations. Thus, the solver in block 510 calculates an expression of the deformable model that is closest to the expression of the actor in the plate” such that here this “trying to minimize differences” or the loss between the data, using “solvers” to determine the blendshape weights corresponds to blendshape optimization to match a captured performance until the loss is acceptable for example) and such optimization includes a blendshape optimization loss function that comprises an optical flow term that, for each frame of the HMC-captured ROM performance, attributes loss to differences between: optical loss determined on a basis of HMC-captured ROM performance for the current frame and at least one preceding frame; and displacement of the vertices of the reconstructed 3D mesh topology between the current frame and the at least one preceding frame (see Grabli, paragraphs 0045-0050 teaching that the transfer process above relies on various inputs chosen by the system designer based on the situation including a facial rig and mesh and blendshape model of the actor/performer and markers and features that can be tracked in the image in comparison to the model as in “solving the deformation in block 510, embodiments of the invention can use some or all of the following inputs (block 502) in addition to the footage of the actor whose facial expressions are being captured (i.e., the plate also in block 502)” and “Some of the above inputs can be generated from data processed on a per-shot (i.e., a continuous sequence of frames of digital 
film) basis as opposed to a per-plate basis” and as in paragraphs 0072-0094, “block 510 can try to minimize the differences between the deformable model and the actor's facial expression in the plate using multiple approaches or “solvers” based on the inputs from block 502. Each solver can execute one or more iterations (e.g., n iterations). The types of solvers used in block 510 and the number of iterations that each solver is executed can be chosen prior to implementing method 500. For example, in some embodiments, particular solvers can include between 10-15 iterations but embodiments of the invention are not limited to any particular number of iterations, however, and a user can choose more or fewer iterations as is deemed appropriate for each solver to reach an acceptable level of matching between the deformable model generated as the final facial mesh (block 520) and the original plate” and the “solvers” utilize the various “cost functions” that “can be added to a solver and the solver will optimize its parameters in order to minimize the cost defined by the cost functions” and one of the cost functions that can be added to the solvers is the “deformation solver” that “can be used in block 510 to solve for all or some of the parameters of the deformable model described above, i.e. 
the rigid adjustment (rotation and translation), the blend shape weights and the per-vertex deltas” where such solvers suggested to be used include those with a cost function that comprises an optical flow term that, for each frame of the captured performance, attributes loss to differences between optical loss determined on a basis of HMC-captured ROM performance for the current frame and at least one preceding frame; and displacement of the vertices of the reconstructed 3D mesh topology between the current frame and the at least one preceding frame such as when using “Frame-to-frame dense optical flow: Given the deformable model at a given frame, we create an embedding for each pixel of that frame on the mesh (using the virtual camera to perform inverse projection). Given a dense set of per-pixel optical flow vectors between this frame and another one, the cost function computes the difference between the projection of each embedding and its 2D target as defined by the optical flow vector” and “Render-to-plate dense optical flow: Given the deformable model at a given frame (as well as the material, the light rig, the camera and the plate), this cost function produces a render of that model and computes a dense set of optical flow vectors between this render and the plate. 
Similarly to the previous cost function, this cost function then creates embeddings by inverse-projecting the image pixels onto the model and computes the distance between the projection of these embeddings and their target locations as defined by the optical flow vectors defined above” where for example these cost functions can be combined with any other cost functions to optimize the deformation parameters using any recipe chosen by the designer where for example a recipe could include “Solve for the blendshape weights and rigid head transform which minimize the error defined by the following cost functions: keyline-to-spline difference, marker-to-embedding difference, render-to-plate dense optical flow (also using a weights prior, a rigid prior and a Laplacian prior)” and “Solve for the blendshape weights which minimize the error defined by the following cost functions: render-to-plate difference, keyline-to-spline difference, marker-to-embedding difference (along with a weights prior and a Laplacian prior)” or “For missing details and to improve temporal coherence, embodiments can pick “key frames” out of the current result (i.e., frames that are a good match) and use the frame-to-frame dense flow cost function to drive deformation capturing the missing details” such that here frame to frame optical flow would thus be used in the blendshape optimization using the loss from taking into account the current and preceding frames and the deformations that take place as a result). Thus Grabli teaches known techniques applicable to the base system of Mallet.
Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Mallet to include the known techniques of Grabli as doing so would be no more than application of a known technique to a base system ready for improvement, where the results of the combination would be predictable and would result in an improved system. The result of the combination would predictably be that the optimization solver in Mallet, which already takes into account multiple inputs for optimizing a blendshape, would be further modified similarly to creating a specific recipe of solvers as in Grabli where the optical flow loss terms of Grabli would be used as further optimization inputs to obtain the optimized blendshape weights and parameters. This would result in an improved system as the ability to utilize such solvers would allow the system to “provide the solver with a very dense set of pixels in each iteration” so that “the solver can produce a more detailed solution for the performance compared to solutions calculated by traditional marker-based systems that are limited in the detail they capture by the number of markers being tracked” as suggested by Grabli (see Grabli, paragraph 0044).
Regarding claim 23, Mallet teaches all that is required as applied to claim 1 above but fails to teach that the mesh-deformation refinement loss function comprises an optical flow term that, for each frame of the HMC-captured ROM performance, attributes loss to differences between: optical loss determined on a basis of HMC-captured ROM performance for the current frame and at least one preceding frame; and displacement of the vertices determined on a basis of the 3D locations of the plurality of handle vertices applied to the blendshape-optimized 3D mesh using the mesh-deformation technique for the current frame and the at least one preceding frame. Rather, Mallet does not use any optical flow term in the blendshape optimization loss function which uses such current and preceding frames and displacement of vertices between such frames. Thus Mallet stands as a base device upon which the claimed invention can be seen as an improvement through such use of an optical flow term that, when used in an optimization loss function, would allow the optimization to take into account temporal and image-based deformations, leading to more coherent and accurate reconstructions.
In the same field of endeavor relating to capturing a ROM performance of a user and optimizing parameters of a blendshape model and mesh representing the user to match the performance, Grabli teaches that it is known to perform a blendshape and mesh optimization to match a captured performance to a blendshape based mesh model (see Grabli, paragraphs 0030-0034 teaching that the system works on “deformable models” that “produces a facial expression mesh M by combining linearly a set of m three-dimensional blendshapes” where blendshape weights determine the deformation of vertices of the mesh and such models are those which the captured ROM performance frames or “plates” are matched to as in paragraphs 0042-50 teaching transferring the captured performance and expressions to the mesh using a blendshape optimization loss function where “Method 500 can match facial expressions of an actor captured during a performance (e.g., step 320) to facial expressions of a computer-generated model of the actor. Method 500 can be performed on each and every plate in a sequence of video so that the facial expressions of a computer-generated model of the actor matches the facial expressions of the actor throughout the entire video sequence” and “method 500 can start with various inputs including a plate from the performance capture session (block 502) and an initial facial mesh (block 504) representing a neutral geometry of a deformable model generated, for example, as described above with respect to FIG. 3, step 310. The initial facial mesh (i.e., initial deformable model) can include the rigid adjustment (rotation and translation), the blend shape weights and the per-vertex deltas for the deformable model that define the neutral geometry. 
A differentiable renderer (block 506) can render the initial facial mesh and then method 500 can solve the deformation from the plate (block 510) by trying to minimize the differences between the initial deformable model (i.e., neutral expression) and the actor's actual facial expression in the plate using a recipe (i.e., a sequence of deformation solvers as discussed below) based on various inputs as described below over a series of n iterations. Thus, the solver in block 510 calculates an expression of the deformable model that is closest to the expression of the actor in the plate” such that here this “trying to minimize differences” or the loss between the data, using “solvers” to determine the blendshape weights corresponds to blendshape optimization to match a captured performance until the loss is acceptable for example) and such optimization also performs mesh-deformation refinement to match pixels to the mesh vertices related to the blendshapes where this refinement comprises a loss function that comprises an optical flow term that, for each frame of the HMC-captured ROM performance, attributes loss to differences between: optical loss determined on a basis of HMC-captured ROM performance for the current frame and at least one preceding frame; and displacement of the vertices determined on a basis of the 3D locations of the plurality of handle vertices applied to the blendshape-optimized 3D mesh using the mesh- deformation technique for the current frame and the at least one preceding frame (see Grabli, paragraphs 0045-0050 teaching that the transfer process above relies on various inputs chosen by the system designer based on the situation including a facial rig and mesh and blendshape model of the actor/performer and markers and features that can be tracked in the image in comparison to the model as in “solving the deformation in block 510, embodiments of the invention can use some or all of the following inputs (block 502) in addition to the footage of 
the actor whose facial expressions are being captured (i.e., the plate also in block 502)” and “Some of the above inputs can be generated from data processed on a per-shot (i.e., a continuous sequence of frames of digital film) basis as opposed to a per-plate basis” and as in paragraphs 0072-0094, “block 510 can try to minimize the differences between the deformable model and the actor's facial expression in the plate using multiple approaches or “solvers” based on the inputs from block 502. Each solver can execute one or more iterations (e.g., n iterations). The types of solvers used in block 510 and the number of iterations that each solver is executed can be chosen prior to implementing method 500. For example, in some embodiments, particular solvers can include between 10-15 iterations but embodiments of the invention are not limited to any particular number of iterations, however, and a user can choose more or fewer iterations as is deemed appropriate for each solver to reach an acceptable level of matching between the deformable model generated as the final facial mesh (block 520) and the original plate” and the “solvers” utilize the various “cost functions” that “can be added to a solver and the solver will optimizes its parameters in order to minimize the cost defined by the cost functions” and one of the solvers that can be used with these cost functions is the “deformation solver” that “can be used in block 510 to solve for all or some of the parameters of the deformable model described above, i.e. 
the rigid adjustment (rotation and translation), the blend shape weights and the per-vertex deltas” where such solvers suggested to be used include those with a cost function that comprises an optical flow term that for each frame of the captured performance, attributes loss to differences between: optical loss determined on a basis of HMC-captured ROM performance for the current frame and at least one preceding frame; and displacement of the vertices of the reconstructed 3D mesh topology between the current frame and the at least one preceding frame such as when using “Frame-to-frame dense optical flow: Given the deformable model at a given frame, we create an embedding for each pixel of that frame on the mesh (using the virtual camera to perform inverse projection). Given a dense set of per-pixel optical flow vectors between this frame and another one, the cost function computes the difference between the projection of each embedding and its 2D target as defined by the optical flow vector” and “Render-to-plate dense optical flow: Given the deformable model at a given frame (as well as the material, the light rig, the camera and the plate), this cost function produces a render of that model and computes a dense set of optical flow vectors between this render and the plate. 
Similarly to the previous cost function, this cost function then creates embeddings by inverse-projecting the image pixels onto the model and computes the distance between the projection of these embeddings and their target locations as defined by the optical flow vectors defined above” where for example these cost functions can be combined with any other cost functions to optimize the deformation parameters using any recipe chosen by the designer, where for example a recipe could include “Solve for the blendshape weights and rigid head transform which minimize the error defined by the following cost functions: keyline-to-spline difference, marker-to-embedding difference, render-to-plate dense optical flow (also using a weights prior, a rigid prior and a Laplacian prior)” and “Solve for the blendshape weights which minimize the error defined by the following cost functions: render-to-plate difference, keyline-to-spline difference, marker-to-embedding difference (along with a weights prior and a Laplacian prior)” or “For missing details and to improve temporal coherence, embodiments can pick “key frames” out of the current result (i.e., frames that are a good match) and use the frame-to-frame dense flow cost function to drive deformation capturing the missing details” such that here frame-to-frame optical flow would thus be used in refining the mesh over these iterations to match the performance to the blendshape model and its associated mesh). Thus Grabli teaches known techniques applicable to the base system of Mallet.
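Grabli's quoted "frame-to-frame dense optical flow" cost can be illustrated with a short sketch: each pixel is embedded on the mesh, the embedding is re-projected after deformation, and the projection is compared against the 2D target given by the flow vector. The following Python is hypothetical; the pinhole intrinsics, the use of precomputed barycentric coordinates as embeddings, and all names are assumptions, not details from the reference:

```python
# Illustrative sketch of a frame-to-frame dense optical flow cost:
# pixel embeddings (barycentric coordinates on mesh triangles) are
# re-projected and compared to flow-defined 2D targets.
import numpy as np

def project(points, f=1000.0, cx=320.0, cy=240.0):
    # simple pinhole projection; intrinsics here are illustrative
    return np.stack([f * points[:, 0] / points[:, 2] + cx,
                     f * points[:, 1] / points[:, 2] + cy], axis=1)

def flow_cost(vertices, tri_idx, bary, targets_2d):
    # 3D position of each pixel embedding on the (deformed) mesh
    tris = vertices[tri_idx]                   # (P, 3, 3): corner positions
    emb = np.einsum("pk,pkj->pj", bary, tris)  # barycentric interpolation
    proj = project(emb)                        # 2D projections of embeddings
    return np.sum((proj - targets_2d) ** 2)    # distance to flow targets
```

In use, `targets_2d` would be computed as the previous frame's projection of each embedding displaced by its per-pixel optical flow vector, so that the cost drives the deformed mesh to follow the observed pixel motion.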
Therefore it would have been obvious for one of ordinary skill in the art before the effective filing date of the invention to modify Mallet to include the known techniques of Grabli as doing so would be no more than application of a known technique to a base system ready for improvement, where the results of the combination would be predictable and would result in an improved system. The result of the combination would predictably be that the optimization solver in Mallet, which already takes into account multiple inputs for optimizing a blendshape, would be further modified similarly to creating a specific recipe of solvers as in Grabli where the optical flow loss terms of Grabli would be used as further optimization inputs to obtain the optimized blendshape weights and parameters, and such a solver could be added to the refinement stage as in Mallet as this would simply be using the cost function in a type of recipe as already suggested by Grabli. This would result in an improved system as the ability to utilize such solvers would allow the system to “provide the solver with a very dense set of pixels in each iteration” so that “the solver can produce a more detailed solution for the performance compared to solutions calculated by traditional marker-based systems that are limited in the detail they capture by the number of markers being tracked” as suggested by Grabli (see Grabli, paragraph 0044).
Allowable Subject Matter
Claims 10-11 and 19-21 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Regarding claim 10, the instant claim requires the mesh deformation technique comprises a linear combination of the Laplacian mesh deformation and the bi-Laplacian mesh deformation, whereas its parent claim 9 required only one of the Laplacian or bi-Laplacian mesh deformation. As explained above, Mallet teaches use of a Laplacian mesh deformation, but does not teach use of a bi-Laplacian mesh deformation nor a linear combination of both types of deformations. Laplacian and bi-Laplacian mesh deformation used in a similar context is found in the prior art such as in Jin et al (US PGPUB No. 2013/0124148, see paragraphs 0099-0128), however in Jin, only the bi-Laplacian form of the Laplacian mesh deformation is utilized, and there is no teaching or suggestion to have both a Laplacian and bi-Laplacian deformation performed and linearly combined and used as in the claims. The Examiner is unable to find any teaching or suggestion in the prior art which teaches such limitations in a manner that would render the claims obvious or anticipated. Thus the claim contains allowable subject matter. Note that claim 11 is considered allowable at least based on its dependence on claim 10 which contains allowable subject matter.
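For illustration only, a linear combination of Laplacian and bi-Laplacian deformation energies of the sort claim 10 recites might be sketched as a sparse least-squares problem with soft handle constraints. This sketch is a hypothetical construction, not a quotation of Mallet, Jin, or the instant application; the uniform (graph) Laplacian, the soft-constraint weighting, and all names are assumptions:

```python
# Hypothetical sketch: deform a mesh by minimizing a linear combination
# (alpha, beta) of Laplacian and bi-Laplacian energies, with handle
# vertices softly pinned to target positions.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def uniform_laplacian(n_verts, edges):
    # uniform graph Laplacian; cotangent weights would be the usual refinement
    L = sp.lil_matrix((n_verts, n_verts))
    for i, j in edges:
        L[i, j] = L[j, i] = -1.0
    L.setdiag(-np.asarray(L.sum(axis=1)).ravel())
    return L.tocsr()

def deform(v0, edges, handle_ids, handle_pos, alpha=0.5, beta=0.5, w_handle=1e3):
    n = v0.shape[0]
    L = uniform_laplacian(n, edges)
    B = L @ L  # bi-Laplacian
    # selection matrix picking out the handle vertices, scaled by w_handle
    C = sp.csr_matrix((np.full(len(handle_ids), w_handle),
                       (range(len(handle_ids)), handle_ids)),
                      shape=(len(handle_ids), n))
    A = sp.vstack([alpha * L, beta * B, C]).tocsr()
    out = np.empty_like(v0)
    for d in range(3):  # solve the normal equations per coordinate
        b = np.concatenate([alpha * (L @ v0[:, d]),
                            beta * (B @ v0[:, d]),
                            w_handle * handle_pos[:, d]])
        out[:, d] = spsolve((A.T @ A).tocsc(), A.T @ b)
    return out
```

Here `alpha` and `beta` realize the linear combination: the solution preserves both the Laplacian and bi-Laplacian differential coordinates of the rest mesh as far as the handle constraints allow.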
Regarding claim 19, the instant claim depends from claim 17 and further requires wherein determining, for each frame of the HMC-captured ROM performance, the vector of blendshape weights and the plurality of transformation parameters which, when applied to the blendshape basis to reconstruct the 3D mesh topology, minimize the blendshape optimization loss function comprises: starting by holding the vector of blendshape weights constant and optimizing the plurality of transformation parameters to minimize the blendshape optimization loss function to determine an interim plurality of transformation parameters; and after determining the interim plurality of transformation parameters, allowing the vector of blendshape weights to vary and optimizing the vector of blendshape weights and the plurality of transformation parameters to minimize the blendshape optimization loss function to determine the optimized vector of blendshape weights and plurality of transformation parameters. Mallet teaches various stages of minimizing the blendshape optimization and refining it, but does not detail determining an interim plurality of transformation parameters by first holding the blendshape weights constant, and then using such interim transformation parameters in a subsequent joint optimization. Thus the claims define over the teachings of Mallet. The Examiner is unable to find any teaching or suggestion in the prior art of such a technique applied in the same context in such a manner that the claims would be rendered obvious or anticipated. Thus the claim contains allowable subject matter.
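The staged solve that claim 19 recites can be illustrated with a short hypothetical sketch: the blendshape weights are first held constant while only the rigid transformation parameters are optimized, producing interim parameters, after which both are allowed to vary in a joint solve. The rigid parameterization (Euler angles plus translation), the loss, and all names below are illustrative assumptions, not taken from any cited reference:

```python
# Hypothetical sketch of a two-stage solve: (1) weights fixed, optimize
# rigid transform; (2) joint optimization of weights and transform.
import numpy as np
from scipy.optimize import minimize

def rot(rx, ry, rz):
    # rotation from Euler angles (illustrative parameterization)
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def reconstruct(B, w, t):
    # blendshape combination followed by rigid transform (3 angles + 3 translations)
    mesh = B[0] + np.tensordot(w, B[1:] - B[0], axes=1)
    return mesh @ rot(*t[:3]).T + t[3:]

def loss(B, w, t, target):
    return np.sum((reconstruct(B, w, t) - target) ** 2)

def staged_solve(B, target):
    m = B.shape[0] - 1
    w0 = np.zeros(m)
    # stage 1: weights held constant, determine interim transformation parameters
    t_interim = minimize(lambda t: loss(B, w0, t, target), np.zeros(6)).x
    # stage 2: allow weights to vary, jointly optimize from the interim transform
    x = minimize(lambda x: loss(B, x[:m], x[m:], target),
                 np.concatenate([w0, t_interim])).x
    return x[:m], x[m:]
```

The staging is the point of interest: solving the rigid alignment first gives the joint solve a well-initialized starting point, which is the structure the claim language describes.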
Regarding claim 20, the instant claim depends from claim 17 and further requires wherein determining, for each frame of the HMC-captured ROM performance, the vector of blendshape weights and the plurality of transformation parameters which, when applied to the blendshape basis to reconstruct the 3D mesh topology, minimize the blendshape optimization loss function comprises: starting by holding the vector of blendshape weights constant and optimizing the plurality of transformation parameters to minimize the blendshape optimization loss function to determine an interim plurality of transformation parameters; and after determining the interim plurality of transformation parameters, allowing the vector of blendshape weights to vary and optimizing the vector of blendshape weights and the plurality of transformation parameters to minimize the blendshape optimization loss function to determine an interim vector of blendshape weights and a further interim plurality of transformation parameters; after determining the interim vector of blendshape weights and further interim plurality of transformation parameters, introducing a 2-dimensional (2D) constraint term to the blendshape optimization loss function to obtain a modified blendshape optimization loss function and optimizing the vector of blendshape weights and the plurality of transformation parameters to minimize the modified blendshape optimization loss function to determine the optimized vector of blendshape weights and plurality of transformation parameters. Similarly to the previous claim, Mallet teaches various stages of minimizing the blendshape optimization and refining it, but does not detail determining an interim plurality of transformation parameters by first holding the blendshape weights constant, and then using such interim transformation parameters in subsequent stages. Thus the claims define over the teachings of Mallet. 
The Examiner is unable to find any teaching or suggestion in the prior art of such a technique applied in the same context in such a manner that the claims would be rendered obvious or anticipated. Thus the claim contains allowable subject matter. Note that claim 21 is dependent from claim 20 and is allowable at least for being dependent upon an allowable claim.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCOTT E SONNERS whose telephone number is (571)270-7504. The examiner can normally be reached Monday-Friday, 9-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SCOTT E SONNERS/Examiner, Art Unit 2613
/XIAO M WU/Supervisory Patent Examiner, Art Unit 2613
1 US Patent No. 9747716
2 US PGPUB No. 20150084950
3 US PGPUB No. 20200286284