DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/09/2025 has been entered. Claims 1, 11, and 20 were amended. Claim 21 was added. Claims 1-17 and 19-21 are pending in the application.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 6-12, 16-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng et al. (US 2023/0196617) in view of Vesdapunt et al. (US 2021/0358212), Baker et al. (US 2019/0295214), and Moser et al. (US 2024/0249460).
Regarding claim 1, Zheng teaches/suggests: A computer-implemented method for performing shape and appearance reconstruction, the computer-implemented method comprising:
a set of parameters that represent a reconstruction of the object in a first target image (Zheng [0023]-[0024] “The image(s) 202 may include, for example, a color image (e.g., an RGB image) and a depth image of the person ... Once the plurality of pose parameters θ and the plurality of shape parameters β have been determined, 3D human model 216 may be generated”),
generating a first set of corrections associated with at least a portion of the set of parameters based on input that includes the first target image (Zheng [0029] “Pose parameters (e.g., θ) 306 and/or shape parameters (e.g., β) 312 may then be adjusted”); and
generating an updated reconstruction of the object based on the first set of corrections (Zheng [0029] “an adjusted (e.g., more optimized) 3D human model 314 may be generated using the adjusted pose parameters 306 and/or the adjusted shape parameters 312”).
Zheng does not teach/suggest a neural network. Vesdapunt, however, teaches/suggests a neural network (Vesdapunt [0066] “The network 515 may output the corrective shape with the residual Δ.sub.S 520 in parallel with 3DMM parameters”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify Zheng such that its parameters are adjusted with the network of Vesdapunt in order to apply machine learning to the adjustment.
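For illustration only, and not as a characterization of any cited reference's actual implementation, the combination articulated above may be pictured as a network that regresses corrective residuals for the pose and shape parameters; every name, layer, and dimension in the following sketch is hypothetical:

```python
import torch
import torch.nn as nn

class ParameterCorrector(nn.Module):
    """Illustrative network that regresses corrective residuals for pose
    parameters (theta) and shape parameters (beta) from an input image."""

    def __init__(self, num_pose: int = 72, num_shape: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_pose + num_shape)
        self.num_pose = num_pose

    def forward(self, image, theta, beta):
        residual = self.head(self.backbone(image))
        d_theta = residual[:, :self.num_pose]
        d_beta = residual[:, self.num_pose:]
        # The corrections are applied as residuals to the current estimates;
        # an updated 3D model would then be regenerated from the results.
        return theta + d_theta, beta + d_beta
```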
Zheng as modified by Vesdapunt does not teach/suggest:
generating a first set of renderings associated with an object based on a set of parameters,
wherein the first set of renderings includes an albedo rendering;
generating, with a neural network, a first set of corrections associated with at least a portion of the set of parameters based on input that includes the first target image and the first set of renderings;
Baker, however, teaches/suggests:
generating a first set of renderings associated with an object based on a set of parameters (Baker [0029] “Multiple shading parameters may be selected for an object and corresponding shading parameter images may be precalculated for each parameter” [0007] “shading parameters are rendered into a deep image that is a series of images that may contain more data than just a color”),
wherein the first set of renderings includes an albedo rendering (Baker [0007] “Shading parameters might include factors such as a normal direction, an albedo color, or a specular color and power”);
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the images of Zheng as modified by Vesdapunt to include the shading parameter images of Baker in order to provide more data. As such, Zheng as modified by Vesdapunt and Baker teaches/suggests:
generating, with a neural network, a first set of corrections associated with at least a portion of the set of parameters based on input that includes the first target image and the first set of renderings (Zheng [0029] “Pose parameters (e.g., θ) 306 and/or shape parameters (e.g., β) 312 may then be adjusted” Vesdapunt [0066] “The network 515 may output the corrective shape with the residual Δ.sub.S 520 in parallel with 3DMM parameters” Baker [0029] “Multiple shading parameters may be selected for an object and corresponding shading parameter images may be precalculated for each parameter”);
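As a purely illustrative sketch of the input arrangement described above (tensor names and shapes are assumptions, not drawn from the references), the target image and the shading-parameter renderings can be stacked channel-wise before being passed to the correction network:

```python
import torch

# Hypothetical tensors: a target image plus per-pixel renderings produced
# from the current parameter estimates.
target = torch.rand(1, 3, 256, 256)   # RGB target image
albedo = torch.rand(1, 3, 256, 256)   # albedo rendering
normals = torch.rand(1, 3, 256, 256)  # surface-normal rendering

# Concatenating along the channel axis yields a single multi-channel input,
# analogous to a "deep image" carrying more data than color alone.
network_input = torch.cat([target, albedo, normals], dim=1)  # (1, 9, 256, 256)
```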
Zheng as modified by Vesdapunt and Baker does not teach/suggest:
wherein the set of parameters is respectively encoded into a set of latent space codes that are respectively input into one or more decoder neural networks,
Moser, in view of Baker, teaches/suggests:
wherein the set of parameters is respectively encoded into a set of latent space codes that are respectively input into one or more decoder neural networks (Baker [0029] “Multiple shading parameters may be selected for an object and corresponding shading parameter images may be precalculated for each parameter” Moser [0049] “Each autoencoder may comprise an encoder which, when trained, is capable of converting images exhibiting facial expressions to corresponding latent codes. Each autoencoder may also comprise a decoder which, when trained, is capable of converting latent codes to corresponding images exhibiting facial expressions”),
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the shading parameter images of Zheng as modified by Vesdapunt and Baker to be encoded as taught/suggested by Moser in order to obtain latent codes.
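The encode/decode arrangement relied upon from Moser may be pictured with the following hypothetical sketch, in which each parameter group has its own autoencoder; all module names and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class ParameterAutoencoder(nn.Module):
    """Illustrative autoencoder: an encoder maps a parameter vector to a
    latent space code, and a decoder maps the code back."""

    def __init__(self, param_dim: int, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(param_dim, 32), nn.ReLU(),
                                     nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, param_dim))

    def forward(self, params):
        code = self.encoder(params)       # latent space code
        return self.decoder(code), code   # decoded parameters and the code

# One autoencoder per parameter group, mirroring "respectively encoded ...
# respectively input into one or more decoder neural networks."
groups = {"pose": 72, "shape": 10}
autoencoders = {name: ParameterAutoencoder(dim) for name, dim in groups.items()}
```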
Regarding claim 2, Zheng as modified by Vesdapunt, Baker, and Moser teaches/suggests: The computer-implemented method of claim 1, further comprising training the neural network based on a training dataset that includes a set of target corrections (Vesdapunt [0067]-[0068] “The loss function 570 represents a difference between the ground truth masked image 565 and the output from the ReDA rasterization pipeline 555 … for training the elements of the pipeline”). The ground truth masked image meets the target corrections. The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.
Regarding claim 6, Zheng as modified by Vesdapunt, Baker, and Moser teaches/suggests: The computer-implemented method of claim 1, wherein the first set of corrections comprises a set of offsets to a set of coordinates included in the first set of renderings (Zheng [0029] “A difference 304 between the inferred body keypoints 302 and the originally extracted body keypoints (e.g., body keypoints 206 of FIG. 2) may be calculated. Similarly, a difference 310 between the inferred normal map 308 and the original normal map used to construct the 3D human model may also be calculated” Baker [0029] “Multiple shading parameters may be selected for an object and corresponding shading parameter images may be precalculated for each parameter”). The differences meet the offsets. The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.
Regarding claim 7, Zheng as modified by Vesdapunt, Baker, and Moser teaches/suggests: The computer-implemented method of claim 6, wherein the set of coordinates comprises at least one of a spatial coordinate or a texture coordinate (Zheng [0029] “A difference 304 between the inferred body keypoints 302 and the originally extracted body keypoints (e.g., body keypoints 206 of FIG. 2) may be calculated. Similarly, a difference 310 between the inferred normal map 308 and the original normal map used to construct the 3D human model may also be calculated”).
Regarding claim 8, Zheng as modified by Vesdapunt, Baker, and Moser teaches/suggests: The computer-implemented method of claim 1, wherein the first set of renderings further comprises at least one of a vertex coordinate rendering, a texture coordinate rendering, or a surface normal rendering (Baker [0007] “Shading parameters might include factors such as a normal direction, an albedo color, or a specular color and power”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.
Regarding claim 9, Zheng as modified by Vesdapunt, Baker, and Moser teaches/suggests: The computer-implemented method of claim 1, further comprising generating the set of parameters based on a loss between the first target image and a rendered image that is generated based on the set of parameters (Vesdapunt [0066]-[0067] “The network 515 may output the corrective shape with the residual Δ.sub.S 520 in parallel with 3DMM parameters … The loss function 570 represents a difference between the ground truth masked image 565 and the output from the ReDA rasterization pipeline 555” Baker [0029] “Multiple shading parameters may be selected for an object and corresponding shading parameter images may be precalculated for each parameter”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.
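A minimal sketch of such loss-driven parameter estimation follows, assuming a differentiable renderer; the render stand-in and all dimensions below are hypothetical:

```python
import torch
import torch.nn.functional as F

def render(params):
    # Stand-in for a differentiable renderer driven by the parameters
    # (purely hypothetical; real renderers are far more involved).
    return params.view(1, 3, 8, 8).sigmoid()

target_image = torch.rand(1, 3, 8, 8)
params = torch.zeros(192, requires_grad=True)
optimizer = torch.optim.Adam([params], lr=0.05)

for _ in range(100):
    optimizer.zero_grad()
    # The loss between the target image and the rendered image drives
    # the estimation of the parameter set.
    loss = F.mse_loss(render(params), target_image)
    loss.backward()
    optimizer.step()
```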
Regarding claim 10, Zheng as modified by Vesdapunt, Baker, and Moser teaches/suggests: The computer-implemented method of claim 1, wherein the set of parameters comprises at least one of an identity parameter, an expression parameter, a geometry parameter, an albedo parameter, a pose parameter, or a lighting parameter (Zheng [0024] “Once the plurality of pose parameters θ and the plurality of shape parameters β have been determined, 3D human model 216 may be generated” Baker [0007] “Shading parameters might include factors such as a normal direction, an albedo color, or a specular color and power”). The same rationale to combine as set forth in the rejection of claim 1 above is incorporated herein.
Claims 11 and 16 recite limitation(s) similar in scope to those of claims 1 and 6, respectively, and are rejected for the same reason(s). Zheng as modified by Vesdapunt, Baker, and Moser further teaches/suggests one or more non-transitory computer readable media storing instructions (Zheng Fig. 7: memory 706).
Regarding claim 12, Zheng as modified by Vesdapunt, Baker, and Moser teaches/suggests: The one or more non-transitory computer readable media of claim 11, wherein the instructions further cause the one or more processors to perform a step of training the neural network based on an L1 loss between the first set of corrections and a corresponding set of target corrections and a gradient loss between a first gradient computed from the first set of corrections and a second gradient computed from the corresponding set of target corrections.
Zheng, Vesdapunt, Baker, and Moser are silent regarding an L1 loss and a gradient loss between a first gradient computed from the first set of corrections and a second gradient computed from the corresponding set of target corrections. However, the concept and advantages of an L1 loss and a gradient loss are well known and expected in the art (Official Notice). It would have been obvious to one of ordinary skill in the art to modify the loss function of Vesdapunt to include such losses to facilitate the machine learning.
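For clarity regarding the concepts of which Official Notice is taken, a minimal sketch of an L1 loss combined with a finite-difference gradient loss follows; the function names are illustrative:

```python
import torch
import torch.nn.functional as F

def image_gradients(x):
    """Finite-difference gradients of a (B, C, H, W) correction map."""
    dx = x[..., :, 1:] - x[..., :, :-1]
    dy = x[..., 1:, :] - x[..., :-1, :]
    return dx, dy

def correction_loss(pred, target, grad_weight=1.0):
    # L1 loss between predicted corrections and target corrections.
    l1 = F.l1_loss(pred, target)
    # Gradient loss between gradients computed from each set of corrections.
    pdx, pdy = image_gradients(pred)
    tdx, tdy = image_gradients(target)
    grad = F.l1_loss(pdx, tdx) + F.l1_loss(pdy, tdy)
    return l1 + grad_weight * grad
```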
Regarding claim 17, Zheng as modified by Vesdapunt, Baker, and Moser teaches/suggests: The one or more non-transitory computer readable media of claim 16, wherein the set of coordinates comprises at least one of a spatial coordinate associated with a geometry for the object or a texture coordinate associated with a texture for the object (Zheng [0029] “A difference 304 between the inferred body keypoints 302 and the originally extracted body keypoints (e.g., body keypoints 206 of FIG. 2) may be calculated. Similarly, a difference 310 between the inferred normal map 308 and the original normal map used to construct the 3D human model may also be calculated”).
Regarding claim 19, Zheng, Vesdapunt, Baker, and Moser are silent regarding: The one or more non-transitory computer readable media of claim 11, wherein the neural network comprises a convolutional encoder that performs downsampling of a first set of feature maps associated with the first target image and the first set of renderings and a convolutional decoder that performs upsampling of a second set of feature maps associated with the first target image and the first set of renderings. However, the concept and advantages of a convolutional encoder and a convolutional decoder are well known and expected in the art (Official Notice). It would have been obvious to one of ordinary skill in the art to modify the neural network of Vesdapunt to include the convolutional encoder and decoder to facilitate the machine learning.
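For clarity regarding the concepts of which Official Notice is taken, a minimal sketch of a convolutional encoder that downsamples feature maps and a convolutional decoder that upsamples them follows; all channel counts are assumptions:

```python
import torch
import torch.nn as nn

class ConvEncoderDecoder(nn.Module):
    """Illustrative encoder-decoder: strided convolutions downsample the
    feature maps; transposed convolutions upsample them back."""

    def __init__(self, in_channels: int = 9, out_channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```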
Claim 20 recites limitation(s) similar in scope to those of claim 1, and is rejected for the same reason(s). Zheng as modified by Vesdapunt, Baker, and Moser further teaches/suggests one or more memories that store instructions, and one or more processors that are coupled to the one or more memories (Zheng Fig. 7: processors 702 and memory 706).
Claims 3-4 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng et al. (US 2023/0196617) in view of Vesdapunt et al. (US 2021/0358212), Baker et al. (US 2019/0295214), and Moser et al. (US 2024/0249460) as applied to claims 1 and 11 above, and further in view of Moriuchi et al. (US 2013/0182143).
Regarding claim 3, Zheng as modified by Vesdapunt, Baker, and Moser does not teach/suggest: The computer-implemented method of claim 1, further comprising:
producing, via the neural network, a second set of corrections associated with the at least a portion of the set of parameters based on a second target image of the object and a second set of renderings associated with the object,
wherein generating the updated reconstruction is further based on an aggregation of the first set of corrections and the second set of corrections.
Moriuchi, however, teaches/suggests a second set of corrections (Moriuchi [0041] “a color of this frame is corrected by using an average of correction values of other frames”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the adjusted parameters (the first set of corrections) of Zheng as modified by Vesdapunt, Baker, and Moser to include a second set of adjusted parameters as taught/suggested by Moriuchi in order to average the corrections. As such, Zheng as modified by Vesdapunt, Baker, Moser, and Moriuchi teaches/suggests:
producing, via the neural network, a second set of corrections associated with the at least a portion of the set of parameters based on a second target image of the object and a second set of renderings associated with the object (Zheng [0029] “Pose parameters (e.g., θ) 306 and/or shape parameters (e.g., β) 312 may then be adjusted” Vesdapunt [0066] “The network 515 may output the corrective shape with the residual Δ.sub.S 520 in parallel with 3DMM parameters” Baker [0029] “Multiple shading parameters may be selected for an object and corresponding shading parameter images may be precalculated for each parameter” Moriuchi [0041] “a color of this frame is corrected by using an average of correction values of other frames”),
wherein generating the updated reconstruction is further based on an aggregation of the first set of corrections and the second set of corrections (Zheng [0029] “an adjusted (e.g., more optimized) 3D human model 314 may be generated using the adjusted pose parameters 306 and/or the adjusted shape parameters 312” Moriuchi [0041] “a color of this frame is corrected by using an average of correction values of other frames”).
Regarding claim 4, Zheng as modified by Vesdapunt, Baker, Moser, and Moriuchi teaches/suggests: The computer-implemented method of claim 3, wherein the aggregation comprises a weighted combination of the first set of corrections and the second set of corrections, and wherein the weighted combination is generated based on a first set of visibilities associated with the first set of corrections and a second set of visibilities associated with the second set of corrections (Moriuchi [0080] “The color correction unit 342 calculates an weighted average (.alpha. blend value) of a pixel value A(i,j) of a pixel (i,j) (of each frame) of the first color correction result and a pixel value B(i,j) of a pixel (i,j) (of each frame) of the second color correction result”). The same rationale to combine as set forth in the rejection of claim 3 above is incorporated herein.
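The weighted combination discussed above may be pictured as a visibility-weighted blend; the following sketch is illustrative only, and its names and shapes are hypothetical:

```python
import torch

def aggregate(corrections_a, corrections_b, visibility_a, visibility_b,
              eps: float = 1e-8):
    """Weighted combination of two correction sets, weighted by per-element
    visibilities (an illustrative alpha-blend-style average)."""
    total = visibility_a + visibility_b + eps
    return (visibility_a * corrections_a + visibility_b * corrections_b) / total
```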
Claims 13 and 14 recite limitation(s) similar in scope to those of claims 3 and 4, respectively, and are rejected for the same reason(s).
Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng et al. (US 2023/0196617) in view of Vesdapunt et al. (US 2021/0358212), Baker et al. (US 2019/0295214), and Moser et al. (US 2024/0249460) as applied to claims 1 and 11 above, and further in view of Yeh et al. (US 2012/0229463).
Regarding claim 5, Zheng as modified by Vesdapunt, Baker, and Moser does not teach/suggest: The computer-implemented method of claim 1, wherein generating the updated reconstruction comprises:
converting the first set of corrections into a second set of corrections in a canonical space associated with the updated reconstruction; and
generating the updated reconstruction in the canonical space based on the second set of corrections.
Yeh, however, teaches/suggests converting in a canonical space (Yeh [0067] “When a 3D image is synthesized, each object is read from a database and converted to a unified world coordinate space 42 ... the world coordinate space 42 is converted to a view coordinate space 43 ... the view coordinate space 43 is converted to the 3D screen coordinate space 44”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the adjusted parameters (the first set of corrections) of Zheng as modified by Vesdapunt, Baker, and Moser to be converted as taught/suggested by Yeh to facilitate image synthesis. As such, Zheng as modified by Vesdapunt, Baker, Moser, and Yeh teaches/suggests:
converting the first set of corrections into a second set of corrections in a canonical space associated with the updated reconstruction (Zheng [0029] “Pose parameters (e.g., θ) 306 and/or shape parameters (e.g., β) 312 may then be adjusted” Yeh [0067] “When a 3D image is synthesized, each object is read from a database and converted to a unified world coordinate space 42 ... the world coordinate space 42 is converted to a view coordinate space 43 ... the view coordinate space 43 is converted to the 3D screen coordinate space 44”); and
generating the updated reconstruction in the canonical space based on the second set of corrections (Zheng [0029] “an adjusted (e.g., more optimized) 3D human model 314 may be generated using the adjusted pose parameters 306 and/or the adjusted shape parameters 312” Yeh [0067] “When a 3D image is synthesized, each object is read from a database and converted to a unified world coordinate space 42 ... the world coordinate space 42 is converted to a view coordinate space 43 ... the view coordinate space 43 is converted to the 3D screen coordinate space 44”).
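A minimal sketch of such a canonical-space conversion, assuming a rigid posed-from-canonical transform (a hypothetical stand-in for the claimed conversion), follows:

```python
import torch

def to_canonical(points, rotation, translation):
    """Convert 3D points (N, 3) from a posed/view space into a canonical
    space by inverting the rigid transform posed = R @ canonical + t."""
    # Row-vector form of R^T @ (p - t) applied to every point.
    return (points - translation) @ rotation
```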
Claim 15 recites limitation(s) similar in scope to those of claim 5, and is rejected for the same reason(s).
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Zheng et al. (US 2023/0196617) in view of Vesdapunt et al. (US 2021/0358212), Baker et al. (US 2019/0295214), and Moser et al. (US 2024/0249460) as applied to claim 1 above, and further in view of Li (US 2006/0017720).
Regarding claim 21, Zheng further discloses in [0029]: “a set of body keypoints 302 of a person and a normal map 308 associated with a body surface of the person may be inferred from the 3D human model (e.g., consistent with the viewpoint of the sensing device or camera used to generate the 3D human model).” Zheng, Vesdapunt, Baker, and Moser are silent regarding: The computer-implemented method of claim 1, wherein the set of parameters includes first parameters that do not change among the first set of renderings and includes second parameters that do change among the first set of renderings. Li, however, teaches/suggests first parameters that do not change among the first set of renderings and second parameters that do change among the first set of renderings (Li [0070] “It is assumed for present purposes that the intrinsic parameters such as the focal lengths, scale factors, distortion coefficients will remain unchanged whereas the extrinsic parameters of the positions and orientations between the camera and projector have to be determined during the run-time of the system”). Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to modify the shading parameter images of Zheng as modified by Vesdapunt, Baker, and Moser to include the camera parameters of Li to facilitate the modeling.
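The fixed-versus-varying split relied upon from Li may be pictured as follows; the intrinsic and extrinsic values are hypothetical:

```python
import torch

# Illustrative split: intrinsics stay fixed across all renderings of the
# set, while per-rendering extrinsics (camera pose) vary.
intrinsics = torch.tensor([[500.0, 0.0, 128.0],
                           [0.0, 500.0, 128.0],
                           [0.0, 0.0, 1.0]])   # shared; does not change

extrinsics = [torch.eye(4) for _ in range(4)]  # one per rendering; changes
for i, pose in enumerate(extrinsics):
    pose[0, 3] = 0.1 * i  # hypothetical per-view camera translation
```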
Response to Arguments
Applicant's arguments filed 12/09/2025 have been fully considered but they are moot in view of the new ground(s) of rejection set forth in this Office action.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 2013/0129190 – refine coarse normal map
US 2018/0068178 – facial expression
US 2020/0013212 – albedo/camera parameters
US 2022/0358719 – expression encoder/decoder
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANH-TUAN V NGUYEN whose telephone number is 571-270-7513. The examiner can normally be reached on M-F 9AM-5PM ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, JASON CHAN can be reached on 571-272-3022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANH-TUAN V NGUYEN/
Primary Examiner, Art Unit 2619