Prosecution Insights
Last updated: April 19, 2026
Application No. 18/669,132

METHOD AND APPARATUS FOR THREE-DIMENSIONAL HUMAN-BODY MODEL ESTIMATION AND REFINEMENT

Final Rejection (§102, §103)
Filed: May 20, 2024
Examiner: SUN, HAI TAO
Art Unit: 2616
Tech Center: 2600 (Communications)
Assignee: Qualcomm Incorporated
OA Round: 2 (Final)
Grant Probability: 73% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 2y 7m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 73% (347 granted / 476 resolved), above average, +10.9% vs Tech Center average
Interview Lift: +26.6% (strong), measured across resolved cases with an interview
Typical Timeline: 2y 7m average prosecution, 35 applications currently pending
Career History: 511 total applications across all art units

Statute-Specific Performance

§101: 6.9% (-33.1% vs TC avg)
§102: 2.3% (-37.7% vs TC avg)
§103: 65.8% (+25.8% vs TC avg)
§112: 15.9% (-24.1% vs TC avg)

Tech Center averages are estimates. Based on career data from 476 resolved cases.
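As a gut-check, the headline percentages above can be reproduced from the counts the page itself states. A minimal sketch; the variable names and formulas are our assumptions, only the raw numbers come from the page:

```python
# Reproduce the examiner dashboard's headline metrics from its stated counts.
granted, resolved = 347, 476

allow_rate = granted / resolved
print(f"career allow rate: {allow_rate:.1%}")  # 72.9%, displayed as 73%

# "+10.9% vs TC avg" implies a Tech Center average allow rate of roughly:
implied_tc_avg = allow_rate - 0.109
print(f"implied TC average: {implied_tc_avg:.1%}")  # about 62.0%

# The "+26.6% interview lift" reads as the allow-rate gap between resolved
# cases with and without an examiner interview; the subgroup counts needed
# to verify that figure are not shown on the page.
```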

Office Action

Rejections under §102 and §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Response to Amendment

This office action is responsive to the amendment received 02/18/2026. In the response to the Non-Final Office Action, the applicant states that claims 1, 7-9, and 20 are amended and that claim 6 is cancelled without prejudice or disclaimer. In summary, claims 1, 7-9, and 20 have been amended, claim 6 is cancelled, and claims 1-5 and 7-20 are pending in the current application.

Response to Arguments

Applicant's arguments filed 02/18/2026 have been fully considered but are not persuasive. Regarding claim 1, the applicant argues that Cao does not disclose, at least, "obtain[ing] a three-dimensional (3D) model of a body of a person; obtain[ing] body pixels based on an image of the body of the person; generat[ing] projected body points by projecting points of the 3D model into an image plane; determin[ing] a body-point loss based on a comparison of the body pixels and the projected body points; modify[ing] the 3D model based on the body-point loss to generate a first modified 3D model comprising vertices; obtain[ing] a segment identifier indicative of pixels of the image that relate to the body of the person; project[ing] the vertices of the first modified 3D model into the image plane to generate projected vertices; determin[ing] a segment loss based on a comparison of the segment identifier and the projected vertices; and modify[ing] the first modified 3D model based on the segment loss to generate a second modified 3D model," as recited by amended claim 1. The examiner cannot concur with the applicant, for the following reasons:

Cao discloses "obtain a segment identifier indicative of pixels of the image that relate to the body of the person." For example, in paragraph [0041], Cao teaches collecting color images, e.g., Red-Green-Blue pixels. In paragraph [0055], Cao teaches that facial animation model 700 detects and identifies ninety-six (96) two-dimensional (2D) face landmarks, i.e., identifiers; Cao further teaches that the face landmarks correspond to face features such as mouth corner, nose tip, and/or face contour; Cao furthermore teaches that, for each landmark k, facial animation model 700 finds the corresponding vertex index on the face mesh, denoted lk, and calculates the L2 distance between the 2D face landmark and its corresponding mesh vertex projection [equation image in original]. In paragraph [0067], Cao teaches that coarse mesh tracking 955a penalizes the L2 distance between the 2D landmarks and their corresponding mesh vertex projections. In paragraph [0073], Cao teaches sampling a number of vertices, and further teaches the 3D locations of these vertices.

Cao further discloses "project the vertices of the first modified 3D model into the image plane to generate projected vertices." For example, in paragraph [0030], Cao teaches training a personalized face model to new environments that enables accurate real-time face tracking. In paragraph [0038], Cao teaches modifying coefficients of a 3D model according to a desired outcome of the machine learning model.
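The L2 landmark distance cited from Cao's paragraph [0055] survives in the file-wrapper text only as an embedded image (the "[equation image in original]" placeholders). Based on the surrounding description, where p_k is the k-th detected 2D landmark, l_k is its corresponding mesh vertex index, and Π(·) projects a mesh vertex to screen space, a plausible form is the following; this is our reconstruction of the gist, not the verbatim equation:

```latex
L_{\text{lan}} = \sum_{k} \left\lVert \Pi\!\left(M_{l_k}\right) - p_k \right\rVert_2
```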
In paragraph [0055], Cao teaches that, for each landmark k, facial animation model 700 finds the corresponding vertex index on the face mesh, denoted lk, and calculates the L2 distance between the 2D face landmark and its corresponding mesh vertex projection [equation image in original]. In paragraph [0067], Cao teaches that coarse mesh tracking 955a penalizes the L2 distance between the 2D landmarks and their corresponding mesh vertex projections. In paragraph [0073], Cao teaches constructing a position vector using the 3D locations of these vertices.

Cao furthermore discloses "determine a segment loss based on a comparison of the segment identifier and the projected vertices." For example, in paragraph [0053], Cao teaches [equation image in original]. In paragraphs [0054]-[0057], Cao teaches the losses formulated in Eqs. 5-7, where Mv is the projected face mesh in screen space calculated by Eq. 4; Cao further teaches calculating losses using equations 5-7. In paragraph [0058], Cao teaches determining a landmark loss and an optical flow loss. In paragraph [0060], Cao teaches minimizing the landmark loss. In paragraph [0062], Cao teaches minimizing the image loss, landmark loss, and optical flow loss. In paragraph [0070], Cao teaches a parameter loss that includes the L2 distance between the regressed results and the ground truth parameters. In paragraph [0071], Cao teaches minimizing the image loss. In paragraph [0081], Cao teaches determining a loss factor based on selected points in a test image from the subject and a rendition of the test image by the three-dimensional model.

In addition, Cao discloses "modify the first modified 3D model based on the segment loss to generate a second modified 3D model." For example, in paragraph [0030], Cao teaches training a personalized face model to new environments that enables accurate real-time face tracking; Cao further teaches modifying a personalized face model to an updated model, i.e., a modified model, in training. In paragraph [0032], Cao teaches training and modifying multiple machine learning models running in parallel in one or more servers 130. In paragraph [0038], Cao teaches modifying coefficients of a 3D model according to a desired outcome of the machine learning model; Cao further teaches modifying and updating a personalized face model in training. In paragraph [0066], Cao teaches training increment regression encoder 957 to compensate for different illumination configurations in test videos. In paragraph [0067], Cao teaches that coarse mesh tracking 955a transforms the face mesh 920a (M) using rigid head pose 942a. In paragraph [0082], Cao teaches updating the three-dimensional model according to the loss factor. In paragraph [0090], Cao teaches updating a three-dimensional model according to the loss factor, the three-dimensional model comprising the facial expression factor, the head pose factor, and the illumination parameter.

Claims 1-5, 7-9, 11, and 17-20 are not allowable for reasons similar to those discussed above.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 3, 5, 7, 9, 11, 17, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Cao (US 20220358719 A1).

Regarding claim 1 (Currently Amended), Cao discloses an apparatus for human-body-model shape modification, the apparatus comprising (Fig. 1; [0032]: a single client device 110 trains multiple machine learning models running in parallel in one or more servers 130; [0034]: server 130 and client device 110 are apparatus; Fig. 2; [0035]: processors 212 execute instructions stored in memories 220; [0041]: train a facial animation model to provide a 3D avatar 421, i.e., 3D model, in real-time; [0046]: capture a facial expression 542 and a head pose 644 from a test image 611 to provide a 3D avatar 621 in real-time; [0082]: update the three-dimensional model according to the loss factor):

at least one memory (Fig. 2; [0034]: a memory 220-1 and a processor 212-1; Fig. 2; [0035]: processors 212 execute instructions stored in memories 220); and

at least one processor coupled to the at least one memory and configured to (Fig. 1; [0032]: one or more processors; Fig. 2; [0035]: processors 212 execute instructions stored in memories 220; the memories are connected to the processors as illustrated in Fig. 2 [figure image in original]):

obtain a three-dimensional (3D) model of a body of a person (Fig. 4; [0041]: capture a facial expression of a user 401 to train a facial animation model to provide a 3D avatar 421 in real-time [figure image in original]; Fig. 12; [0078]: the head and shoulder are top parts of a body of a person; form and obtain a three-dimensional mesh, i.e., 3D model, for the subject based on a facial expression factor and a head pose of the subject extracted from the images of the subject; Fig. 12; [0080]: form and obtain a three-dimensional model for the subject, i.e., the top part of the body of a person, based on the three-dimensional mesh and the texture transformation);

obtain body pixels based on an image of the body of the person ([0041]: capture a facial expression of a user 401 to train a facial animation model to provide a 3D avatar 421 in real-time; collect color images, e.g., obtain Red-Green-Blue pixels; Fig. 7; [0048]: extract and obtain a face mesh 720 and a relighted texture 733a, i.e., body pixels, from an input image; Fig. 6; [0046]: achieve pixel-precise facial animation in real-time; Fig. 9; [0066]-[0067]: a coarse mesh tracking 955a runs on an input image 911 to obtain face parameters 942a, a mesh 920a, and a texture map 923a; [0076]: create real-time facial animation, i.e., including pixel information, from binocular video; the facial animation model includes a facial expression encoder, a head pose encoder, and a lighting tool; Fig. 12; [0077]: collect multiple images of a subject; obtain a three-dimensional representation of the subject by applying the three-dimensional model to the binocular image from the subject; obtain and provide the images from the subject under multiple illumination configurations to a low-resolution multilayered network and to a high-resolution multilayered network; Fig. 13; [0088]: determine a texture and a color of a face of the subject based on the illumination parameter; color includes Red-Green-Blue pixels);

generate projected body points by projecting points of the 3D model into an image plane (Fig. 4; [0043]: generate and display an image in a display 416 of 3D avatar 421 [figure image in original]; [0045]: generate a view-dependent 3D avatar 521; 3D avatar 521 is a solid model including depth information and relative positioning for the anatomic features of the subject; Fig. 6; [0046]: generate and provide a 3D avatar 621 in real-time; [0053]: project the face mesh to the original image space using intrinsic camera parameters; [0066]: render a 3D avatar 921; Fig. 12; [0081]: a rendition of the test image by the three-dimensional model; Fig. 13; [0089]: generate a three-dimensional representation of the subject based on the facial expression factor, the head pose factor, and the texture and color of the face of the subject);

determine a body-point loss based on a comparison of the body pixels and the projected body points (Fig. 7; [0048]: determine and minimize the image loss between this relighted avatar and the input image; [0053]: [equation image in original]; [0054]-[0057]: the losses formulated in Eqs. 5-7; Mv is the projected face mesh in screen space calculated by Eq. 4; calculate losses using equations 5-7; [0058]: determine landmark loss and optical flow loss; [0060]: minimize the landmark loss; [0070]: a parameter loss includes the L2 distance between the regressed results and the ground truth parameters; [0071]: minimize the image loss; Fig. 12; [0081]: determine a loss factor based on selected points in a test image from the subject and a rendition of the test image by the three-dimensional model; compare a selected point in the two-dimensional image with a corresponding point in the test image);

modify the 3D model based on the body-point loss to generate a first modified 3D model comprising vertices ([0066]: train increment regression encoder 957 to compensate for different illumination configurations in test videos; [0067]: coarse mesh tracking 955a transforms the face mesh 920a (M) using rigid head pose 942a; Fig. 12; [0082]: update, i.e., modify, the three-dimensional model according to the loss factor; [0083]: provide an image of the three-dimensional model for display in a graphic user interface of a client device);

obtain a segment identifier indicative of pixels of the image that relate to the body of the person (Cao; [0055]: face landmarks correspond to face features such as mouth corner, nose tip, and/or face contour; for each landmark k, facial animation model 700 finds the corresponding vertex index on the face mesh, denoted lk, and calculates the L2 distance between the 2D face landmark and its corresponding mesh vertex projection [equation image in original]; [0067]: coarse mesh tracking 955a penalizes the L2 distance between the 2D landmarks and their corresponding mesh vertex projections; [0073]: sample a number of vertices; the 3D locations of these vertices);

project the vertices of the first modified 3D model into the image plane to generate projected vertices (Cao; [0055]: for each landmark k, facial animation model 700 finds the corresponding vertex index on the face mesh, denoted lk, and calculates the L2 distance between the 2D face landmark and its corresponding mesh vertex projection [equation image in original]; [0067]: coarse mesh tracking 955a penalizes the L2 distance between the 2D landmarks and their corresponding mesh vertex projections; [0073]: construct a position vector using the 3D locations of these vertices);

determine a segment loss based on a comparison of the segment identifier and the projected vertices ([0053]: [equation image in original]; [0054]-[0057]: the losses formulated in Eqs. 5-7; Mv is the projected face mesh in screen space calculated by Eq. 4; calculate losses using equations 5-7; [0058]: determine landmark loss and optical flow loss; [0060]: minimize the landmark loss; [0062]: minimize the image loss, landmark loss, and optical flow loss; [0070]: a parameter loss includes the L2 distance between the regressed results and the ground truth parameters; [0071]: minimize the image loss; [0081]: determine a loss factor based on selected points in a test image from the subject and a rendition of the test image by the three-dimensional model); and

modify the first modified 3D model based on the segment loss to generate a second modified 3D model ([0066]: train increment regression encoder 957 to compensate for different illumination configurations in test videos; [0067]: coarse mesh tracking 955a transforms the face mesh 920a (M) using rigid head pose 942a; [0082]: update the three-dimensional model according to the loss factor; [0090]: update a three-dimensional model according to the loss factor, the three-dimensional model comprising the facial expression factor, the head pose factor, and the illumination parameter).

Regarding claim 3 (Original), Cao discloses the apparatus of claim 1, wherein the at least one processor is configured to process the image using a machine-learning model to identify the body pixels based on the image (Cao; Fig. 1; [0032]: train multiple machine learning models; [0036]: access one or more machine learning models stored in a training database; [0038]: the machine learning model includes a neural network (NN), a convolutional neural network (CNN), a GAN, and a DRNN; Fig. 7; [0048]: identify and extract a face mesh 720 and a relighted texture 733a from an input image; minimize the image loss between this relighted avatar and the input image; [0066]: identify and extract the face mesh 920b and texture 923b; [0067]: a linear Principal Components Analysis (PCA) model; coarse mesh tracking 955a transforms the face mesh 920a (M) using rigid head pose 942a; [0078]: a facial expression factor and a head pose of the subject are extracted from the images of the subject).

Regarding claim 5 (Original), Cao discloses the apparatus of claim 1, wherein the points of the 3D model comprise landmarks (Cao; [0054]-[0057]: the losses formulated in Eqs. 5-7; Mv is the projected face mesh in screen space calculated by Eq. 4; calculate losses using equations 5-7; face landmarks; [0060]: the landmark loss; [0067]: coarse mesh tracking 955a penalizes the L2 distance between the 2D landmarks and their corresponding mesh vertex projections), wherein the projected body points comprise projected landmark points, and wherein the body pixels comprise landmark pixels ([0055]: for each landmark k, facial animation model 700 finds the corresponding vertex index on the face mesh, denoted lk, and calculates the L2 distance between the 2D face landmark and its corresponding mesh vertex projection [equation image in original]; [0060]: the landmark loss; [0067]: coarse mesh tracking 955a penalizes the L2 distance between the 2D landmarks and their corresponding mesh vertex projections).

Regarding claim 7 (Currently Amended), Cao discloses the apparatus of claim 1, wherein the projected body points comprise first projected body points, and wherein the body-point loss comprises a first body-point loss (Cao; Fig. 4; [0043]: generate and display an image in a display 416 of 3D avatar 421 [figure image in original]; [0045]: generate a view-dependent 3D avatar 521; 3D avatar 521 is a solid model including depth information and relative positioning for the anatomic features of the subject; Fig. 6; [0046]: generate and provide a 3D avatar 621 in real-time; [0053]: project the face mesh to the original image space using intrinsic camera parameters; [0066]: render a 3D avatar 921; Fig. 12; [0081]: a rendition of the test image by the three-dimensional model), wherein the at least one processor is configured to: project points of the first modified 3D model into the image plane to generate second projected body points (Cao; [0054]-[0057]: Mv is the projected face mesh in screen space calculated by Eq. 4; calculate losses using equations 5-7; Fig. 12; [0082]: update the three-dimensional model according to the loss factor; [0083]: provide an image of the three-dimensional model for display in a graphic user interface of a client device); and determine a second body-point loss based on a comparison between the body pixels and the second projected body points (Fig. 7; [0048]: determine and minimize the image loss between this relighted avatar and the input image; [0053]: [equation image in original]; [0054]-[0057]: the losses formulated in Eqs. 5-7; Mv is the projected face mesh in screen space calculated by Eq. 4; calculate losses using equations 5-7; [0058]: determine landmark loss and optical flow loss; [0060]: minimize the landmark loss; [0062]: minimize the image loss, landmark loss, and optical flow loss; [0071]: minimize the image loss; Fig. 12; [0081]: determine a loss factor based on selected points in a test image from the subject and a rendition of the test image by the three-dimensional model; compare a selected point in the two-dimensional image with a corresponding point in the test image), wherein the first modified 3D model is modified based on the segment loss and the second body-point loss ([0066]: train increment regression encoder 957 to compensate for different illumination configurations in test videos; [0067]: coarse mesh tracking 955a transforms the face mesh 920a (M) using rigid head pose 942a; [0082]: update the three-dimensional model according to the loss factor; [0090]: update a three-dimensional model according to the loss factor, the three-dimensional model comprising the facial expression factor, the head pose factor, and the illumination parameter).

Regarding claim 9 (Currently Amended), Cao discloses the apparatus of claim 1, wherein the at least one processor is configured to: obtain 3D data based on the image (Cao; Fig. 12; [0077]: collect a binocular image from the subject; obtain a three-dimensional representation of the subject by applying the three-dimensional model to the binocular image from the subject); render the second modified 3D model to generate rendered 3D data (Cao; Fig. 12; [0078]: the head and shoulder are top parts of a body of a person; form and obtain a three-dimensional mesh for the subject based on a facial expression factor and a head pose of the subject extracted from the images of the subject; Fig. 12; [0080]: form and obtain a three-dimensional model for the subject, i.e., the top part of the body of a person, based on the three-dimensional mesh and the texture transformation); determine a 3D loss based on a comparison between the 3D data and the rendered 3D data (Cao; [0053]: [equation image in original]; [0054]-[0057]: the losses formulated in Eqs. 5-7; Mv is the projected face mesh in screen space calculated by Eq. 4; calculate losses using equations 5-7; [0058]: determine landmark loss and optical flow loss; [0060]: minimize the landmark loss; [0062]: minimize the image loss, landmark loss, and optical flow loss; [0071]: minimize the image loss; [0081]: determine a loss factor based on selected points in a test image from the subject and a rendition of the test image by the three-dimensional model); and modify the second modified 3D model based on the 3D loss to generate a third modified 3D model (Cao; [0066]: train increment regression encoder 957 to compensate for different illumination configurations in test videos; [0067]: coarse mesh tracking 955a transforms the face mesh 920a (M) using rigid head pose 942a; Fig. 12; [0082]: update, i.e., modify, the three-dimensional model according to the loss factor; [0083]: provide an image of the three-dimensional model for display in a graphic user interface of a client device; [0090]: determine a loss factor based on selected points in the binocular image from the subject and the three-dimensional representation for the subject, and update a three-dimensional model according to the loss factor, the three-dimensional model comprising the facial expression factor, the head pose factor, and the illumination parameter).

Regarding claim 11 (Original), Cao discloses the apparatus of claim 9, wherein the at least one processor is configured to process the image using a machine-learning model to generate the 3D data related to the image (Cao; Fig. 1; [0032]: train multiple machine learning models; [0036]: access one or more machine learning models stored in a training database; [0038]: the machine learning model includes a neural network (NN), a convolutional neural network (CNN), a GAN, and a DRNN; Fig. 7; [0048]: identify and extract a face mesh 720 and a relighted texture 733a from an input image; minimize the image loss between this relighted avatar and the input image; [0066]: identify and extract the face mesh 920b and texture 923b; [0078]: a facial expression factor and a head pose of the subject are extracted from the images of the subject).

Regarding claim 17 (Original), Cao discloses the apparatus of claim 1, wherein the at least one processor is configured to: obtain a plurality of images of the body of the person (Cao; Fig. 12; [0077]: collect multiple images of a subject, the images from the subject; one or more simultaneous views from different profiles of the subject); select the image from among the plurality of images (Cao; [0078]: form a three-dimensional mesh for the subject based on a facial expression factor and a head pose of the subject extracted from the selected images of the subject); and modify the 3D model based on the plurality of images (Cao; [0036]: create, store, update, and maintain a facial animation model 240; [0066]: train increment regression encoder 957 to compensate for different illumination configurations in test videos; [0067]: coarse mesh tracking 955a transforms the face mesh 920a (M) using rigid head pose 942a; [0076]: create and update a facial animation model; Fig. 12; [0082]: update, i.e., modify, the three-dimensional model according to the loss factor; [0083]: provide an image of the three-dimensional model for display in a graphic user interface of a client device).

Regarding claim 20 (Currently Amended), Cao discloses a method for human-body-model shape modification (Fig. 1; [0032]: a single client device 110 trains multiple machine learning models running in parallel in one or more servers 130; [0034]: server 130 and client device 110; Fig. 2; [0035]: processors 212 are configured to execute instructions stored in memories 220; [0041]: train a facial animation model to provide a 3D avatar 421 in real-time; [0046]: capture a facial expression 542 and a head pose 644 from a test image 611 to provide a 3D avatar 621 in real-time; [0082]: update the three-dimensional model according to the loss factor). The remaining claim limitations are similar to those recited in claim 1; therefore, the same rationale used to reject claim 1 also applies to claim 20.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 4, 8, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Cao (US 20220358719 A1) in view of Sun (US 20210232924 A1).

Regarding claim 2 (Original), Cao discloses the apparatus of claim 1, wherein the at least one processor is configured to process the image using a machine-learning model to generate the 3D model of the body (Cao; [0036]: model training engine 232 accesses one or more machine learning models stored in a training database 252; Fig. 8; [0059]: combine a low-resolution architecture 855a and a high-resolution architecture 855b for a lighting tool; [0066]: regress facial parameters; Fig. 10; [0072]: a network architecture 1000 for a facial animation model configured to account for different environmental conditions of an image in a few shots from the subject). Cao fails to explicitly disclose wherein the 3D model comprises a Skinned Multi-Person Linear (SMPL) model. In the same field of endeavor, Sun teaches wherein the 3D model comprises a Skinned Multi-Person Linear (SMPL) model ([0004]: a skinned multi-person linear (SMPL) model is used to perform three-dimensional human body reconstruction for a human body; [0056]: the SMPL model is used for three-dimensional human body reconstruction; [0058]: train an SMPL parameter prediction model provided in the embodiments of this application). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cao to include wherein the 3D model comprises a Skinned Multi-Person Linear (SMPL) model, as taught by Sun. The motivation for doing so would have been to use the SMPL model for three-dimensional human body reconstruction and to improve the accuracy of a reconstructed three-dimensional human body in the aspects of the human body pose and shape, as taught by Sun in paragraphs [0056] and [0058].

Regarding claim 4 (Original), Cao discloses the apparatus of claim 1, wherein the points of the 3D model (same as rejected in claim 1).
Cao fails to explicitly disclose the points comprising joints, wherein the projected body points comprise projected joint points, and wherein the body pixels comprise joint pixels. In the same field of endeavor, Sun teaches points comprising joints ([0048]: joints; [0050]: calculate positions of joints of the human body; [0057]: two-dimensional joints, three-dimensional joints), wherein the projected body points comprise projected joint points, and wherein the body pixels comprise joint pixels ([0048]: joints; [0050]: calculate positions of joints of the human body; [0057]: two-dimensional joints, three-dimensional joints; Fig. 2; [0083]: input a sample picture 21 into a pose parameter prediction model 22 [figure image in original]; multiple joints as illustrated in Fig. 2). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cao to include the points comprising joints, wherein the projected body points comprise projected joint points, and wherein the body pixels comprise joint pixels, as taught by Sun. The motivation for doing so would have been to use the SMPL model for three-dimensional human body reconstruction and to improve the accuracy of a reconstructed three-dimensional human body in the aspects of the human body pose and shape, as taught by Sun in paragraphs [0056] and [0058].

Regarding claim 8 (Currently Amended), Cao discloses the apparatus of claim 1, wherein the at least one processor is configured to process the image using a machine-learning model to generate the segment identifier (Cao; Fig. 1; [0032]: train multiple machine learning models; [0036]: access one or more machine learning models stored in a training database; [0038]: the machine learning model includes a neural network (NN), a convolutional neural network (CNN), a GAN, and a DRNN; Fig. 7; [0048]: identify and extract a face mesh 720 and a relighted texture 733a from an input image; minimize the image loss between this relighted avatar and the input image; [0066]: identify and extract the face mesh 920b and texture 923b; Fig. 10; [0072]: a network architecture 1000 for a facial animation model configured to account for different environmental conditions of an image in a few shots from the subject; [0078]: a facial expression factor and a head pose of the subject are extracted from the images of the subject). Cao fails to explicitly disclose wherein the segment identifier comprises a silhouette. In the same field of endeavor, Sun teaches wherein the segment identifier comprises a silhouette (Fig. 2; [0111]: a three-dimensional joint map 25 according to the three-dimensional human body model 24; Fig. 2; [0128]: the three-dimensional joint map 25 and the two-dimensional joint map include a silhouette as illustrated in Fig. 2 [figure image in original]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cao to include wherein the segment identifier comprises a silhouette, as taught by Sun. The motivation for doing so would have been to use the SMPL model for three-dimensional human body reconstruction and to improve the accuracy of a reconstructed three-dimensional human body in the aspects of the human body pose and shape, as taught by Sun in paragraphs [0056] and [0058].
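A note on the "silhouette" of claim 8: in practice a silhouette segment identifier is simply a binary person mask, and projected vertices can be scored against it directly. The sketch below is our illustration of that idea, not Sun's method; the function name and the toy mask are hypothetical:

```python
# Score projected vertices against a binary silhouette mask (our illustration).
import numpy as np

def silhouette_coverage(mask, pts2d):
    """Fraction of projected vertices landing inside the person silhouette.
    mask: HxW boolean array; pts2d: (N, 2) array of (x, y) pixel coords."""
    h, w = mask.shape
    xy = np.round(pts2d).astype(int)
    in_bounds = (xy[:, 0] >= 0) & (xy[:, 0] < w) & (xy[:, 1] >= 0) & (xy[:, 1] < h)
    inside = np.zeros(len(xy), dtype=bool)
    inside[in_bounds] = mask[xy[in_bounds, 1], xy[in_bounds, 0]]  # mask indexed [row, col]
    return inside.mean()

# Example: a 4x4 mask whose upper-left quadrant is "person", two test points.
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
print(silhouette_coverage(mask, np.array([[0.0, 0.0], [3.0, 3.0]])))  # 0.5
```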
Regarding claim 18 (Original), Cao discloses the apparatus of claim 17, wherein, to modify the 3D model based on the plurality of images, the at least one processor is configured to modify multiple instances of the 3D model (Cao; [0082]: update the three-dimensional model according to the loss factor; Fig. 10; [0072]: a network architecture 1000 for a facial animation model configured to account for different environmental conditions of an image in a few shots from the subject; [0090]: update a three-dimensional model according to the loss factor, the three-dimensional model comprising the facial expression factor, the head pose factor, and the illumination parameter), wherein the multiple instances of the 3D model have respective body poses and respective translations (Cao; [0052]: the head rotation and the translation, respectively; [0071]: translation; Fig. 10; [0072]: a network architecture 1000 for a facial animation model configured to account for different environmental conditions of an image in a few shots from the subject; [0078]: identify a head pose of the subject, the head pose including a rotation of a head of the subject and a translation of the head of the subject). Cao fails to explicitly disclose wherein the multiple instances of the 3D model share a body shape. In the same field of endeavor, Sun teaches wherein the multiple instances of the 3D model share a body shape (Fig. 11; Fig. 12; [0193]-[0194] [figure image in original]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cao to include wherein the multiple instances of the 3D model share a body shape, as taught by Sun. The motivation for doing so would have been to use the SMPL model for three-dimensional human body reconstruction and to improve the accuracy of a reconstructed three-dimensional human body in the aspects of the human body pose and shape, as taught by Sun in paragraphs [0056] and [0058].

Regarding claim 19 (Original), Cao discloses the apparatus of claim 1. Cao fails to explicitly disclose wherein the 3D model is modified according to a gradient-descent technique. In the same field of endeavor, Sun teaches wherein the 3D model is modified according to a gradient-descent technique ([0079]: using a gradient descent algorithm). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Cao to include wherein the 3D model is modified according to a gradient-descent technique, as taught by Sun. The motivation for doing so would have been to use the SMPL model for three-dimensional human body reconstruction and to improve the accuracy of a reconstructed three-dimensional human body in the aspects of the human body pose and shape, as taught by Sun in paragraphs [0056] and [0058].

Allowable Subject Matter

Claims 10 and 12-16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hai Tao Sun, whose telephone number is (571) 272-5630. The examiner can normally be reached 9:00 AM-6:00 PM. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Hajnik, can be reached at (571) 272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HAI TAO SUN/
Primary Examiner, Art Unit 2616
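For readers skimming the rejection, the claim-1 pipeline at issue is a two-stage, loss-driven model fit: project the model, compare against image evidence, update, then repeat with a second loss. The sketch below is a toy illustration of that claim language only; the linear shape model, orthographic camera, disk-shaped "segment," loss weights, and numerical gradient descent are all our simplifications, not Cao's, Sun's, or the applicant's actual method:

```python
# Toy illustration of the claim-1 pipeline: fit a parametric body model by
# gradient descent, first on a body-point loss, then adding a segment loss.
import numpy as np

rng = np.random.default_rng(0)
N_VERTS, N_SHAPE = 60, 4
BASE = rng.normal(size=(N_VERTS, 3))            # rest-pose vertices (toy model)
BLEND = rng.normal(size=(N_SHAPE, N_VERTS, 3))  # linear shape blendshapes
LANDMARKS = np.arange(0, N_VERTS, 6)            # vertex indices treated as landmarks
CENTER, RADIUS = np.zeros(2), 4.0               # disk standing in for the person segment

def vertices(betas):
    """3D model: rest shape plus a linear combination of blendshapes."""
    return BASE + np.tensordot(betas, BLEND, axes=1)

def project(pts):
    """Orthographic projection into the image plane (drop depth)."""
    return pts[:, :2]

def body_point_loss(betas, body_pixels):
    """Compare projected model landmarks against detected 2D body pixels."""
    proj = project(vertices(betas)[LANDMARKS])
    return np.mean(np.sum((proj - body_pixels) ** 2, axis=1))

def segment_loss(betas):
    """Penalize projected vertices that fall outside the person segment."""
    d = np.linalg.norm(project(vertices(betas)) - CENTER, axis=1)
    return np.mean(np.maximum(d - RADIUS, 0.0) ** 2)

def refine(betas, loss, steps=300, lr=0.05, eps=1e-5):
    """Plain gradient descent with central-difference gradients."""
    for _ in range(steps):
        grad = np.zeros_like(betas)
        for i in range(betas.size):
            e = np.zeros_like(betas)
            e[i] = eps
            grad[i] = (loss(betas + e) - loss(betas - e)) / (2 * eps)
        betas = betas - lr * grad
    return betas

# Synthetic "image evidence": noisy 2D keypoints from a hidden ground-truth shape.
true_betas = rng.normal(size=N_SHAPE)
body_pixels = project(vertices(true_betas)[LANDMARKS]) \
    + 0.01 * rng.normal(size=(LANDMARKS.size, 2))

betas = np.zeros(N_SHAPE)
betas = refine(betas, lambda b: body_point_loss(b, body_pixels))  # first modified model
betas = refine(betas, lambda b: body_point_loss(b, body_pixels)   # second modified model,
                                + 0.1 * segment_loss(b))          # now with segment loss
print("final body-point loss:", body_point_loss(betas, body_pixels))
```

In the systems actually at issue, the parametric model would be something like SMPL (per Sun), the segment identifier a predicted person silhouette rather than a disk, and gradient descent over model parameters (Sun [0079]) plays the role of the two "modify" steps.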

Prosecution Timeline

May 20, 2024: Application Filed
Nov 24, 2025: Non-Final Rejection (§102, §103)
Feb 04, 2026: Interview Requested
Feb 11, 2026: Examiner Interview Summary
Feb 18, 2026: Response Filed
Mar 16, 2026: Final Rejection (§102, §103) (current)

Precedent Cases

Applications granted by this same examiner with similar technology:

Patent 12602816: SIMULATED CONFIGURATION EVALUATION APPARATUS AND METHOD (granted Apr 14, 2026; 2y 5m to grant)
Patent 12603024: DISPLAY CONTROL DEVICE (granted Apr 14, 2026; 2y 5m to grant)
Patent 12586310: APPARATUS AND METHOD WITH IMAGE PROCESSING (granted Mar 24, 2026; 2y 5m to grant)
Patent 12578846: GENERATING MASKED REGIONS OF AN IMAGE USING A PREDICTED USER INTENT (granted Mar 17, 2026; 2y 5m to grant)
Patent 12579727: APPARATUS AND METHOD FOR ASYNCHRONOUS RAY TRACING (granted Mar 17, 2026; 2y 5m to grant)

Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 73%
With Interview: 99% (+26.6%)
Median Time to Grant: 2y 7m
PTA Risk: Moderate

Based on 476 resolved cases by this examiner. Grant probability derived from career allow rate.
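The page does not say how the 99% "with interview" figure is computed; one reading consistent with the other numbers is simply the career allow rate plus the stated interview lift. A hypothetical reconstruction:

```python
# Hypothetical reconstruction of the "with interview" projection.
base = 347 / 476            # career allow rate, ~72.9%
lift = 0.266                # stated interview lift
print(f"{base + lift:.0%}")  # 99%, matching the dashboard figure
```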
