Prosecution Insights
Last updated: April 19, 2026
Application No. 18/827,335

BODY TRACKING FROM MONOCULAR VIDEO

Status: Non-Final OA (§103)
Filed: Sep 06, 2024
Examiner: SAJOUS, WESNER
Art Unit: 2612
Tech Center: 2600 — Communications
Assignee: Roblox Corporation
OA Round: 1 (Non-Final)
Grant Probability: 92% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 5m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 92% — above average (1099 granted / 1196 resolved; +29.9% vs TC avg)
Interview Lift: +7.6% (moderate, roughly +8%; resolved cases with an interview vs. without)
Typical Timeline: 2y 5m average prosecution (29 cases currently pending)
Career History: 1225 total applications across all art units
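The headline figures follow from simple arithmetic on the career counts above. A minimal sketch of the presumed derivation, assuming the tool adds the interview lift directly to the base allow rate (the page does not document its method):

```python
# Reconstructing the examiner dashboard figures from the raw counts above.
granted = 1099                                # career grants
resolved = 1196                               # resolved cases (grants + abandonments)
pending = 29                                  # currently pending

allow_rate = granted / resolved               # 0.919 -> displayed as 92%
total_apps = resolved + pending               # 1225, matching "Total Applications"

# Assumed model: interview lift is added to the base rate.
interview_lift = 0.076                        # the +7.6% shown above
with_interview = allow_rate + interview_lift  # 0.995 -> displayed as 99%

print(f"{allow_rate:.1%} base, {with_interview:.1%} with interview, {total_apps} total")
```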

Statute-Specific Performance

§101: 17.0% (-23.0% vs TC avg)
§103: 33.5% (-6.5% vs TC avg)
§102: 19.1% (-20.9% vs TC avg)
§112: 19.6% (-20.4% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 1196 resolved cases
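A consistency check on the chart data: subtracting each "vs TC avg" delta from the examiner's rate recovers the same Tech Center average estimate, 40.0%, for all four statutes.

```python
# (statute, examiner rate %, delta vs Tech Center average %)
rows = [("§101", 17.0, -23.0), ("§103", 33.5, -6.5),
        ("§102", 19.1, -20.9), ("§112", 19.6, -20.4)]

for statute, rate, delta in rows:
    print(f"{statute}: implied TC average = {rate - delta:.1f}%")  # 40.0% each time
```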

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. It is responsive to the submission dated 09/06/2024. Claims 1-20 are presented for examination, of which claims 1, 8 and 15 are independent claims.

Information Disclosure Statement

2. The information disclosure statements (IDSs) submitted on 11/07/2024 are in compliance with the provisions of 37 CFR 1.97 and are being considered by the Examiner.

Claim Rejections - 35 USC § 103

3. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

4. Claims 1-3, 5-10, 12-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sarah et al. (US 20240350032) in view of Donnell et al. (US 20240029330).

Considering claim 1, Sarah discloses a computer-implemented method (see para. 11) comprising:

- obtaining a video including a plurality of video frames depicting movement of a human subject (e.g., receiving a video segment of said video recording that captures movement of an infant, wherein the video segment includes a plurality of frames; see para. 11 or 25);

- extracting two-dimensional (2D) images of the human subject from the video frames (e.g., extracting a set of feature vectors corresponding to the plurality of frames; see para. 15, wherein the set of feature vectors corresponds to the 2D images. Para. 117 of Sarah also discloses that the posture classification method first extracts either a 2-dimensional (2D) or 3-dimensional (3D) human skeleton pose prediction);

- providing the 2D images as input to a pre-trained neural network model (e.g., using a transition segmentor model with the set of feature vectors as input for training using vectors representing posture transitions; see para. 15. Para. 99 of Sarah additionally discloses the overall infant action recognition pipeline, encompassing infant-specific preprocessing and the action recognition phase: the infant is initially detected in raw frames and subsequently serves as input for both 2D and 3D pose estimation facilitated by the fine-tuned domain-adapted infant pose (FiDIP) model and the heuristic weakly supervised 3D human pose estimation infant (HW-HuP-Infant) model, respectively, and the resulting pose information may be further processed into heatmaps, serving as input for convolutional neural network (CNN)-based models and recurrent neural network (RNN)-based models to predict infant actions);

- determining a pose of the human subject based on the 2D images, wherein each pose comprises respective 2D positions for a plurality of upper body joints (e.g., skeleton joints corresponding to shoulders, elbows or wrists; see para. 116) of the human subject (e.g., determining, using a pose estimation model, pose estimation data representing a human skeleton pose for each frame of the plurality of frames, wherein the human skeleton pose is based on joint locations and joint angles of the infant, and wherein the pose model is a two-dimensional infant pose dataset; see paras. 14 and 116);

- generating, by the pre-trained neural network model and based on the respective 2D positions, a three-dimensional (3D) pose estimation of respective 3D positions of the plurality of upper body joints of the human subject (e.g., in the pipeline of para. 99, the infant is initially detected in raw frames using You Only Look Once Version 7 (YOLOv7) and subsequently serves as input for both 2D and 3D pose estimation via the FiDIP and HW-HuP-Infant models, with the resulting pose information processed into heatmaps serving as input for CNN-based and RNN-based models to predict infant actions. Paras. 166-167 disclose that each of the frame pose datasets associated with the body joints is pretrained);

- determining confidence scores for the plurality of upper body joints in the 3D pose estimation, the confidence scores representing a prediction accuracy of the respective 3D positions of the plurality of upper body joints (e.g., determining probability values corresponding to each frame of the plurality of frames and representing a confidence score for the posture prediction of the corresponding frame, with a subset of the plurality of frames representing a period of uncertainty; see paras. 17 and 110. Para. 124 of Sarah further discloses that the posture prediction accuracy scores defined using the 3D pose-based posture classification networks may be reported for both the initial model trained on SyRIP and the fine-tuned model trained further on InfAct, as shown in Table 2);

- selecting a plurality of keypoints of the upper body joints of the human subject based on the confidence scores (e.g., determining, using the skeletal pose, a set of skeleton keypoints corresponding to an adult skeleton, and determining, using an action recognition model with the set of skeleton keypoints as input, the infant action label; see para. 21. Para. 128 of Sarah further teaches, for joint locations, that for each frame a residual vector is obtained by applying principal component analysis (PCA) [31] to the sequence of keypoint coordinates for each body joint);

- [animating a 3D avatar for display in a user interface] using at least the selected plurality of keypoints, wherein the animation comprises transforming coordinates of the estimated 3D positions of the upper body joints to coordinates of corresponding joints [of the 3D avatar] (e.g., the per-joint residual vectors of para. 128. Para. 154 of Sarah also teaches that, after preprocessing, the extracted sequence of body keypoints from the input video is fed into various state-of-the-art skeleton-based action recognition models leveraging different aspects of infant-specific pose representations. Additionally, claims 19 and 20 of Sarah disclose generating a dataset of a plurality of infant actions using an action recognition model with the set of skeleton keypoints as input, wherein the action recognition model is a three-dimensional convolutional network with the skeleton keypoints from each frame converted into a heatmap).
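For orientation, the claim 1 limitations mapped above describe a familiar 2D-to-3D lifting pipeline: per-frame 2D joint detection, a pre-trained lifting network producing 3D positions with per-joint confidences, confidence-based keypoint selection, and a coordinate transform onto the avatar's joints. A minimal NumPy sketch of that flow; the model stubs, joint names, and threshold are illustrative placeholders, not code from Sarah, Donnell, or the application:

```python
import numpy as np

UPPER_BODY_JOINTS = ["l_shoulder", "r_shoulder", "l_elbow",
                     "r_elbow", "l_wrist", "r_wrist"]  # hypothetical joint set

def estimate_2d_pose(frame: np.ndarray) -> np.ndarray:
    """Stub for a 2D pose estimator: returns (J, 2) pixel coordinates."""
    raise NotImplementedError("plug in any off-the-shelf keypoint detector")

def lift_to_3d(pose_2d: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Stub for the pre-trained lifting network: returns (J, 3) positions
    and a per-joint confidence score in [0, 1]."""
    raise NotImplementedError("plug in a trained 2D-to-3D lifting model")

def track_frame(frame: np.ndarray, conf_threshold: float = 0.5):
    """One pass of the claimed flow: 2D pose -> 3D pose + confidences ->
    keypoints selected where confidence clears the threshold."""
    pose_2d = estimate_2d_pose(frame)
    pose_3d, conf = lift_to_3d(pose_2d)
    keep = conf >= conf_threshold
    selected = [j for j, k in zip(UPPER_BODY_JOINTS, keep) if k]
    return pose_3d[keep], selected

def retarget_to_avatar(pose_3d: np.ndarray, scale: float,
                       offset: np.ndarray) -> np.ndarray:
    """Transform estimated joint coordinates into the avatar's joint space.
    A bare similarity transform stands in for real per-bone retargeting."""
    return pose_3d * scale + offset
```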
While Sarah appears to disclose using selected joint keypoints for each frame of a video, and using 3D pose-based posture model features to transform coordinates of estimated positions of 3D body poses associated with joints or skeletal information so as to represent and generate video-based actions according to a 3D-based transition segmentation of skeleton sequences (see paras. 8-9, 128-133, 141-145, 151-154, 157-161), Sarah fails to particularly teach animating a 3D avatar for display in a user interface, which is disclosed by Donnell (see paras. 3-4). Particularly, Donnell discloses generating and displaying virtual avatar models from one or more images to a user through a display device, wherein said generating may include modifying at least a portion of a virtual avatar model as a function of user input (see paras. 3-4, 14 and 20). An operational processing model of the virtual avatar may include a digital representation of three-dimensional characters and may be programmed to perform behavioral parameters corresponding to animations of a virtual entity in augmented reality (see paras. 22-23 and 25).

Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Sarah to include animating a 3D avatar for display in a user interface, in the same conventional manner as taught by Donnell, in order to provide an improved computer vision system with scalability functions capable of transforming user input of behavioral parameters of a virtual entity, increasing engagement with the virtual entity through the creation of transitional images that can be used to generate an animated sequence of images from one simulated emotional state and/or response. See paras. 24-25 of Donnell.

As per claim 2, Sarah, as modified by Donnell, discloses that the animated 3D avatar mimics movements of the human subject without user-perceptible lag based on the 3D pose estimation. See paras. 25-26 and 43 of Donnell and the rationale above with respect to the rejection of claim 1.

As per claim 3, Sarah, as modified by Donnell, discloses applying temporal smoothing to the 3D pose estimations across consecutive video frames of the plurality of video frames. See paras. 132-133 of Sarah.
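Claim 3's temporal smoothing across consecutive frames is commonly implemented as an exponential moving average over the per-frame 3D estimates; a minimal sketch along those lines (the smoothing factor is an illustrative choice, not a value from the application or the cited art):

```python
import numpy as np

def smooth_pose_sequence(poses_3d: list[np.ndarray],
                         alpha: float = 0.7) -> list[np.ndarray]:
    """Exponential moving average over a sequence of (J, 3) pose arrays.
    Higher alpha tracks the newest frame more closely; lower alpha smooths harder."""
    smoothed: list[np.ndarray] = []
    state = None
    for pose in poses_3d:
        state = pose if state is None else alpha * pose + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed
```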
As per claim 5, considering that in Sarah a fourth subset of frames, corresponding to a previously detected first, second or third subset of a plurality of frames representing a period of uncertainty, is similarly determined when the probability values of frames corresponding to the fourth subset fail to exceed a threshold value (see paras. 35-36 and 61-63), triggering a re-detection of frames of skeleton data based on confidence scores would be a normal design action that the skilled person would take without undue experimentation. As such, the Sarah reference obviously encompasses triggering a re-detection of the upper body joints of the human subject in the video if the confidence scores fall below a predefined threshold.

As per claim 6, Sarah, as modified by Donnell, discloses that joint positions of the 3D avatar are scaled to match body proportions of the human subject. See paras. 153-154 of Sarah, paras. 25-26 and 43 of Donnell, and the rationale above with respect to the rejection of claim 1.

As per claim 7, Sarah, as modified by Donnell, discloses that the pre-trained neural network model uses an attention mechanism to focus on keypoints of the upper body joints of the human subject during 3D pose estimation. See paras. 8-9, 21-22 and 153-157 of Sarah.

Claim 8 recites features that correspond in scope to the limitations recited in claim 1. As the limitations of claim 1 were found obvious over the combined teachings of Sarah and Donnell, it is readily apparent that the applied prior art performs the underlying elements; the limitations of claim 8 are therefore subject to rejection under the same rationale as claim 1. In addition, Sarah discloses a system comprising a memory storing instructions to be executed by one or more processors. See claim 10 of Sarah.

Claim 9 is rejected under the same rationale as claim 2. Claim 10 is rejected under the same rationale as claim 3. Claim 12 is rejected under the same rationale as claim 5. Claim 13 is rejected under the same rationale as claim 6. Claim 14 is rejected under the same rationale as claim 7.

The subject matter of independent claim 15 corresponds, in terms of a non-transitory computer-readable medium, to that of independent method claim 1, and the rationale raised above to reject the latter also applies, mutatis mutandis, to the former. In addition, Sarah discloses a system comprising a computer-readable medium storing instructions to be executed by one or more processors. See claim 10 of Sarah.

Claim 16 is rejected under the same rationale as claim 2. Claim 17 is rejected under the same rationale as claim 3. Claim 19 is rejected under the same rationale as claim 5. Claim 20 is rejected under the same rationale as claim 6.

Allowable Subject Matter

5. Claims 4, 11 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, because the prior art of record fails to teach the method of claim 1 wherein, prior to providing the 2D image as input to the pre-trained neural network model, the 2D image is calibrated to account for camera distortions.

Conclusion

6. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. WEISS et al. (US 20250005965) discloses a technique for extraction of human poses from video data for animation of computer models. In some implementations, a computer-implemented method includes determining a first pose sequence of a human body model based on frames of an input video, the frames depicting movement of a person. The first pose sequence includes poses of the human body model that correspond to the video frames. The first pose sequence is updated to produce a second pose sequence, including, over multiple iterations: determining joint velocities of joints of the pose sequence; determining, in parallel, predicted poses from each of multiple poses of the pose sequence by encoding the joint velocities into parameters of a human motion prior; and, based on comparing corresponding predicted poses, adjusting joint angles of the pose sequence. The second pose sequence can provide an animation of a computer model corresponding to the movement of the person in the input video.
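The allowable subject matter above (claims 4, 11 and 18) turns on calibrating the 2D image for camera distortions before it reaches the pre-trained network. With known intrinsics, OpenCV makes the undistortion step a one-liner; the camera matrix and distortion coefficients below are illustrative values, not parameters from the application:

```python
import cv2
import numpy as np

# Illustrative intrinsics; in practice these come from a prior calibration
# step such as cv2.calibrateCamera() run on checkerboard captures.
camera_matrix = np.array([[800.0,   0.0, 320.0],
                          [  0.0, 800.0, 240.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.array([-0.20, 0.05, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

def calibrate_frame(frame: np.ndarray) -> np.ndarray:
    """Remove lens distortion so joint pixels map consistently to 3D."""
    return cv2.undistort(frame, camera_matrix, dist_coeffs)
```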
7. Any inquiry concerning this communication or earlier communications from the examiner should be directed to WESNER SAJOUS, whose telephone number is (571) 272-7791. The examiner can normally be reached M-F, 10:00 to 7:30 (ET). Examiner interviews are available via telephone and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice or to email the Examiner directly at wesner.sajous@uspto.gov. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Said Broome, can be reached at 571-272-2931. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR; status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/WESNER SAJOUS/
Primary Examiner, Art Unit 2612
WS 03/18/2026

Prosecution Timeline

Sep 06, 2024
Application Filed
Mar 18, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597177
Changing Display Rendering Modes based on Multiple Regions
2y 5m to grant • Granted Apr 07, 2026
Patent 12597185
METHOD, APPARATUS, AND DEVICE FOR PROCESSING IMAGE, AND STORAGE MEDIUM
2y 5m to grant • Granted Apr 07, 2026
Patent 12597203
SIMULATED CONSISTENCY CHECK FOR POINTS OF INTEREST ON THREE-DIMENSIONAL MAPS
2y 5m to grant • Granted Apr 07, 2026
Patent 12589303
Computer-Implemented Methods for Generating Level of Detail Assets for Dynamic Rendering During a Videogame Session
2y 5m to grant • Granted Mar 31, 2026
Patent 12592038
EDITABLE SEMANTIC MAP WITH VIRTUAL CAMERA FOR MOBILE ROBOT LEARNING
2y 5m to grant • Granted Mar 31, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 92%
With Interview: 99% (+7.6%)
Median Time to Grant: 2y 5m
PTA Risk: Low
Based on 1196 resolved cases by this examiner. Grant probability derived from career allow rate.
