Prosecution Insights
Last updated: April 19, 2026
Application No. 18/827,335

BODY TRACKING FROM MONOCULAR VIDEO

Status: Non-Final OA (§103)
Filed: Sep 06, 2024
Examiner: SAJOUS, WESNER
Art Unit: 2612
Tech Center: 2600 — Communications
Assignee: Roblox Corporation
OA Round: 1 (Non-Final)
Grant Probability: 92% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 5m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 92% — above average (1099 granted / 1196 resolved; +29.9% vs TC avg)
Interview Lift: +7.6% (moderate, roughly +8%; resolved cases with an interview vs. without)
Typical Timeline: 2y 5m average prosecution (29 cases currently pending)
Career History: 1225 total applications across all art units
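The headline figures follow from simple arithmetic on the career counts above. A minimal sketch of the presumed derivation, assuming the tool adds the interview lift directly to the base allow rate (the page does not document its method):

```python
# Reconstructing the examiner dashboard figures from the raw counts above.
granted = 1099                                # career grants
resolved = 1196                               # resolved cases (grants + abandonments)
pending = 29                                  # currently pending

allow_rate = granted / resolved               # 0.919 -> displayed as 92%
total_apps = resolved + pending               # 1225, matching "Total Applications"

# Assumed model: interview lift is added to the base rate.
interview_lift = 0.076                        # the +7.6% shown above
with_interview = allow_rate + interview_lift  # 0.995 -> displayed as 99%

print(f"{allow_rate:.1%} base, {with_interview:.1%} with interview, {total_apps} total")
```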

Statute-Specific Performance

§101: 17.0% (-23.0% vs TC avg)
§103: 33.5% (-6.5% vs TC avg)
§102: 19.1% (-20.9% vs TC avg)
§112: 19.6% (-20.4% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 1196 resolved cases
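A consistency check on the chart data: subtracting each "vs TC avg" delta from the examiner's rate recovers the same Tech Center average estimate, 40.0%, for all four statutes.

```python
# (statute, examiner rate %, delta vs Tech Center average %)
rows = [("§101", 17.0, -23.0), ("§103", 33.5, -6.5),
        ("§102", 19.1, -20.9), ("§112", 19.6, -20.4)]

for statute, rate, delta in rows:
    print(f"{statute}: implied TC average = {rate - delta:.1f}%")  # 40.0% each time
```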

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. It is responsive to the submission dated 09/06/2024. Claims 1-20 are presented for examination, of which claims 1, 8 and 15 are independent claims.

Information Disclosure Statement

2. The information disclosure statements (IDSs) submitted on 11/07/2024 are in compliance with the provisions of 37 CFR 1.97 and are being considered by the Examiner.

Claim Rejections - 35 USC § 103

3. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

4. Claims 1-3, 5-10, 12-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sarah et al. (US 20240350032) in view of Donnell et al. (US 20240029330).

Considering claim 1, Sarah discloses a computer-implemented method (see para. 11) comprising:

- obtaining a video including a plurality of video frames depicting movement of a human subject (e.g., receiving a video segment of said video recording that captures movement of an infant, wherein the video segment includes a plurality of frames; see para. 11 or 25);

- extracting two-dimensional (2D) images of the human subject from the video frames (e.g., extracting a set of feature vectors corresponding to the plurality of frames; see para. 15, wherein the set of feature vectors corresponds to the 2D images. Para. 117 of Sarah also discloses that the posture classification method first extracts either a 2-dimensional (2D) or 3-dimensional (3D) human skeleton pose prediction);

- providing the 2D images as input to a pre-trained neural network model (e.g., using a transition segmentor model with the set of feature vectors as input for training using vectors representing posture transitions; see para. 15. Para. 99 of Sarah additionally discloses the overall infant action recognition pipeline, encompassing infant-specific preprocessing and the action recognition phase: the infant is initially detected in raw frames and subsequently serves as input for both 2D and 3D pose estimation facilitated by the fine-tuned domain-adapted infant pose (FiDIP) model and the heuristic weakly supervised 3D human pose estimation infant (HW-HuP-Infant) model, respectively, and the resulting pose information may be further processed into heatmaps, serving as input for convolutional neural network (CNN)-based models and recurrent neural network (RNN)-based models to predict infant actions);

- determining a pose of the human subject based on the 2D images, wherein each pose comprises respective 2D positions for a plurality of upper body joints (e.g., skeleton joints corresponding to shoulders, elbows or wrists; see para. 116) of the human subject (e.g., determining, using a pose estimation model, pose estimation data representing a human skeleton pose for each frame of the plurality of frames, wherein the human skeleton pose is based on joint locations and joint angles of the infant, and wherein the pose model is a two-dimensional infant pose dataset; see paras. 14 and 116);

- generating, by the pre-trained neural network model and based on the respective 2D positions, a three-dimensional (3D) pose estimation of respective 3D positions of the plurality of upper body joints of the human subject (e.g., in the pipeline of para. 99, the infant is initially detected in raw frames using You Only Look Once Version 7 (YOLOv7) and subsequently serves as input for both 2D and 3D pose estimation via the FiDIP and HW-HuP-Infant models, with the resulting pose information processed into heatmaps serving as input for CNN-based and RNN-based models to predict infant actions. Paras. 166-167 disclose that each of the frame pose datasets associated with the body joints is pretrained);

- determining confidence scores for the plurality of upper body joints in the 3D pose estimation, the confidence scores representing a prediction accuracy of the respective 3D positions of the plurality of upper body joints (e.g., determining probability values corresponding to each frame of the plurality of frames and representing a confidence score for the posture prediction of the corresponding frame, with a subset of the plurality of frames representing a period of uncertainty; see paras. 17 and 110. Para. 124 of Sarah further discloses that the posture prediction accuracy scores defined using the 3D pose-based posture classification networks may be reported for both the initial model trained on SyRIP and the fine-tuned model trained further on InfAct, as shown in Table 2);

- selecting a plurality of keypoints of the upper body joints of the human subject based on the confidence scores (e.g., determining, using the skeletal pose, a set of skeleton keypoints corresponding to an adult skeleton, and determining, using an action recognition model with the set of skeleton keypoints as input, the infant action label; see para. 21. Para. 128 of Sarah further teaches, for joint locations, that for each frame a residual vector is obtained by applying principal component analysis (PCA) [31] to the sequence of keypoint coordinates for each body joint);

- [animating a 3D avatar for display in a user interface] using at least the selected plurality of keypoints, wherein the animation comprises transforming coordinates of the estimated 3D positions of the upper body joints to coordinates of corresponding joints [of the 3D avatar] (e.g., the per-joint residual vectors of para. 128. Para. 154 of Sarah also teaches that, after preprocessing, the extracted sequence of body keypoints from the input video is fed into various state-of-the-art skeleton-based action recognition models leveraging different aspects of infant-specific pose representations. Additionally, claims 19 and 20 of Sarah disclose generating a dataset of a plurality of infant actions using an action recognition model with the set of skeleton keypoints as input, wherein the action recognition model is a three-dimensional convolutional network with the skeleton keypoints from each frame converted into a heatmap).
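For orientation, the claim 1 limitations mapped above describe a familiar 2D-to-3D lifting pipeline: per-frame 2D joint detection, a pre-trained lifting network producing 3D positions with per-joint confidences, confidence-based keypoint selection, and a coordinate transform onto the avatar's joints. A minimal NumPy sketch of that flow; the model stubs, joint names, and threshold are illustrative placeholders, not code from Sarah, Donnell, or the application:

```python
import numpy as np

UPPER_BODY_JOINTS = ["l_shoulder", "r_shoulder", "l_elbow",
                     "r_elbow", "l_wrist", "r_wrist"]  # hypothetical joint set

def estimate_2d_pose(frame: np.ndarray) -> np.ndarray:
    """Stub for a 2D pose estimator: returns (J, 2) pixel coordinates."""
    raise NotImplementedError("plug in any off-the-shelf keypoint detector")

def lift_to_3d(pose_2d: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Stub for the pre-trained lifting network: returns (J, 3) positions
    and a per-joint confidence score in [0, 1]."""
    raise NotImplementedError("plug in a trained 2D-to-3D lifting model")

def track_frame(frame: np.ndarray, conf_threshold: float = 0.5):
    """One pass of the claimed flow: 2D pose -> 3D pose + confidences ->
    keypoints selected where confidence clears the threshold."""
    pose_2d = estimate_2d_pose(frame)
    pose_3d, conf = lift_to_3d(pose_2d)
    keep = conf >= conf_threshold
    selected = [j for j, k in zip(UPPER_BODY_JOINTS, keep) if k]
    return pose_3d[keep], selected

def retarget_to_avatar(pose_3d: np.ndarray, scale: float,
                       offset: np.ndarray) -> np.ndarray:
    """Transform estimated joint coordinates into the avatar's joint space.
    A bare similarity transform stands in for real per-bone retargeting."""
    return pose_3d * scale + offset
```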
While Sarah appears to disclose using selected joint keypoints for each frame of a video, and using 3D pose-based posture model features to transform coordinates of estimated positions of 3D body poses associated with joints or skeletal information so as to represent and generate video-based actions according to a 3D-based transition segmentation of skeleton sequences (see paras. 8-9, 128-133, 141-145, 151-154, 157-161), Sarah fails to particularly teach animating a 3D avatar for display in a user interface, which is disclosed by Donnell (see paras. 3-4). Particularly, Donnell discloses generating and displaying virtual avatar models from one or more images to a user through a display device, wherein said generating may include modifying at least a portion of a virtual avatar model as a function of user input (see paras. 3-4, 14 and 20). An operational processing model of the virtual avatar may include a digital representation of three-dimensional characters and may be programmed to perform behavioral parameters corresponding to animations of a virtual entity in augmented reality (see paras. 22-23 and 25).

Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified the teachings of Sarah to include animating a 3D avatar for display in a user interface, in the same conventional manner as taught by Donnell, in order to provide an improved computer vision system with scalability functions capable of transforming user input of behavioral parameters of a virtual entity, increasing engagement with the virtual entity through the creation of transitional images that can be used to generate an animated sequence of images from one simulated emotional state and/or response. See paras. 24-25 of Donnell.

As per claim 2, Sarah, as modified by Donnell, discloses that the animated 3D avatar mimics movements of the human subject without user-perceptible lag based on the 3D pose estimation. See paras. 25-26 and 43 of Donnell and the rationale above with respect to the rejection of claim 1.

As per claim 3, Sarah, as modified by Donnell, discloses applying temporal smoothing to the 3D pose estimations across consecutive video frames of the plurality of video frames. See paras. 132-133 of Sarah.
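Claim 3's temporal smoothing across consecutive frames is commonly implemented as an exponential moving average over the per-frame 3D estimates; a minimal sketch along those lines (the smoothing factor is an illustrative choice, not a value from the application or the cited art):

```python
import numpy as np

def smooth_pose_sequence(poses_3d: list[np.ndarray],
                         alpha: float = 0.7) -> list[np.ndarray]:
    """Exponential moving average over a sequence of (J, 3) pose arrays.
    Higher alpha tracks the newest frame more closely; lower alpha smooths harder."""
    smoothed: list[np.ndarray] = []
    state = None
    for pose in poses_3d:
        state = pose if state is None else alpha * pose + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed
```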
As per claim 5, considering that in Sarah a fourth subset of frames, corresponding to a previously detected first, second or third subset of a plurality of frames representing a period of uncertainty, is similarly determined when the probability values of frames corresponding to the fourth subset fail to exceed a threshold value (see paras. 35-36 and 61-63), triggering a re-detection of frames of skeleton data based on confidence scores would be a normal design action that the skilled person would take without undue experimentation. As such, the Sarah reference obviously encompasses triggering a re-detection of the upper body joints of the human subject in the video if the confidence scores fall below a predefined threshold.

As per claim 6, Sarah, as modified by Donnell, discloses that joint positions of the 3D avatar are scaled to match body proportions of the human subject. See paras. 153-154 of Sarah, paras. 25-26 and 43 of Donnell, and the rationale above with respect to the rejection of claim 1.

As per claim 7, Sarah, as modified by Donnell, discloses that the pre-trained neural network model uses an attention mechanism to focus on keypoints of the upper body joints of the human subject during 3D pose estimation. See paras. 8-9, 21-22 and 153-157 of Sarah.

Claim 8 recites features that correspond in scope to the limitations recited in claim 1. As the limitations of claim 1 were found obvious over the combined teachings of Sarah and Donnell, it is readily apparent that the applied prior art performs the underlying elements; the limitations of claim 8 are therefore subject to rejection under the same rationale as claim 1. In addition, Sarah discloses a system comprising a memory storing instructions to be executed by one or more processors. See claim 10 of Sarah.

Claim 9 is rejected under the same rationale as claim 2. Claim 10 is rejected under the same rationale as claim 3. Claim 12 is rejected under the same rationale as claim 5. Claim 13 is rejected under the same rationale as claim 6. Claim 14 is rejected under the same rationale as claim 7.

The subject matter of independent claim 15 corresponds, in terms of a non-transitory computer-readable medium, to that of independent method claim 1, and the rationale raised above to reject the latter also applies, mutatis mutandis, to the former. In addition, Sarah discloses a system comprising a computer-readable medium storing instructions to be executed by one or more processors. See claim 10 of Sarah.

Claim 16 is rejected under the same rationale as claim 2. Claim 17 is rejected under the same rationale as claim 3. Claim 19 is rejected under the same rationale as claim 5. Claim 20 is rejected under the same rationale as claim 6.

Allowable Subject Matter

5. Claims 4, 11 and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, because the prior art of record fails to teach the method of claim 1 wherein, prior to providing the 2D image as input to the pre-trained neural network model, the 2D image is calibrated to account for camera distortions.

Conclusion

6. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. WEISS et al. (US 20250005965) discloses a technique for extraction of human poses from video data for animation of computer models. In some implementations, a computer-implemented method includes determining a first pose sequence of a human body model based on frames of an input video, the frames depicting movement of a person. The first pose sequence includes poses of the human body model that correspond to the video frames. The first pose sequence is updated to produce a second pose sequence, including, over multiple iterations: determining joint velocities of joints of the pose sequence; determining, in parallel, predicted poses from each of multiple poses of the pose sequence by encoding the joint velocities into parameters of a human motion prior; and, based on comparing corresponding predicted poses, adjusting joint angles of the pose sequence. The second pose sequence can provide an animation of a computer model corresponding to the movement of the person in the input video.
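The allowable subject matter above (claims 4, 11 and 18) turns on calibrating the 2D image for camera distortions before it reaches the pre-trained network. With known intrinsics, OpenCV makes the undistortion step a one-liner; the camera matrix and distortion coefficients below are illustrative values, not parameters from the application:

```python
import cv2
import numpy as np

# Illustrative intrinsics; in practice these come from a prior calibration
# step such as cv2.calibrateCamera() run on checkerboard captures.
camera_matrix = np.array([[800.0,   0.0, 320.0],
                          [  0.0, 800.0, 240.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.array([-0.20, 0.05, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

def calibrate_frame(frame: np.ndarray) -> np.ndarray:
    """Remove lens distortion so joint pixels map consistently to 3D."""
    return cv2.undistort(frame, camera_matrix, dist_coeffs)
```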
7. Any inquiry concerning this communication or earlier communications from the examiner should be directed to WESNER SAJOUS, whose telephone number is (571) 272-7791. The examiner can normally be reached M-F, 10:00 to 7:30 (ET). Examiner interviews are available via telephone and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice or to email the Examiner directly at wesner.sajous@uspto.gov. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Said Broome, can be reached at 571-272-2931. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR; status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/WESNER SAJOUS/
Primary Examiner, Art Unit 2612
WS 03/18/2026

Prosecution Timeline

Sep 06, 2024
Application Filed
Mar 18, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597177
Changing Display Rendering Modes based on Multiple Regions
2y 5m to grant • Granted Apr 07, 2026
Patent 12597185
METHOD, APPARATUS, AND DEVICE FOR PROCESSING IMAGE, AND STORAGE MEDIUM
2y 5m to grant • Granted Apr 07, 2026
Patent 12597203
SIMULATED CONSISTENCY CHECK FOR POINTS OF INTEREST ON THREE-DIMENSIONAL MAPS
2y 5m to grant • Granted Apr 07, 2026
Patent 12589303
Computer-Implemented Methods for Generating Level of Detail Assets for Dynamic Rendering During a Videogame Session
2y 5m to grant • Granted Mar 31, 2026
Patent 12592038
EDITABLE SEMANTIC MAP WITH VIRTUAL CAMERA FOR MOBILE ROBOT LEARNING
2y 5m to grant • Granted Mar 31, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 92%
With Interview: 99% (+7.6%)
Median Time to Grant: 2y 5m
PTA Risk: Low
Based on 1196 resolved cases by this examiner. Grant probability derived from career allow rate.
