Prosecution Insights
Last updated: April 19, 2026
Application No. 18/413,867

MACHINE LEARNING MODELS FOR GENERATIVE HUMAN MOTION SIMULATION

Status: Non-Final OA (§103)
Filed: Jan 16, 2024
Examiner: DEMETER, HILINA K
Art Unit: 2617
Tech Center: 2600 — Communications
Assignee: Nvidia Corporation
OA Round: 1 (Non-Final)

Grant Probability: 72% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 1m
Grant Probability With Interview: 91%

Examiner Intelligence

Career Allow Rate: 72% (472 granted / 659 resolved; +9.6% vs TC avg, above average)
Interview Lift: +19.4% among resolved cases with an interview (a strong lift)
Typical Timeline: 3y 1m average prosecution; 27 applications currently pending
Career History: 686 total applications across all art units
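A minimal sketch of the arithmetic behind these cards, assuming the dashboard computes the allow rate as grants over resolved cases and treats the interview lift as an additive bump (both assumptions inferred from the displayed figures, not a documented method):

```python
# Back-of-the-envelope reproduction of the examiner cards above.
# Counts come from the dashboard; treating the interview lift as a flat
# additive bump is an assumption inferred from the displayed 91% figure.

granted = 472            # career grants
resolved = 659           # resolved cases (grants + abandonments)
interview_lift = 0.194   # lift observed in resolved cases with an interview
tc_delta = 0.096         # "+9.6% vs TC avg"

allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.1%}")                   # 71.6%, shown as 72%
print(f"TC average (est.): {allow_rate - tc_delta:.1%}")        # ~62.0%
print(f"With interview:    {allow_rate + interview_lift:.1%}")  # 91.0%, shown as 91%
```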

Statute-Specific Performance

§101: 8.7% (-31.3% vs TC avg)
§103: 61.0% (+21.0% vs TC avg)
§102: 14.5% (-25.5% vs TC avg)
§112: 6.7% (-33.3% vs TC avg)

"vs TC avg" compares against a Tech Center average estimate (the black line in the original chart). Based on career data from 659 resolved cases.
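The statute chart reads the same way. A hedged sketch, assuming each figure is this examiner's per-statute rate and each delta is a plain difference from the Tech Center estimate; backing the deltas out yields the same ~40% baseline for every statute, which suggests a single TC-wide estimate rather than per-statute data:

```python
# Assumption: "vs TC avg" is a plain difference between the examiner's
# per-statute rate and the Tech Center estimate (the chart's black line).
examiner = {"§101": 0.087, "§103": 0.610, "§102": 0.145, "§112": 0.067}
delta    = {"§101": -0.313, "§103": 0.210, "§102": -0.255, "§112": -0.333}

for statute, rate in examiner.items():
    baseline = rate - delta[statute]  # back out the TC average estimate
    print(f"{statute}: examiner {rate:.1%}, TC baseline ~{baseline:.0%}")
    # every statute backs out to the same ~40% baseline
```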

Office Action (§103)

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement (IDS) submitted is considered by the examiner.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-6, 8-12, and 15-19 are rejected under 35 U.S.C. 103 as being unpatentable over Shi et al. (US Publication Number 2023/0005203 A1, hereinafter "Shi") in view of Chentanez et al. (US Publication Number 2021/0082170 A1, hereinafter "Chentanez").

(1) Regarding claim 1: As shown in fig. 1, Shi disclosed a system comprising one or more processors to (para. [0036], note that a system described herein, e.g., the dynamic animation generation system, discloses a machine learning model to generate local phase information based on analyses of motion capture information): receive at least one of a text prompt or a kinematic constraint (para. [0043], note that an actor may be placed in a motion capture studio, or the dynamic animation generation system may receive data on a real-life soccer game, i.e., a kinematic constraint); and determine first human motion data using a motion model by applying the at least one of the text prompt or the kinematic constraint to the motion model, wherein the motion model is updated by generating, using the motion model (para. [0044], note that the dynamic animation generation system can improve the quality of human reconstruction by combining human video input with local motion phase information, i.e., a kinematic constraint; the dynamic animation generation system can first predict rough motion in real-life video by applying real-life capture data to a first model, such as a neural network, to receive the rough motion), second human motion data by applying motion capture (mocap) data and video reconstruction data as inputs to the motion model (para. [0036], note that a machine learning model generates local phase information based on analyses of motion capture (mocap) information; the dynamic animation generation system can then process a sliding window of frames with a second machine learning model to generate the next predicted frame and the local motion phase associated with the next predicted frame), receiving user feedback information for the second human motion data (para. [0047], note that the rough motions can include pose data for each frame based on the video; the rough motion data can include a multidimensional signal that includes joint information, such as rotation information for each joint of the human), and updating the motion model based on the user feedback information, wherein the video reconstruction data is generated by reconstructing human motions from a plurality of videos (para. [0047], note that the rough motions can include calculations for each joint and for each frame in the video; the pose data is overlaid on top of the original video input to generate a modified video input).

Shi disclosed most of the subject matter as described above except for specifically teaching receiving user feedback information; wherein physically implausible artifacts are filtered from the video reconstruction data using a motion imitation controller, wherein the motion imitation controller is updated using at least one of reinforcement learning (RL) or physics-based character simulations. However, Chentanez disclosed receiving user feedback information (para. [0028], note that inputs and labeling of inputs are handled internally within the deep reinforcement learning system as it seeks the maximum-reward movements); wherein physically implausible artifacts are filtered from the video reconstruction data using a motion imitation controller (para. [0074], note that video generator system 500 can use tracking NN 530 and recovery NN 535 to correct the movements of the target object should the target object's position and orientation diverge from the reference object; recovery NN 535 can be utilized by the recovery agent to provide torque, joint positionings, forces, gain parameters, and other NN physics-based movement parameters to move the target object back into an alignment that will satisfy the stability threshold so that the tracking agent can take over modifying the movement of the target object), wherein the motion imitation controller is updated using at least one of reinforcement learning (RL) or physics-based character simulations (para. [0075], note that physics simulator 520 can also use the tracking agent, i.e., a deep reinforcement NN, to move the target object differently than the reference object, as long as the movement is within a movement differential parameter; the tracking agent can blend two or more corrective movements received from the tracking NN and utilize the blended resultant to direct the movement of the target object in an action that is not represented by a previously trained MOCAP video clip).

At the time of filing of the invention, it would have been obvious to a person of ordinary skill in the art to teach receiving user feedback information, wherein physically implausible artifacts are filtered from the video reconstruction data using a motion imitation controller, wherein the motion imitation controller is updated using at least one of reinforcement learning (RL) or physics-based character simulations. The suggestion/motivation for doing so would have been to compensate for signal noise in the MOCAP video clip by utilizing the deep reinforcement learning technique to correct for the signal noise and provide additional smoothing of the resulting simulation (para. [0020]). Therefore, it would have been obvious to combine Shi with Chentanez to obtain the invention as specified in claim 1.

(2) Regarding claim 2: Shi further disclosed the system of claim 1, wherein the kinematic constraint comprises at least one of a keyframe of a human character, a path or target trajectory to be followed by the human character, or attributes of one or more body parts or joints of the human character (para. [0041], note that a motion capture studio may be used to learn the realistic gait of an actor as he/she moves about the motion capture studio; specific portions of the actor, such as joints or bones, may be monitored during this movement), wherein the attributes of the one or more body parts or joints comprise at least one of a position of the one or more body parts or joints (para. [0047], note that the rough motions can include pose data for each frame based on the video; the rough motion data can include a multidimensional signal that includes joint information, such as rotation information for each joint of the human; the rough motions can include calculations for each joint and for each frame in the video; the pose data is overlaid on top of the original video input to generate a modified video input), orientation of the one or more body parts or joints, dimensions of the one or more body parts or joints, rotation of the one or more body parts or joints, velocity of the one or more body parts or joints, acceleration of the one or more body parts or joints, or a spatial relationship between two or more body parts or joints (para. [0057], note that the dynamic animation generation system can apply local motion phase, which is determined at the local level by segmenting movements to joints, bones, limbs, ligaments, etc.; thus, the dynamic animation generation system inputs velocity data specific to joints, bones, limbs, and ligaments into a gaiting function to predict next pose data very granularly).

(3) Regarding claim 4: Shi further disclosed the system of claim 1, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system implemented using a robot; an aerial system; a medical system; a boating system; a smart area monitoring system; a system for performing deep learning operations; a system for performing simulation operations; a system for generating or presenting virtual reality (VR) content, augmented reality (AR) content, or mixed reality (MR) content; a system for performing digital twin operations; a system implemented using an edge device; a system incorporating one or more virtual machines (VMs); a system for generating synthetic data; a system implemented at least partially in a data center; a system for performing conversational artificial intelligence (AI) operations; a system for performing generative AI operations; a system implementing language models; a system implementing large language models (LLMs); a system for hosting one or more real-time streaming applications; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; or a system implemented at least partially using cloud computing resources (para. [0036], note that a system described herein, e.g., the dynamic animation generation system, may implement a machine learning model to generate local phase information based on analyses of motion capture information; note that at least one of the listed systems is disclosed).

(4) Regarding claim 5: As shown in fig. 1, Shi disclosed a system comprising one or more processors to (para. [0036], note that a system described herein, e.g., the dynamic animation generation system, discloses a machine learning model to generate local phase information based on analyses of motion capture information): generate, using a motion model, human motion data by applying motion capture (mocap) data and video reconstruction data as inputs to the motion model (para. [0036], note that a machine learning model generates local phase information based on analyses of motion capture information; the dynamic animation generation system can then process a sliding window of frames with a second machine learning model to generate the next predicted frame and the local motion phase associated with the next predicted frame), wherein the video reconstruction data is generated by reconstructing human motions from a plurality of videos (para. [0036], note that the dynamic animation generation system can combine local motion phase techniques with human motion reconstruction from captured real-life video); receive user feedback information for the human motion data (para. [0047], note that the rough motions can include pose data for each frame based on the video; the rough motion data can include a multidimensional signal that includes joint information, such as rotation information for each joint of the human); and update the human motion foundation model based on the user feedback information (para. [0047], note that the rough motions can include calculations for each joint and for each frame in the video; the pose data is overlaid on top of the original video input to generate a modified video input).

Shi disclosed most of the subject matter as described above except for specifically teaching filtering, using a motion imitation controller, physically implausible artifacts from the video reconstruction data, wherein the motion imitation controller is updated using at least one of reinforcement learning (RL) or physics-based character simulations. However, Chentanez disclosed filtering, using a motion imitation controller, physically implausible artifacts from the video reconstruction data (para. [0074], note that video generator system 500 can use tracking NN 530 and recovery NN 535 to correct the movements of the target object should the target object's position and orientation diverge from the reference object; recovery NN 535 can be utilized by the recovery agent to provide torque, joint positionings, forces, gain parameters, and other NN physics-based movement parameters to move the target object back into an alignment that will satisfy the stability threshold so that the tracking agent can take over modifying the movement of the target object), wherein the motion imitation controller is updated using at least one of reinforcement learning (RL) or physics-based character simulations (para. [0075], note that physics simulator 520 can also use the tracking agent, i.e., a deep reinforcement NN, to move the target object differently than the reference object, as long as the movement is within a movement differential parameter; the tracking agent can blend two or more corrective movements received from the tracking NN and utilize the blended resultant to direct the movement of the target object in an action that is not represented by a previously trained MOCAP video clip). At the time of filing of the invention, it would have been obvious to a person of ordinary skill in the art to teach filtering, using a motion imitation controller, physically implausible artifacts from the video reconstruction data, wherein the motion imitation controller is updated using at least one of reinforcement learning (RL) or physics-based character simulations. The suggestion/motivation for doing so would have been to compensate for signal noise in the MOCAP video clip by utilizing the deep reinforcement learning technique to correct for the signal noise and provide additional smoothing of the resulting simulation (para. [0020]). Therefore, it would have been obvious to combine Shi with Chentanez to obtain the invention as specified in claim 5.

(5) Regarding claim 6: Shi further disclosed the system of claim 5, wherein the mocap data corresponds to one or more of a pose, a common behavior, a compositional behavior comprising two or more common behaviors that are simultaneous or in sequence, or a domain-specific behavior for an application (para. [0036], note that the system may perform substantially automated analyses of the motion capture information such that complex machine learning labeling processes may be avoided; the dynamic animation generation system can combine local motion phase techniques with human motion reconstruction from captured real-life video; while electronic games are described, it may be appreciated that the techniques described herein may be applied generally to movement of character models; for example, animated content (e.g., TV shows, movies) may employ the techniques described herein).

(6) Regarding claim 8: Shi further disclosed the system of claim 5, wherein the video reconstruction data is unlabeled with any text label (para. [0044], note that the dynamic animation generation system can improve the quality of human reconstruction by combining human video input with local motion phase information).

(7) Regarding claim 9: Shi disclosed most of the subject matter as described above except for specifically teaching wherein the video reconstruction data is determined by applying video data as inputs to one or more pose estimation models. However, Chentanez disclosed wherein the video reconstruction data is determined by applying video data as inputs to one or more pose estimation models (para. [0035], note that the recovery agent can utilize the reference object from a subsequent frame of the MOCAP video clip as the desired target pose). At the time of filing of the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein the video reconstruction data is determined by applying video data as inputs to one or more pose estimation models. The suggestion/motivation for doing so would have been to compensate for signal noise in the MOCAP video clip by utilizing the deep reinforcement learning technique to correct for the signal noise and provide additional smoothing of the resulting simulation (para. [0020]). Therefore, it would have been obvious to combine Shi with Chentanez to obtain the invention as specified in claim 9.

(8) Regarding claim 10: Shi further disclosed the system of claim 5, wherein at least one of the human motion data, the video reconstruction data, or the mocap data comprises at least one of a kinematic model, planar model, or volumetric model of a human character (para. [0041], note that movement of these portions may be extracted from image or video data of the actor; this movement may then be translated onto a skeleton or rig for use as an underlying framework of one or more in-game characters, i.e., a kinematic model).
(9) Regarding claim 11: Shi disclosed most of the subject matter as described above except for specifically teaching wherein the RL comprises updating the motion imitation controller to generate simulated human motion data that imitates the human motion data. However, Chentanez disclosed wherein the RL comprises updating the motion imitation controller to generate simulated human motion data that imitates the human motion data (para. [0034], note that as long as the simulated character satisfies an imitation threshold, the tracking agent can continue to direct the movements of the character; the tracking agent can utilize the deep reinforcement learning system to provide the gain parameter, torque, and other parameters to the tracking agent to allow the object or character to maintain a close or approximate simulation, i.e., mimicking, of the reference object). At the time of filing of the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein the RL comprises updating the motion imitation controller to generate simulated human motion data that imitates the human motion data. The suggestion/motivation for doing so would have been to compensate for signal noise in the MOCAP video clip by utilizing the deep reinforcement learning technique to correct for the signal noise and provide additional smoothing of the resulting simulation (para. [0020]). Therefore, it would have been obvious to combine Shi with Chentanez to obtain the invention as specified in claim 11.

(10) Regarding claim 12: Shi disclosed most of the subject matter as described above except for specifically teaching wherein the human motion data comprises motion data for a first behavior followed temporally by motion data for a second behavior; and the physics-based character simulations generate motion data for a transition between the first behavior and the second behavior. However, Chentanez disclosed wherein the human motion data comprises motion data for a first behavior followed temporally by motion data for a second behavior (para. [0028], note that the disclosed method can simulate reference objects sampled from a MOCAP video clip storage with more motions and clips that have not been used during the training of the deep reinforcement learning system); and the physics-based character simulations generate motion data for a transition between the first behavior and the second behavior (para. [0029], note that MOCAP video clips can be selected and combined to make a longer MOCAP video clip; the potential transitions of the reference object between the MOCAP video clips are pre-computed by measuring the root mean square (RMS) positional difference, i.e., error, of the center of mass of the target object in the last frame against the frames generated from the tracking NN parameters). At the time of filing of the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein the human motion data comprises motion data for a first behavior followed temporally by motion data for a second behavior; and the physics-based character simulations generate motion data for a transition between the first behavior and the second behavior. The suggestion/motivation for doing so would have been to compensate for signal noise in the MOCAP video clip by utilizing the deep reinforcement learning technique to correct for the signal noise and provide additional smoothing of the resulting simulation (para. [0020]). Therefore, it would have been obvious to combine Shi with Chentanez to obtain the invention as specified in claim 12.

(11) Regarding claim 15: Shi further disclosed the system of claim 5, wherein the user feedback information comprises at least one of labels or text descriptions for the human motion data that describe types of the human motion data or artifacts in the human motion data (para. [0056], note that the machine learning model may then be used to generate animation for an in-game character which is based on the motion capture information, creating rough motion data; these machine learning models may directly output motion data for use in a second neural network (using a sliding window, as described further below) that can generate motion data for animating an in-game character automatically).

(12) Regarding claim 16: Shi further disclosed the system of claim 5, wherein the user feedback information comprises user input to correct artifacts in the human motion data or the video reconstruction data (para. [0044], note that the dynamic animation generation system can improve the quality of human reconstruction by combining human video input with local motion phase information; the dynamic animation generation system can first predict rough motion in real-life video by applying real-life capture data to a first model, such as a neural network, to receive the rough motion).

(13) Regarding claim 17: Shi further disclosed the system of claim 5, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system implemented using a robot; an aerial system; a medical system; a boating system; a smart area monitoring system; a system for performing deep learning operations; a system for performing simulation operations; a system for generating or presenting virtual reality (VR) content, augmented reality (AR) content, or mixed reality (MR) content; a system for performing digital twin operations; a system implemented using an edge device; a system incorporating one or more virtual machines (VMs); a system for generating synthetic data; a system implemented at least partially in a data center; a system for performing conversational artificial intelligence (AI) operations; a system for performing generative AI operations; a system implementing language models; a system implementing large language models (LLMs); a system for hosting one or more real-time streaming applications; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; or a system implemented at least partially using cloud computing resources (para. [0036], note that a system described herein, e.g., the dynamic animation generation system, may implement a machine learning model to generate local phase information based on analyses of motion capture information; note that at least one of the listed systems is disclosed).

(14) Regarding claim 18: As shown in fig. 3B, Shi disclosed a method (para. [0007], note a computer-implemented method for dynamically generating animation of characters from real-life motion capture video), comprising: generating, using a motion model, human motion data by applying motion capture (mocap) data (para. [0036], note that a machine learning model generates local phase information based on analyses of motion capture information; the dynamic animation generation system can then process a sliding window of frames with a second machine learning model to generate the next predicted frame and the local motion phase associated with the next predicted frame) and video reconstruction data as inputs to the motion model, wherein the video reconstruction data is generated by reconstructing human motions from a plurality of videos (para. [0036], note that the dynamic animation generation system can combine local motion phase techniques with human motion reconstruction from captured real-life video); receiving user feedback information for the human motion data (para. [0047], note that the rough motions can include pose data for each frame based on the video; the rough motion data can include a multidimensional signal that includes joint information, such as rotation information for each joint of the human); and updating the motion model based on the user feedback information (para. [0047], note that the rough motions can include calculations for each joint and for each frame in the video; the pose data is overlaid on top of the original video input to generate a modified video input). Shi disclosed most of the subject matter as described above except for specifically teaching receiving user feedback information. However, Chentanez disclosed receiving user feedback information (para. [0028], note that inputs and labeling of inputs are handled internally within the deep reinforcement learning system as it seeks the maximum-reward movements). At the time of filing of the invention, it would have been obvious to a person of ordinary skill in the art to teach receiving user feedback information. The suggestion/motivation for doing so would have been to compensate for signal noise in the MOCAP video clip by utilizing the deep reinforcement learning technique to correct for the signal noise and provide additional smoothing of the resulting simulation (para. [0020]). Therefore, it would have been obvious to combine Shi with Chentanez to obtain the invention as specified in claim 18.

(15) Regarding claim 19: Shi further disclosed the method of claim 18, wherein the user feedback information comprises at least one of: a score that rates relevance of the human motion data to a text prompt; user input to correct artifacts in the human motion data or the video reconstruction data; or labels or text descriptions for the human motion data that describe types of the human motion data or artifacts in the human motion data (para. [0043], note that the actor may then perform different movements, and movement of different portions of the actor (e.g., joints) may be stored by a system; additionally, contact with an external environment may be recorded; thus, the specific footfall pattern used by an upper-echelon boxer or basketball player may be recorded; additionally, the specific contact made by an actor's hands with respect to a basketball, football, and so on may be recorded; this recorded information may be used to increase a realism associated with animation generation; in some embodiments, motion can be generated for biped and/or human characters).

Claims 3, 7, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Shi and Chentanez, further in view of Menapace et al. (US Patent Number 11,113,861 B2).
(1) Regarding claim 3: Shi disclosed most of the subject matter as described above except for specifically teaching wherein the motion model comprises a first model and a second model; at least one of the first model or the second model is a diffusion model; the first model is to generate a global root motion; the second model is to generate a local joint motion; and the first human motion data comprises the global root motion and the local joint motion. However, Menapace disclosed wherein the motion model comprises a first model and a second model (para. [0030], note that there are models to generate human motion sequences from text; MotionCLIP aligns the space of human motions to that of a pretrained Contrastive Language-Image Pre-training (CLIP) model); at least one of the first model or the second model is a diffusion model; the first model is to generate a global root motion (para. [0030], note that diffusion models have shown strong performance on this task, whereby sequences of human poses are generated by a diffusion model conditioned on the output of a frozen CLIP text encoder); the second model is to generate a local joint motion; and the first human motion data comprises the global root motion and the local joint motion (para. [0059], note that since the radiance field C alone supports only rendering of rigid objects expressed in a canonical space, to render articulated objects such as humans, a deformation model D 250 (FIG. 2C) is introduced that implements a deformation procedure based on linear blend skinning (LBS); given an articulated object, it is assumed that its kinematic tree is known and that the transformation from each joint j to the parent joint is part of the object's properties). At the time of filing of the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein the motion model comprises a first model and a second model; at least one of the first model or the second model is a diffusion model; the first model is to generate a global root motion; the second model is to generate a local joint motion; and the first human motion data comprises the global root motion and the local joint motion. The suggestion/motivation for doing so would have been to generate a Learnable Game Engine (LGE) that maintains states of the scene, objects, and agents in it, and enables rendering the environment from a controllable viewpoint (abs.). Therefore, it would have been obvious to combine Shi and Chentanez with Menapace to obtain the invention as specified in claim 3.

(2) Regarding claim 7: Shi disclosed most of the subject matter as described above except for specifically teaching wherein the video reconstruction data has a text label or description. However, Menapace disclosed wherein the video reconstruction data has a text label or description (para. [0082], note that diffusion models have recently shown state-of-the-art performance on several tasks such as text-conditioned image and video generation, sequence modeling, and text-conditioned human motion generation). At the time of filing of the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein the video reconstruction data has a text label or description. The suggestion/motivation for doing so would have been to generate a Learnable Game Engine (LGE) that maintains states of the scene, objects, and agents in it, and enables rendering the environment from a controllable viewpoint (abs.). Therefore, it would have been obvious to combine Shi and Chentanez with Menapace to obtain the invention as specified in claim 7.

(3) Regarding claim 13: Shi disclosed most of the subject matter as described above except for specifically teaching wherein the user feedback information comprises a score that rates relevance of the human motion data to a text prompt. However, Menapace disclosed wherein the user feedback information comprises a score that rates relevance of the human motion data to a text prompt (para. [0028], note that examples of generic diffusion modeling frameworks include denoising diffusion probabilistic models (DDPM), noise-conditioned score networks, and stochastic differential equations; following this methodological direction, a score-based diffusion model has been introduced for imputing missing values in time series). At the time of filing of the invention, it would have been obvious to a person of ordinary skill in the art to teach wherein the user feedback information comprises a score that rates relevance of the human motion data to a text prompt. The suggestion/motivation for doing so would have been to generate a Learnable Game Engine (LGE) that maintains states of the scene, objects, and agents in it, and enables rendering the environment from a controllable viewpoint (abs.). Therefore, it would have been obvious to combine Shi and Chentanez with Menapace to obtain the invention as specified in claim 13.

Allowable Subject Matter

Claims 14 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is a statement of reasons for the indication of allowable subject matter: the prior art made of record does not teach "wherein the human motion data comprises a plurality of candidate generated motions; the user feedback information comprises a candidate generated motion of the plurality of candidate generated motions selected by a user or a ranking of the plurality of candidate generated motions determined by the user; and the motion model is updated using a ranking loss corresponding to the selected candidate generated motion or the ranking", as recited in claims 14 and 20.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Yang et al. (US Publication Number 2007/0104351 A1) disclosed a method and system for efficiently and accurately tracking three-dimensional (3D) human motion from a two-dimensional (2D) video sequence, even when self-occlusion, motion blur, and large limb movements occur. Corazza et al. (US Patent Number 8,180,174 B2) disclosed an automated method for the generation of (i) human models comprehensive of shape and joint-center information and/or (ii) subject-specific models from multiple video streams. De Aguiar et al. (US Publication Number 2014/0160116 A1) disclosed systems and methods for animating 3D characters using synthetic motion data generated by motion models in response to a high-level description of a desired sequence of motion provided by an animator.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hilina K Demeter, whose telephone number is (571) 270-1676. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, King Y. Poon, can be reached at (571) 270-0728. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HILINA K DEMETER/
Primary Examiner, Art Unit 2617
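To make the claim language concrete: independent claims 1, 5, and 18 recite a motion model trained on mocap data plus motions reconstructed from video, a motion imitation controller (trained with RL or physics-based character simulation) that filters physically implausible reconstructions, generation conditioned on a text prompt or kinematic constraint, and updates driven by user feedback. Below is a minimal sketch of that flow; every name is hypothetical and invented for illustration, not code from the application or the cited references.

```python
# Hypothetical illustration of the pipeline recited in independent claim 1.
# Every name below is invented for exposition; none of this code comes from
# the application or from the cited Shi/Chentanez references.

def imitation_filter(clips, plausible):
    """Claim 1: a motion imitation controller (trained with RL or
    physics-based character simulation) filters physically implausible
    artifacts from the video reconstruction data. The controller is
    abstracted here to a plausibility predicate."""
    return [c for c in clips if plausible(c)]

class MotionModel:
    """Stand-in for the claimed motion model."""

    def __init__(self):
        self.corpus = []

    def train(self, mocap, video_recon):
        # "generating ... second human motion data by applying motion capture
        # (mocap) data and video reconstruction data as inputs to the motion model"
        self.corpus = list(mocap) + list(video_recon)

    def generate(self, text_prompt=None, kinematic_constraint=None):
        # "determine first human motion data ... by applying the at least one
        # of the text prompt or the kinematic constraint to the motion model"
        return {"conditioning": text_prompt or kinematic_constraint,
                "source_clips": len(self.corpus)}

    def update_from_feedback(self, feedback):
        # "updating the motion model based on the user feedback information"
        self.corpus = [c for c in self.corpus if c != feedback.get("reject")]

model = MotionModel()
video_recon = imitation_filter(
    ["run", "foot_skate_glitch"],                  # motions reconstructed from video
    plausible=lambda c: not c.endswith("glitch"),  # stand-in for the RL controller
)
model.train(mocap=["walk", "jump"], video_recon=video_recon)
print(model.generate(text_prompt="a person walks, then jumps"))
model.update_from_feedback({"reject": "run"})
```

The predicate standing in for the RL controller is the load-bearing simplification here; in the cited Chentanez reference that role is played by trained tracking and recovery networks running inside a physics simulator.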

Prosecution Timeline

Jan 16, 2024: Application Filed
Jan 24, 2026: Non-Final Rejection (§103)
Apr 06, 2026: Applicant Interview (Telephonic)
Apr 06, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602864: EVENT ROUTING IN 3D GRAPHICAL ENVIRONMENTS (granted Apr 14, 2026; 2y 5m to grant)
Patent 12592042: SYSTEMS AND METHODS FOR MAINTAINING SECURITY OF VIRTUAL OBJECTS IN A DISTRIBUTED NETWORK (granted Mar 31, 2026; 2y 5m to grant)
Patent 12586297: INTERACTIVE IMAGE GENERATION (granted Mar 24, 2026; 2y 5m to grant)
Patent 12579724: EXPRESSION GENERATION METHOD AND APPARATUS, DEVICE, AND MEDIUM (granted Mar 17, 2026; 2y 5m to grant)
Patent 12561906: METHOD FOR GENERATING AT LEAST ONE GROUND TRUTH FROM A BIRD'S EYE VIEW (granted Feb 24, 2026; 2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 72%
With Interview: 91% (+19.4%)
Median Time to Grant: 3y 1m
PTA Risk: Low

Based on 659 resolved cases by this examiner. Grant probability is derived from the career allow rate.
