Prosecution Insights
Last updated: April 19, 2026
Application No. 18/413,967

MACHINE LEARNING MODELS FOR GENERATIVE HUMAN MOTION SIMULATION

Status: Final Rejection (§103)
Filed: Jan 16, 2024
Examiner: TRUONG, KARL DUC
Art Unit: 2614
Tech Center: 2600 (Communications)
Assignee: Nvidia Corporation
OA Round: 2 (Final)

Grant Probability: 52% (Moderate)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 2y 7m
Grant Probability With Interview: 83%

Examiner Intelligence

Career Allow Rate: 52% of resolved cases (15 granted / 29 resolved; -10.3% vs TC avg)
Interview Lift: strong, +31.0% (allowance rate with vs. without an interview, among resolved cases with an interview)
Avg Prosecution: 2y 7m (typical timeline)
Currently Pending: 45
Total Applications: 74 (career history, across all art units)

Statute-Specific Performance

§101: 3.2% (-36.8% vs TC avg)
§102: 9.5% (-30.5% vs TC avg)
§103: 85.3% (+45.3% vs TC avg)
§112: 2.1% (-37.9% vs TC avg)

Tech Center averages are estimates. Based on career data from 29 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

This action is in response to the amendment filed on December 3, 2025. Claims 1, 17, and 20 have been amended. Claims 1-20 remain pending and stand rejected.

Response to Arguments

Applicant's arguments with respect to Claims 1, 17, and 20, filed December 3, 2025, regarding the rejection under 35 U.S.C. § 103, namely that the prior art does not teach the limitations "generating the human motion data comprises, for each iteration of one or more interleaved diffusion iterations: determining, using the first model in each iteration of the one or more interleaved diffusion iterations, global root motion by applying noisy global root motion and noisy local joint motion as inputs into the first model" and "determining, using the second model in each iteration of the interleaved diffusion, local joint motion by applying the noisy local joint motion and local root motion as inputs into the second model," have been fully considered, but are moot in view of the new grounds of rejection: these limitations are now taught by the combination of Shafir and Oreshkin.

Regarding the arguments directed to Claims 2-16 and 18-19: these claims depend directly or indirectly on independent Claims 1, 17, and 20, respectively, and Applicant presents no arguments beyond those made for the independent claims. The limitations of the independent claims, in view of the combination, were previously established as explained.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 7-10, 16-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shafir et al. ("Human Motion Diffusion as a Generative Prior"), hereinafter referenced as Shafir, in view of Oreshkin et al. (US 20240054671 A1, previously cited), hereinafter referenced as Oreshkin.

Regarding Claim 1, Shafir discloses a system comprising one or more processors to (Shafir, [Section 4.2 Two-Person Generation]: teaches using an NVIDIA RTX 2080 TI GPU <read on processors>; Note: it should be noted that in order to use a GPU, computer hardware must be used, which is being interpreted as a system):

receive at least one of a text prompt or a kinematic constraint (Shafir, [Section 3.1 Long Sequences Generation]: teaches receiving a text prompt for a "handshake" motion);

generate, by a motion model comprising a first model and a second model, human motion data of a human character by applying a random noise and the at least one of the text prompt or the kinematic constraint into the motion model (Shafir, FIG. 4 teaches two static MDMs <read on first model> and a Com-MDM <read on second model>; [Section 3 Method]: teaches sequential composition with a Double-Take method, "which generalizes MDM to generate motions <read on human motion data of human character> of arbitrary length without further training, through sequential composition"; [Section 3 Method]: further teaches MDMs being an iterative denoising diffusion model, where it assumes T noising steps modeled by a stochastic process <read on applying random noise>; [Section 3.1 Long Sequences Generation]: teaches the Double-Take method generating a sample motion in the first take based on a text prompt),

wherein generating the human motion data comprises, for each iteration of one or more interleaved diffusion iterations (Shafir, [Section 3 Method]: teaches sampling a novel motion from MDM is done in an iterative manner, which is done via sequential composition <read on interleaved diffusion iteration> with the Double-Take method; [Section 3.3 Fine-Tuned Motion Control]: teaches generating "full-body motion <read on generating human motion data> controlled by a user-defined set of input features"):

determining, using the first model in each iteration of the one or more interleaved diffusion iterations, global root motion [[by applying noisy global root motion and noisy local joint motion as inputs into the first model]] (Shafir, [Section 3 Method]: teaches using an MDM <read on first model> to learn new generative tasks, such as human motion, where it learns joint rotations and global positions based on its training dataset; [Section 3.2 Two-Person Generation]: teaches the motions being processed are started with the root at the origin <read on determining global root motion>); and

determining, using the second model in each iteration of the interleaved diffusion, local joint motion [[by applying the noisy local joint motion and local root motion as inputs into the second model]] (Shafir, [Section 3.2 Two-Person Generation]: teaches using Com-MDM <read on using second model> taking input of the initial poses of each person as part of the diffusion process to fine-tune the motion, such as joint control tasks <read on determining local joint motion>; [Section 4.3 Fine-Tuned Motion Control]: teaches "in the joint control tasks, we take the relative location of the joint with respect to the root location"),

wherein [[the local root motion is determined based on the global root motion, wherein]] the human motion data comprises the local joint motion and the global root motion (Shafir, [Section 3.3 Fine-Tuned Motion Control]: teaches generating "full-body motion controlled by a user-defined set of input features," where "these features can be root trajectory <read on global root motion>, a single joint <read on local joint motion>, or any combination of them").

However, Shafir does not expressly disclose determining, using the first model in each iteration of the one or more interleaved diffusion iterations, global root motion by applying noisy global root motion and noisy local joint motion as inputs into the first model; and determining, using the second model in each iteration of the interleaved diffusion, local joint motion by applying the noisy local joint motion and local root motion as inputs into the second model, wherein the local root motion is determined based on the global root motion.
Oreshkin discloses determining, using the first model in each iteration of the one or more interleaved diffusion iterations, global root motion by applying noisy global root motion and noisy local joint motion as inputs into the first model (Oreshkin, [0073]: teaches using data augmentation to artificially augment a size of an input training set <read on inputs into first model>, which includes adding random translations and rotations to parts of the character body <read on noisy local joint motion> and the natural position of the character body in world space <read on noisy global root motion>); and determining, using the second model in each iteration of the interleaved diffusion, local joint motion by applying the noisy local joint motion and local root motion as inputs into the second model (Oreshkin, [0073]: teaches using data augmentation to artificially augment a size of an input training set <read on inputs into second model>, which includes adding random translations and rotations to parts of the character body <read on noisy local joint motion>, such as the hands, and the natural position of the character body in world space <read on noisy global root motion>; Note: it should be noted that it is common in the art to apply noise to input data of a diffusion model), wherein the local root motion is determined based on the global root motion (Oreshkin, [0052]: teaches an inverse kinematics decoder module 168 generating local joint rotations 176 <read on local root motion> based on positions defined in global space).

Oreshkin is analogous art with respect to Shafir because they are from the same field of endeavor, namely processing human motion data via a neural network.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement data augmentation that applies random noise, such as stochastic translation and rotation, to character poses as taught by Oreshkin into the teaching of Shafir. The suggestion for doing so would allow the fine-tuned diffusion model to generate more accurate and desirable motion output that can include motion interaction between characters, thereby yielding improved results. Therefore, it would have been obvious to combine Oreshkin with Shafir.

Regarding Claim 17, it recites limitations similar in scope to Claim 1. As shown in the rejection, the combination of Shafir and Oreshkin discloses the limitations of Claim 1. Additionally, Shafir discloses a system comprising one or more processors to (Shafir, [Section 4.2 Two-Person Generation]: teaches using an NVIDIA RTX 2080 TI GPU <read on processors>): perform at least one iteration of an interleaved diffusion process (Shafir, [Section 3 Method]: teaches sampling a novel motion from MDM is done in an iterative manner, which is done via sequential composition <read on interleaved diffusion iteration> with the Double-Take method), wherein at least one iteration of the plurality of iterations comprises (Shafir, [Section 3 Method]: teaches sampling a novel motion from MDM is done in an iterative manner):… Thus, Claim 17 is met by Shafir according to the mapping presented in the rejection of Claim 1.

Regarding Claim 20, it recites limitations similar in scope to Claim 1, but in a method. As shown in the rejection, the combination of Shafir and Oreshkin discloses the limitations of Claim 1. Additionally, Shafir discloses a method (Shafir, [Section 3 Method]: teaches a Double-Take method), comprising:… Thus, Claim 20 is met by Shafir according to the mapping presented in the rejection of Claim 1, with the recited system operations corresponding to the method steps.
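As an aid to parsing the disputed claim language, the interleaved two-model denoising loop recited in Claim 1 can be sketched as follows. This is a hedged illustration of the claim wording only, not Shafir's Double-Take method, Oreshkin's augmentation, or the applicant's actual implementation; `first_model`, `second_model`, and `to_local_root` are hypothetical placeholders standing in for trained networks and the claimed derivation.

```python
import random

def add_noise(x, sigma):
    # forward-diffusion corruption: add Gaussian noise to a motion vector
    return [v + random.gauss(0.0, sigma) for v in x]

def first_model(noisy_root, noisy_joints):
    # placeholder denoiser; a trained network would predict clean global root motion
    return [0.9 * v for v in noisy_root]

def second_model(noisy_joints, local_root):
    # placeholder denoiser; a trained network would predict clean local joint motion
    return [0.9 * v for v in noisy_joints]

def to_local_root(global_root):
    # derive local root motion (here: frame-to-frame displacement) from global root motion
    return [b - a for a, b in zip(global_root, global_root[1:])] + [0.0]

def interleaved_diffusion(root, joints, steps=5):
    for t in range(steps, 0, -1):
        sigma = t / steps
        noisy_root, noisy_joints = add_noise(root, sigma), add_noise(joints, sigma)
        root = first_model(noisy_root, noisy_joints)      # (a) global root motion
        local_root = to_local_root(root)                  # (b) derived from the global root
        joints = second_model(noisy_joints, local_root)   # (c) local joint motion
    return root, joints
```

Each iteration denoises the global root from noisy root and noisy joint inputs, derives local root motion from that global root, and then denoises the local joints from the noisy joints plus the derived local root, which mirrors the structure of the limitation in dispute.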
Regarding Claim 2, the combination of Shafir and Oreshkin discloses the system of Claim 1. Additionally, Shafir further discloses wherein the kinematic constraint comprises at least one of: a keyframe of a human character, a path or target trajectory to be followed by the human character, or attributes of one or more body parts or joints of the human character (Shafir, [Section 4.3 Fine-Tuned Motion Control]: teaches performing composite tasks, such as left wrist + trajectory and left wrist + right foot <read on body part attributes>), wherein the attributes of the one or more body parts or joints comprise at least one of a position of the one or more body parts or joints, orientation of the one or more body parts or joints, dimensions of the one or more body parts or joints, rotation of the one or more body parts or joints, velocity of the one or more body parts or joints, acceleration of the one or more body parts or joints, or a spatial relationship between two or more body parts or joints (Shafir, [Section 4.3 Fine-Tuned Motion Control]: teaches performing composite tasks, such as left wrist + trajectory and left wrist + right foot <read on body part attributes>, where joint control tasks take the relative location <read on position> of the joint with respect to the root location).

Regarding Claim 3, the combination of Shafir and Oreshkin discloses the system of Claim 1.
Shafir does not expressly disclose the limitations of Claim 3; however, Oreshkin discloses wherein the global root motion is defined by at least one of a global position of the human character (Oreshkin, [0056]: teaches a global root position 165 being data that describes "a center of coordinates for the skeleton <read on human character>") and global heading of the human character (Oreshkin, [0032]: teaches the look-at effector providing "an ability to maintain a global orientation of a joint towards a particular global position in a scene (for example, forcing a head of a character to look at a given object or point in a space)"; the joint, such as the head of a character, being forced to look at a global point is being interpreted as a global heading; Note: it should be noted that a "heading" in the art is the direction in which an object is pointed).

Oreshkin is analogous art with respect to Shafir because they are from the same field of endeavor, namely processing human motion data via a neural network. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement data augmentation that applies random noise, such as stochastic translation and rotation, to character poses as taught by Oreshkin into the teaching of Shafir. The suggestion for doing so would allow the fine-tuned diffusion model to generate more accurate and desirable motion output that can include motion interaction between characters, thereby yielding improved results. Therefore, it would have been obvious to combine Oreshkin with Shafir.

Regarding Claim 4, the combination of Shafir and Oreshkin discloses the system of Claim 1.
Additionally, Shafir further discloses wherein the local joint motion is defined by at least one of a position of a joint on the human character, a velocity of the joint of the human character, a rotation of the joint of the human character, or a local foot contact of the human character (Shafir, [Section 3 Method]: teaches a joint position of a human model).

Regarding Claim 5, the combination of Shafir and Oreshkin discloses the system of Claim 1. Additionally, Shafir further discloses wherein the local root motion is defined by at least one of a one-dimensional velocity, a linear velocity, or a height of the human character (Shafir, [Section 4.3 Fine-Tuned Motion Control]: teaches the diffusion model considering the trajectory to be the angle of the character on the xz plane and its linear velocities in that plane).

Regarding Claim 7, the combination of Shafir and Oreshkin discloses the system of Claim 1. Additionally, Shafir further discloses wherein a value corresponding to the text prompt is set as a parameter [[in at least one of the noisy global root motion or the noisy local joint motion]] (Shafir, [Section 3.1 Long Sequences Generation]: teaches a text prompt for a "handshake", where the handshake is defined by value r <read on parameter>). However, Shafir does not expressly disclose a value corresponding to the text prompt is set as a parameter in at least one of the noisy global root motion or the noisy local joint motion.

Oreshkin discloses a value corresponding to the text prompt is set as a parameter in at least one of the noisy global root motion or the noisy local joint motion (Oreshkin, [0049]: teaches an inverse kinematics decoder 168 that predicts internal geometric parameters (e.g., local rotation angles or joint rotations 176 <read on noisy local joint motion>) of the skeleton kinematic system).
Oreshkin is analogous art with respect to Shafir because they are from the same field of endeavor, namely processing human motion data via a neural network. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement data augmentation that applies random translation and rotation to character poses as taught by Oreshkin into the teaching of Shafir. The suggestion for doing so would generate a plurality of possible candidate motion data, thereby yielding predictable results. Therefore, it would have been obvious to combine Oreshkin with Shafir.

Regarding Claim 8, the combination of Shafir and Oreshkin discloses the system of Claim 1. Shafir does not expressly disclose the limitations of Claim 8; however, Oreshkin discloses wherein a value corresponding to the kinematic constraint is set as a parameter in at least one of the noisy global root motion or the noisy local joint motion (Oreshkin, [0049]: teaches an inverse kinematics decoder 168 that predicts internal geometric parameters <read on value corresponding to kinematic constraint> (e.g., local rotation angles or joint rotations 176 <read on noisy local joint motion>) of the skeleton kinematic system).

Oreshkin is analogous art with respect to Shafir because they are from the same field of endeavor, namely processing human motion data via a neural network. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement data augmentation that applies random translation and rotation to character poses as taught by Oreshkin into the teaching of Shafir. The suggestion for doing so would generate a plurality of possible candidate motion data, thereby yielding predictable results. Therefore, it would have been obvious to combine Oreshkin with Shafir.

Regarding Claim 9, the combination of Shafir and Oreshkin discloses the system of Claim 1.
Shafir does not expressly disclose the limitations of Claim 9; however, Oreshkin discloses wherein the random noise is used to generate the noisy global root motion and the noisy local joint motion (Oreshkin, [0073]: teaches applying random translations and rotations to a plurality of input poses <read on noisy global root motion>, where each human character includes input effectors that describe the position for the hands and feet <read on noisy local joint motion>).

Oreshkin is analogous art with respect to Shafir because they are from the same field of endeavor, namely processing human motion data via a neural network. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement data augmentation that applies random translation and rotation to character poses as taught by Oreshkin into the teaching of Shafir. The suggestion for doing so would generate a plurality of possible candidate motion data, thereby yielding predictable results. Therefore, it would have been obvious to combine Oreshkin with Shafir.

Regarding Claim 10, the combination of Shafir and Oreshkin discloses the system of Claim 1. Additionally, Shafir further discloses wherein the first model comprises a first diffusion model (Shafir, FIG. 4 teaches using two fixed MDMs), and the second model comprises a second diffusion model (Shafir, FIG. 4 teaches using a Com-MDM).

Regarding Claim 16, the combination of Shafir and Oreshkin discloses the system of Claim 1.
Additionally, Shafir further discloses wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system implemented using a robot; an aerial system; a medical system; a boating system; a smart area monitoring system; a system for performing deep learning operations; a system for performing simulation operations; a system for generating or presenting virtual reality (VR) content, augmented reality (AR) content, or mixed reality (MR) content; a system for performing digital twin operations; a system implemented using an edge device; a system incorporating one or more virtual machines (VMs); a system for generating synthetic data; a system implemented at least partially in a data center; a system for performing conversational artificial intelligence (AI) operations; a system for performing generative AI operations; a system implementing language models; a system implementing large language models (LLMs); a system for hosting one or more real-time streaming applications; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; or a system implemented at least partially using cloud computing resources (Shafir, [Section 3.3 Fine-Tuned Motion Control]: teaches generating "full-body motion <read on generating synthetic data> controlled by a user-defined set of input features").

Regarding Claim 18, the combination of Shafir and Oreshkin discloses the system of Claim 17.
Shafir does not expressly disclose the limitations of Claim 18; however, Oreshkin discloses wherein the global root motion is defined by at least one of a global position of the human character and global heading of the human character (Oreshkin, [0056]: teaches a global root position 165 being data that describes "a center of coordinates for the skeleton <read on human character>"; [0032]: teaches the look-at effector providing "an ability to maintain a global orientation of a joint towards a particular global position in a scene (for example, forcing a head of a character to look at a given object or point in a space)"; the joint, such as the head of a character, being forced to look at a global point is being interpreted as a global heading; Note: it should be noted that a "heading" in the art is the direction in which an object is pointed);

the local joint motion is defined by at least one of a position of a joint on the human character, a velocity of the joint of the human character, a rotation of the joint of the human character, or a local foot contact of the human character (Oreshkin, [0030]: teaches a joint effector, which is "a subtype of a positional effector that represents a position of a joint for a character (e.g., such as a desired position for a left foot of bipedal character)");

the local root motion is defined by at least one of a one-dimensional velocity, a linear velocity, or a height of the human character (Oreshkin, [0033]: teaches a rotational effector including local directional data, such as a direction vector or an amount and direction of rotation that specifies a gaze direction, a running velocity <read on linear velocity>, a hand orientation, and the like).

Oreshkin is analogous art with respect to Shafir because they are from the same field of endeavor, namely processing human motion data via a neural network.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement data augmentation that applies random translation and rotation to character poses as taught by Oreshkin into the teaching of Shafir. The suggestion for doing so would generate a plurality of possible candidate motion data, thereby yielding predictable results. Therefore, it would have been obvious to combine Oreshkin with Shafir.

Claims 6 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Shafir et al. ("Human Motion Diffusion as a Generative Prior"), hereinafter referenced as Shafir, in view of Oreshkin et al. (US 20240054671 A1, previously cited), hereinafter referenced as Oreshkin, as applied to Claims 1 and 17 above respectively, and further in view of Villegas et al. (US 20220020199 A1, previously cited), hereinafter referenced as Villegas.

Regarding Claims 6 and 19, the combination of Shafir and Oreshkin discloses the systems of Claims 1 and 17 respectively.
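Claims 6 and 19 recite transforming global root motion into local root motion according to a mapping between a global coordinate frame and a local coordinate frame. As a hedged sketch of what such a mapping can look like in general, assuming a heading-relative local frame on the xz ground plane (this is the author's illustrative convention, not Villegas's positioning module or the claimed implementation):

```python
import math

def global_to_local(dx, dz, heading):
    # Rotate a world-frame root displacement (dx, dz) by -heading so it is
    # expressed in the character's heading-relative (local) frame.
    c, s = math.cos(heading), math.sin(heading)
    forward = c * dx + s * dz    # motion along the character's facing direction
    lateral = -s * dx + c * dz   # sideways motion in the local frame
    return forward, lateral
```

Under this convention, a character whose heading is pi/2 and whose global displacement is one unit along +z is moving purely "forward" in its own frame; the inverse rotation maps local root motion back to the global frame.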
The combination of Shafir and Oreshkin does not expressly disclose the limitations of Claims 6 and 19; however, Villegas discloses wherein the one or more processors to determine the local root motion based on the global root motion by transforming the global root motion to the local root motion according to mapping between a global coordinate frame to a local coordinate frame (Villegas, [0054]: teaches applying kinematic constraints to retarget motion from character A to character B, where the positioning module 114 combines local pose data <read on global root motion> and root motion data <read on local root motion> and uses this data to position the character B into a visual space <read on mapping between global coordinate frame to local coordinate frame>), wherein the global root motion is defined in the global coordinate frame (Villegas, [0059]: teaches a loss function that indicates whether a joint is in contact with a surface of a visual space (e.g., a ground surface) <read on global coordinate frame>), and the local root motion is defined in the local coordinate frame (Villegas, [0055]: teaches a set of character joint locations in coordinates of a visual space <read on local coordinate frame>).

Villegas is analogous art with respect to Shafir in view of Oreshkin because they are from the same field of endeavor, namely processing motion data. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to retarget pose data between visual spaces as taught by Villegas into the teaching of Shafir in view of Oreshkin. The suggestion for doing so would allow for accurate one-to-one pose transfer while maintaining quality and consistency between models. Therefore, it would have been obvious to combine Villegas with Shafir in view of Oreshkin.

Claims 11-12 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Shafir et al.
("Human Motion Diffusion as a Generative Prior"), hereinafter referenced as Shafir, in view of Oreshkin et al. (US 20240054671 A1, previously cited), hereinafter referenced as Oreshkin, as applied to Claim 1 above, and further in view of Amer et al. (US 20190304104 A1, previously cited), hereinafter referenced as Amer.

Regarding Claim 11, the combination of Shafir and Oreshkin discloses the system of Claim 1. The combination of Shafir and Oreshkin does not expressly disclose the limitations of Claim 11; however, Amer discloses wherein the motion model is updated by applying motion capture (mocap) data and video reconstruction data as constraints to the motion model to generate human motion data (Amer, [0012]: teaches using a GAN discriminator to more accurately determine whether a particular video or animation is a sample from a real <read on mocap data constraint> or generated <read on video reconstruction data constraint> distribution; Note: it should be noted that although not expressly stated, it is common in the art to use training data to further train a GAN discriminator when the model can still differentiate between real and synthesized data; in addition, a GAN model is a type of generative model, as is a diffusion model; [0044]: teaches a machine learning module 109 updating data structure 151 <read on generating motion data> to incorporate additional feedback information provided by the user through the user interaction module 108), and the motion model is updated using user feedback information for the human motion data (Amer, [0044]: teaches a machine learning module 109 updating data structure 151 <read on motion model> to incorporate additional feedback information provided by the user through the user interaction module 108).

Amer is analogous art with respect to Shafir in view of Oreshkin because they are from the same field of endeavor, namely processing captured motion data using neural networks.
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement a GAN discriminator to analyze input motion data as taught by Amer into the teaching of Shafir in view of Oreshkin. The suggestion for doing so would allow the system to determine whether the input data is real or synthesized, thereby automating the training process that can be added during diffusion training. Therefore, it would have been obvious to combine Amer with Shafir in view of Oreshkin.

Regarding Claim 12, the combination of Shafir, Oreshkin, and Amer discloses the system of Claim 11. The combination of Shafir and Oreshkin does not expressly disclose the limitations of Claim 12; however, Amer discloses wherein the user feedback information comprises a score that rates relevance of the human motion data to a text prompt (Amer, [0168]: teaches each of the display elements may indicate information about the relevance of each such search result video 712 <read on human motion data> to the search query, where such information may be conveyed by a score, a color, or a confidence bar).

Amer is analogous art with respect to Shafir in view of Oreshkin because they are from the same field of endeavor, namely processing captured motion data using neural networks. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement a GAN discriminator to analyze input motion data as taught by Amer into the teaching of Shafir in view of Oreshkin. The suggestion for doing so would allow the system to determine whether the input data is real or synthesized, thereby automating the training process that can be added during diffusion training. Therefore, it would have been obvious to combine Amer with Shafir in view of Oreshkin.

Regarding Claim 15, the combination of Shafir, Oreshkin, and Amer discloses the system of Claim 11.
The combination of Shafir and Oreshkin does not expressly disclose the limitations of Claim 15; however, Amer discloses wherein the user feedback information comprises user input to correct artifacts in the human motion data or the video reconstruction data (Amer, [0154]: teaches the XAI system providing "end users with an explanation of individual decisions, enable users to understand the system's overall strengths and weaknesses, convey an understanding of how the system will behave in the future, and perhaps how to correct the system's mistakes <read on user input>"; [0055]: teaches motion artifacts <read on human motion data> generated by the neural network).

Amer is analogous art with respect to Shafir in view of Oreshkin because they are from the same field of endeavor, namely processing captured motion data using neural networks. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to implement a GAN discriminator to analyze input motion data as taught by Amer into the teaching of Shafir in view of Oreshkin. The suggestion for doing so would allow the system to determine whether the input data is real or synthesized, thereby automating the training process that can be added during diffusion training. Therefore, it would have been obvious to combine Amer with Shafir in view of Oreshkin.

Claims 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Shafir et al. ("Human Motion Diffusion as a Generative Prior"), hereinafter referenced as Shafir, in view of Oreshkin et al. (US 20240054671 A1, previously cited), hereinafter referenced as Oreshkin, and further in view of Amer et al. (US 20190304104 A1, previously cited), hereinafter referenced as Amer, as applied to Claim 11 above, and further in view of Protter et al. (US 20200226357 A1, previously cited), hereinafter referenced as Protter.
Regarding Claim 13, the combination of Shafir, Oreshkin, and Amer discloses the system of Claim 11. Additionally, Shafir further discloses wherein the human motion data comprises a plurality of candidate generated motions (Shafir, [Section 4.3 Fine-Tuned Motion Control]: teaches generating motions <read on candidate generated motions> with a fine-tuned model that was trained for a specific control task); [[the user feedback information comprises a candidate generated motion of the plurality of candidate generated motions selected by a user or a ranking of the plurality of candidate generated motions determined by the user; and]] [[the motion model is updated using a ranking loss corresponding to the selected candidate generated motion or the ranking.]] However, the combination of Shafir, Oreshkin, and Amer does not expressly disclose the user feedback information comprises a candidate generated motion of the plurality of candidate generated motions selected by a user or a ranking of the plurality of candidate generated motions determined by the user; and the motion model is updated using a ranking loss corresponding to the selected candidate generated motion or the ranking. 
Protter discloses the user feedback information comprises a candidate generated motion of the plurality of candidate generated motions selected by a user or a ranking of the plurality of candidate generated motions determined by the user (Protter, [0060]: teaches generating updated motion estimates based on a non-uniform probability of movements in the human movement model 150, e.g., tailored per human-specific movement patterns, where "a plurality of most likely current or future candidate movements may be ordered <read on user ranking> based on their likelihoods and/or may be associated with confidence values"; [0075]: teaches the groups of movements being created by supervised learning of a movement language <read on user-selected candidate generated motions based on user feedback information>; Note: it should be noted that although not expressly stated, it is common in the art for supervised training to involve human/user intervention to select high quality data for the neural network to learn from); and the motion model is updated using a ranking loss corresponding to the selected candidate generated motion or the ranking (Protter, [0059]: teaches movement recordings 20 being sent to analysis module 140 to update and train human movement model 150 <read on updating motion model>; [0060]: teaches generating updated motion estimates based on a non-uniform probability of movements in model 150, where "a plurality of most likely current or future candidate movements may be ordered <read on ranking loss> based on their likelihoods and/or may be associated with confidence values"). Protter is analogous art with respect to the combination of Shafir, Oreshkin, and Amer because they are from the same field of endeavor, namely processing human motion data via neural networks. 
Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to generate updated motion estimates and estimate future candidate movements as taught by Protter into the combined teaching of Shafir, Oreshkin, and Amer. The suggestion for doing so would allow the system to discard impossible motion estimates, thereby improving quality of synthesized motion estimates. Therefore, it would have been obvious to combine Protter with the combination of Shafir, Oreshkin, and Amer.

Regarding Claim 14, the combination of Shafir, Oreshkin, and Amer discloses the system of Claim 11. The combination of Shafir, Oreshkin, and Amer does not expressly disclose the limitations of Claim 14; however, Protter discloses wherein the user feedback information comprises at least one of labels or text descriptions for the human motion data that describe types of the human motion data or artifacts in the human motion data (Protter, [0074]: teaches clustering groups of movements <read on human motion data>, which are candidate motion estimates, where "clustering may be performed explicitly by “labeling” <read on labels> each record with one or more clustering parameters (e.g., user-type, age, height, etc.)"; [0075]: teaches the groups of movements being created by supervised learning of a movement language). Protter is analogous art with respect to the combination of Shafir, Oreshkin, and Amer because they are from the same field of endeavor, namely processing human motion data via neural networks. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to generate updated motion estimates and estimate future candidate movements as taught by Protter into the combined teaching of Shafir, Oreshkin, and Amer. The suggestion for doing so would allow the system to discard impossible motion estimates, thereby improving quality of synthesized motion estimates.
Therefore, it would have been obvious to combine Protter with the combination of Shafir, Oreshkin, and Amer.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Francis (US 20240185498 A1) discloses generating computer graphics motion using a diffusion model; and Peris et al. (US 20240241573 A1) discloses training a diffusion model based on full body motion tracking.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KARL TRUONG whose telephone number is (703)756-5915. The examiner can normally be reached 7:30 AM - 5:00 PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at (571) 272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/K.D.T./
Examiner, Art Unit 2614

/KENT W CHANG/
Supervisory Patent Examiner, Art Unit 2614

Prosecution Timeline

Jan 16, 2024
Application Filed
Oct 01, 2025
Non-Final Rejection — §103
Nov 19, 2025
Interview Requested
Dec 02, 2025
Applicant Interview (Telephonic)
Dec 02, 2025
Examiner Interview Summary
Dec 03, 2025
Response Filed
Jan 26, 2026
Final Rejection — §103
Mar 25, 2026
Examiner Interview Summary
Mar 25, 2026
Applicant Interview (Telephonic)
Apr 06, 2026
Request for Continued Examination
Apr 07, 2026
Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12573149
DATA PROCESSING METHOD AND APPARATUS, DEVICE, COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT
2y 5m to grant Granted Mar 10, 2026
Patent 12561875
ANIMATION FRAME DISPLAY METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM
2y 5m to grant Granted Feb 24, 2026
Patent 12494013
AUTODECODING LATENT 3D DIFFUSION MODELS
2y 5m to grant Granted Dec 09, 2025
Patent 12456258
SYSTEMS AND METHODS FOR GENERATING A SHADOW MESH
2y 5m to grant Granted Oct 28, 2025
Patent 12444020
FLEXIBLE IMAGE ASPECT RATIO USING MACHINE LEARNING
2y 5m to grant Granted Oct 14, 2025
Based on the 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
52%
Grant Probability
83%
With Interview (+31.0%)
2y 7m
Median Time to Grant
Moderate
PTA Risk
Based on 29 resolved cases by this examiner. Grant probability derived from career allow rate.
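The projection figures above appear to follow directly from the examiner's career statistics (15 granted of 29 resolved, with a +31.0% interview lift). A minimal sketch of that arithmetic, assuming the page simply adds the interview lift to the career allow rate (an assumption; the page does not disclose its actual model):

```python
# Illustrative reconstruction of the headline projection numbers.
# All inputs come from the examiner stats shown on this page; the
# additive-lift model is an assumption, not the tool's documented method.

granted = 15    # cases this examiner allowed
resolved = 29   # total resolved cases (granted + abandoned)

career_allow_rate = granted / resolved          # ~0.517, displayed as 52%

interview_lift = 0.31                           # "+31.0% interview lift"
with_interview = career_allow_rate + interview_lift  # ~0.827, displayed as 83%

print(f"Grant probability: {career_allow_rate:.0%}")  # Grant probability: 52%
print(f"With interview:    {with_interview:.0%}")     # With interview:    83%
```

Rounding to whole percentage points reproduces the 52% and 83% figures exactly, which suggests the "With Interview" number is the career allow rate plus the observed lift rather than an independently modeled probability.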
