Prosecution Insights
Last updated: April 19, 2026
Application No. 18/411,623

FULL BODY MOTION TRACKING FOR USE IN VIRTUAL ENVIRONMENT

Status: Non-Final Office Action (§103)
Filed: Jan 12, 2024
Examiner: REED, STEPHEN T
Art Unit: 2627
Tech Center: 2600 — Communications
Assignee: Meta Platforms Technologies, LLC
OA Round: 1 (Non-Final)

Grant Probability: 72% (Favorable)
Expected OA Rounds: 1-2
Estimated Time to Grant: 1y 10m
Grant Probability With Interview: 88%

Examiner Intelligence

Career Allow Rate: 72% (above average) — 342 granted / 474 resolved; +10.2% vs Tech Center average
Interview Lift: +15.9% (strong), measured across resolved cases with an interview
Avg Prosecution: 1y 10m (fast prosecutor); 23 applications currently pending
Total Applications: 497 (career history, across all art units)

Statute-Specific Performance

§101: 2.3% (-37.7% vs TC avg)
§103: 56.5% (+16.5% vs TC avg)
§102: 20.6% (-19.4% vs TC avg)
§112: 18.0% (-22.0% vs TC avg)
Tech Center average values are estimates. Based on career data from 474 resolved cases.
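The headline figures above are simple ratios over the examiner's resolved cases and can be reproduced directly. Note the 16-point interview delta below merely restates the report's 72% baseline vs 88% with-interview prediction for this application; it is not an independent calculation:

```python
# Figures taken from the report above.
granted, resolved = 342, 474

career_allow_rate = granted / resolved
print(f"Career allow rate: {career_allow_rate:.1%}")          # 72.2%, shown as 72%

# This application: 72% baseline grant probability vs 88% with an interview.
baseline, with_interview = 0.72, 0.88
print(f"Interview delta: {with_interview - baseline:+.0%}")   # +16%
```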

Office Action (§103)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are currently pending.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 12 April 2024, 17 November 2025, and 17 November 2025 were considered by the examiner.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Perlin et al., US PG-Pub 2021/0081031, hereinafter Perlin, in view of Brehmer et al., US PG-Pub 2024/0273261, hereinafter Brehmer.

Regarding Claim 1, Perlin teaches a method for full body motion tracking (system 10), comprising: receiving tracking signals from a plurality of sensors (left-hand controller 16, right-hand controller 18) associated with an upper body of a person (Fig. 4, and corresponding descriptions, [0033], “The system 10 comprises a left-hand controller 16 to be held by a left hand of the participant that produces position data of the left hand of the participant. The system 10 comprises a right-hand controller 18 to be held by a right hand of the participant that produces position data of the right hand of the participant.”); based on the tracking signals, determining motion features and joint features (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “There is the step of constructing a full body pose 12 of a participant using only the data from the HMD 14 and the two hand controllers and the trackers.”); training a machine learning model ([0044], “the present invention employs machine learning (ML), trained on a ground truth reference system 10 that can track the full body pose 12 of exemplar users”); generating a plurality of inputs to the machine learning model (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”), the plurality of inputs comprising the motion features and the joint features (Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system)”); and providing the plurality of inputs to the machine learning model to generate a plurality of outputs (Figs. 4-7, and corresponding descriptions; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”, [0097]-[0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”; [0102], providing pressure sensing for lower body positions), wherein the plurality of outputs comprise sequences of full body poses (Figs. 4-7, and corresponding descriptions; [0091]-[0098], specifically, [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”), and the sequences of full body poses comprise upper body poses and lower body poses (Figs. 4-7, and corresponding descriptions; [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”).

However, Perlin does not explicitly teach a diffusion model, the diffusion model comprising a multi-layer perceptron (MLP) network. Brehmer teaches a diffusion model (Brehmer: Figs. 4, 6 and 9B, and corresponding descriptions; [0050]-[0052], “Diffusion models from a training perspective will take an image and will slowly add noise to the image to destroy the information in the image”), the diffusion model comprising a multi-layer perceptron (MLP) network (Brehmer: Fig. 9B, and corresponding descriptions; [0115], “These are then used as inputs to two multilayer perceptrons (MLPs) ϕ and ψ”, [0142], “These are then used as inputs to two MLPs, ϕ and ψ.”). It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to incorporate the diffusion model taught by Brehmer into the device taught by Perlin in order to predict an original trajectory (Brehmer: [0073]), thereby providing a more accurate trajectory optimization for the system.

Regarding Claim 2, Perlin, as modified by Brehmer, teaches the method of claim 1, further comprising generating intermediate features from the motion features and the joint features (Perlin: Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system). In this way, a very large number of specific examples of a mapping from 42 input values to 48 output values can be accumulated.”), wherein the plurality of inputs to the diffusion model comprise the intermediate features (Perlin: Figs. 4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0163], noting the input may be motion data), and the sequences of full body poses are generated based on the intermediate features (Perlin: [0037], “The computer produces a body pose 12 output of the participant from the 48 scalar values.”, [0061]-[0062], “a runtime phase, in which the constructed representation of the mapping is used to efficiently convert new sensor data values to corresponding body pose 12 values.”).
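The Brehmer passage quoted in the Claim 1 rejection describes the forward (noising) half of a diffusion model: noise is added step by step until the original signal is destroyed. As a neutral illustration only (the linear schedule, step count, and 48-value pose vector below are illustrative assumptions, not taken from either reference), the closed-form noising step can be sketched in plain Python:

```python
import math
import random

def forward_diffusion(x0, t, betas):
    # Closed-form noising to timestep t:
    #   x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise,
    # where alpha_bar_t is the cumulative product of (1 - beta) up to t.
    alpha_bar = 1.0
    for beta in betas[: t + 1]:
        alpha_bar *= 1.0 - beta
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    return [math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for v in x0]

# Illustrative linear beta schedule over 1000 steps (an assumption, not Brehmer's).
T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
x0 = [1.0] * 48          # e.g. a 48-value body-pose vector, per Perlin's mapping
x_T = forward_diffusion(x0, T - 1, betas)
```

At t = T-1 the cumulative product of (1 - beta) is effectively zero, so x_T is essentially pure Gaussian noise, matching the quoted description of the image's information being destroyed; training then teaches the model to reverse this process.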
Regarding Claim 3, Perlin, as modified by Brehmer, teaches the method of claim 2, wherein generating the plurality of outputs comprises generating the plurality of outputs from the MLP network based on the intermediate features (Brehmer: Fig. 9B, and corresponding descriptions; [0112]-[0125], specifically, [0115]-[0117], describing how the MLPs are used to map output representations).

Regarding Claim 4, Perlin, as modified by Brehmer, teaches the method of claim 1, wherein the plurality of outputs comprise positions of a lower body of the person (Perlin: Figs. 4-7, and corresponding descriptions, [0035]-[0037], noting how the system tracks the foot pressure of the user), the method further comprising estimating the positions of the lower body based on the sequences of full body poses (Perlin: Figs. 4-7, and corresponding descriptions, [0036], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”, [0102]).

Regarding Claim 5, Perlin, as modified by Brehmer, teaches the method of claim 1, wherein the MLP network comprises a plurality of blocks (Brehmer: Figs. 9A-9B, and corresponding descriptions; [0110], [0127], noting a plurality of blocks), the method further comprising providing a timestep embedding to each block in the plurality of blocks (Perlin: Figs. 4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”).

Regarding Claim 6, Perlin, as modified by Brehmer, teaches the method of claim 5, wherein the timestep embedding is provided to each block in the plurality of blocks through a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block.”) and a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics and neural networks.”).

Regarding Claim 7, Perlin, as modified by Brehmer, teaches the method of claim 5, wherein each block in the plurality of blocks comprises a convolutional layer (Brehmer: [0110], “Each block includes two temporal convolutions”) and a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”).

Regarding Claim 8, Perlin, as modified by Brehmer, teaches the method of claim 7, wherein each block in the plurality of blocks further comprises a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics and neural networks”), and a layer normalization (Brehmer: [0110], “each followed by a group normalization (GN)”).

Regarding Claim 9, Perlin, as modified by Brehmer, teaches the method of claim 1, wherein the plurality of sensors are inertial measurement units (IMUs) (Perlin: IMUs 32).
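Claims 5-8 recite a concrete block structure for the MLP network: each block contains a convolutional layer, a fully connected layer, a sigmoid linear unit (SiLU) activation, and a layer normalization, with a timestep embedding fed into each block through a fully connected layer and a SiLU activation. A minimal plain-Python sketch of one such block, following the claim language rather than Brehmer's actual figures (all shapes, weights, and the 'same'-padding convolution are illustrative assumptions):

```python
import math

def silu(v):
    # Sigmoid linear unit: x * sigmoid(x)
    return [x / (1.0 + math.exp(-x)) for x in v]

def fully_connected(v, w, b):
    # w is a list of rows (out_dim x in_dim), b a bias vector
    return [sum(wi * xi for wi, xi in zip(row, v)) + bi for row, bi in zip(w, b)]

def layer_norm(v, eps=1e-5):
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]

def conv1d(v, kernel):
    # 1-D convolution with zero padding, output same length as input
    k, pad = len(kernel), len(kernel) // 2
    padded = [0.0] * pad + list(v) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(v))]

def block(v, t_emb, params):
    # Timestep embedding enters through a fully connected layer and a SiLU
    # activation, then is added to the block's activations (claims 5-6).
    t = silu(fully_connected(t_emb, params["t_w"], params["t_b"]))
    h = conv1d(v, params["kernel"])                         # convolutional layer (claim 7)
    h = [hi + ti for hi, ti in zip(h, t)]
    h = silu(fully_connected(h, params["w"], params["b"]))  # FC + SiLU (claims 7-8)
    return layer_norm(h)                                    # layer normalization (claim 8)

# Illustrative use: a length-4 feature vector and a 2-value timestep embedding.
n = 4
params = {
    "kernel": [0.25, 0.5, 0.25],                  # hypothetical convolution weights
    "t_w": [[0.1, 0.0]] * n, "t_b": [0.0] * n,    # FC mapping the timestep embedding
    "w": [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)],
    "b": [0.0] * n,
}
out = block([1.0, 2.0, 3.0, 4.0], [0.5, -0.5], params)
# layer_norm leaves `out` with near-zero mean and unit variance
```

Note the examiner maps the claimed SiLU layer onto Brehmer's Mish activation and the claimed layer normalization onto Brehmer's group normalization; this sketch follows the claim wording, not those substitutions.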
Regarding Claim 10, Perlin, as modified by Brehmer, teaches the method of claim 1, wherein the plurality of sensors consist of a first sensor mounted in a first handheld device (Perlin: left-hand controller 16), a second sensor mounted in a second handheld device (Perlin: right-hand controller 18), and a third sensor mounted in a head mounted device (HMD) (Perlin: HMD 14), and the tracking signals comprise a first orientation and a first translation of the first handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), a second orientation and a second translation of the second handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), and a third orientation and a third translation of the head mounted device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”).

Regarding Claim 11, Perlin teaches a program for full body motion tracking ([0121]), which when executed by a computer (computer 30), configures the computer to: receive tracking signals from a plurality of sensors (left-hand controller 16, right-hand controller 18) associated with an upper body of a person (Fig. 4, and corresponding descriptions, [0033], “The system 10 comprises a left-hand controller 16 to be held by a left hand of the participant that produces position data of the left hand of the participant. The system 10 comprises a right-hand controller 18 to be held by a right hand of the participant that produces position data of the right hand of the participant.”); based on the tracking signals, determine motion features and joint features (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “There is the step of constructing a full body pose 12 of a participant using only the data from the HMD 14 and the two hand controllers and the trackers.”); train a machine learning model ([0044], “the present invention employs machine learning (ML), trained on a ground truth reference system 10 that can track the full body pose 12 of exemplar users”); generate a plurality of inputs to the machine learning model (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”), the plurality of inputs comprising the motion features and the joint features (Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system)”); and provide the plurality of inputs to the machine learning model to generate a plurality of outputs (Figs. 4-7, and corresponding descriptions; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”, [0097]-[0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”; [0102], providing pressure sensing for lower body positions), wherein the plurality of outputs comprise sequences of full body poses (Figs. 4-7, and corresponding descriptions; [0091]-[0098], specifically, [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”), and the sequences of full body poses comprise upper body poses and lower body poses (Figs. 4-7, and corresponding descriptions; [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”).

However, Perlin does not explicitly teach a non-transitory computer-readable medium storing a program for full body motion tracking; or a diffusion model, the diffusion model comprising a multi-layer perceptron (MLP) network. Brehmer teaches a non-transitory computer-readable medium storing a program for full body motion tracking (Brehmer: [0166]-[0167]); and a diffusion model (Brehmer: Figs. 4, 6 and 9B, and corresponding descriptions; [0050]-[0052], “Diffusion models from a training perspective will take an image and will slowly add noise to the image to destroy the information in the image”), the diffusion model comprising a multi-layer perceptron (MLP) network (Brehmer: Fig. 9B, and corresponding descriptions; [0115], “These are then used as inputs to two multilayer perceptrons (MLPs) ϕ and ψ”, [0142], “These are then used as inputs to two MLPs, ϕ and ψ.”). It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to incorporate the diffusion model taught by Brehmer into the device taught by Perlin in order to predict an original trajectory (Brehmer: [0073]), thereby providing a more accurate trajectory optimization for the system.

Regarding Claim 12, Perlin, as modified by Brehmer, teaches the non-transitory computer-readable medium of claim 11, wherein the program, when executed by the computer, further configures the computer to: generate intermediate features from the motion features and the joint features (Perlin: Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system). In this way, a very large number of specific examples of a mapping from 42 input values to 48 output values can be accumulated.”), wherein the plurality of inputs to the diffusion model comprise the intermediate features (Perlin: Figs.
4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0163], noting the input may be motion data), and the sequences of full body poses are generated based on the intermediate features (Perlin: [0037], “The computer produces a body pose 12 output of the participant from the 48 scalar values.”, [0061]-[0062], “a runtime phase, in which the constructed representation of the mapping is used to efficiently convert new sensor data values to corresponding body pose 12 values.”), and wherein generating the plurality of outputs comprises generating the plurality of outputs from the MLP network based on the intermediate features (Brehmer: Fig. 9B, and corresponding descriptions; [0112]-[0125], specifically, [0115]-[0117], describing how the MLPs are used to map output representations).

Regarding Claim 13, Perlin, as modified by Brehmer, teaches the non-transitory computer-readable medium of claim 11, wherein the plurality of outputs comprise positions of a lower body of the person (Perlin: Figs. 4-7, and corresponding descriptions, [0035]-[0037], noting how the system tracks the foot pressure of the user), the MLP network comprises a plurality of blocks (Brehmer: Figs. 9A-9B, and corresponding descriptions; [0110], [0127], noting a plurality of blocks), and the program, when executed by the computer, further configures the computer to: estimate the positions of the lower body based on the sequences of full body poses (Perlin: Figs. 4-7, and corresponding descriptions, [0036], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”, [0102]); and provide a timestep embedding to each block in the plurality of blocks (Perlin: Figs. 4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”), wherein the timestep embedding is provided to each block in the plurality of blocks through a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block.”) and a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics and neural networks.”).

Regarding Claim 14, Perlin, as modified by Brehmer, teaches the non-transitory computer-readable medium of claim 13, wherein each block in the plurality of blocks comprises a convolutional layer (Brehmer: [0110], “Each block includes two temporal convolutions”), a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”), a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics and neural networks”), and a layer normalization (Brehmer: [0110], “each followed by a group normalization (GN)”).

Regarding Claim 15, Perlin, as modified by Brehmer, teaches the non-transitory computer-readable medium of claim 11, wherein the plurality of sensors are inertial measurement units (IMUs) (Perlin: IMUs 32).

Regarding Claim 16, Perlin, as modified by Brehmer, teaches the non-transitory computer-readable medium of claim 11, wherein the plurality of sensors consist of a first sensor mounted in a first handheld device (Perlin: left-hand controller 16), a second sensor mounted in a second handheld device (Perlin: right-hand controller 18), and a third sensor mounted in a head mounted device (HMD) (Perlin: HMD 14), and the tracking signals comprise a first orientation and a first translation of the first handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), a second orientation and a second translation of the second handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), and a third orientation and a third translation of the head mounted device (Perlin: [0048], “the sensor input
may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”).

Regarding Claim 17, Perlin teaches a system for full body motion tracking (system 10), comprising: a processor (microprocessor 36), the processor configured to: receive tracking signals from a plurality of sensors (left-hand controller 16, right-hand controller 18) associated with an upper body of a person (Fig. 4, and corresponding descriptions, [0033], “The system 10 comprises a left-hand controller 16 to be held by a left hand of the participant that produces position data of the left hand of the participant. The system 10 comprises a right-hand controller 18 to be held by a right hand of the participant that produces position data of the right hand of the participant.”); based on the tracking signals, determine motion features and joint features (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “There is the step of constructing a full body pose 12 of a participant using only the data from the HMD 14 and the two hand controllers and the trackers.”); train a machine learning model ([0044], “the present invention employs machine learning (ML), trained on a ground truth reference system 10 that can track the full body pose 12 of exemplar users”); generate a plurality of inputs to the machine learning model (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”), the plurality of inputs comprising the motion features and the joint features (Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system)”); and provide the plurality of inputs to the machine learning model to generate a plurality of outputs (Figs. 4-7, and corresponding descriptions; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”, [0097]-[0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”; [0102], providing pressure sensing for lower body positions), wherein the plurality of outputs comprise sequences of full body poses (Figs. 4-7, and corresponding descriptions; [0091]-[0098], specifically, [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”), and the sequences of full body poses comprise upper body poses and lower body poses (Figs. 4-7, and corresponding descriptions; [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”).

However, Perlin does not explicitly teach a non-transitory computer readable medium storing a set of instructions; or a diffusion model, the diffusion model comprising a multi-layer perceptron (MLP) network. Brehmer teaches a non-transitory computer readable medium storing a set of instructions (Brehmer: [0166]-[0167]); and a diffusion model (Brehmer: Figs. 4, 6 and 9B, and corresponding descriptions; [0050]-[0052], “Diffusion models from a training perspective will take an image and will slowly add noise to the image to destroy the information in the image”), the diffusion model comprising a multi-layer perceptron (MLP) network (Brehmer: Fig. 9B, and corresponding descriptions; [0115], “These are then used as inputs to two multilayer perceptrons (MLPs) ϕ and ψ”, [0142], “These are then used as inputs to two MLPs, ϕ and ψ.”).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to incorporate the diffusion model taught by Brehmer into the device taught by Perlin in order to predict an original trajectory (Brehmer: [0073]), thereby providing a more accurate trajectory optimization for the system. Regarding Claim 18, Perlin, as modified by Brehmer, teaches the system of claim 17, wherein the instructions, when executed by the processor, further configure the processor to: generate intermediate features from the motion features and the joint features (Perlin: Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system). In this way, a very large number of specific examples of a mapping from 42 input values to 48 output values can be accumulated.”), wherein the plurality of inputs to the diffusion model comprise the intermediate features (Perlin: Figs. 4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0163], noting the input may be motion data), and the sequences of full body poses are generated based on the intermediate features (Perlin: [0037], “The computer produces a body pose 12 output of the participant from the 48 scalar values.”, [0061]-[0062], “a runtime phase, in which the constructed representation of the mapping is used to efficiently convert new sensor data values to corresponding body pose 12 values.”), and wherein generating the plurality of outputs comprises generating the plurality of outputs from the MLP network based on the intermediate features (Brehmer: Fig. 
9B, and corresponding descriptions; [0112]-[0125], specifically, [0115]-[0117], describing how the MLPs are used to map output representations). Regarding Claim 19, Perlin, as modified by Brehmer, teaches the system of claim 17, wherein the plurality of outputs comprise positions of a lower body of the person (Perlin: Figs. 4-7, and corresponding descriptions, [0035]-[0037], noting how the system tracks the foot pressure of the user), the MLP network comprises a plurality of blocks (Brehmer: Figs. 9A-9B, and corresponding descriptions; [0110], [0127], noting a plurality of blocks), and the instructions, when executed by the processor, further configure the processor to: estimate the positions of the lower body based on the sequences of full body poses (Perlin: Figs. 4-7, and corresponding descriptions, [0036], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”, [0102]); and provide a timestep embedding to each block in the plurality of blocks (Perlin: Figs. 
4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”), wherein the timestep embedding is provided to each block in the plurality of blocks through a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block.”) and a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics in neural networks.”), wherein each block in the plurality of blocks comprises a convolutional layer (Brehmer: [0110], “Each block includes two temporal convolutions”), a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”), a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics in neural networks”), and a layer normalization (Brehmer: [0110], “each followed by a group normalization (GN)”).
Regarding Claim 20, Perlin, as modified by Brehmer, teaches the system of claim 17, wherein the plurality of sensors are inertial measurement units (IMUs) (Perlin: IMUs 32), the plurality of sensors consist of a first sensor mounted in a first handheld device (Perlin: left-hand controller 16), a second sensor mounted in a second handheld device (Perlin: right-hand controller 18), and a third sensor mounted in a head mounted device (HMD) (Perlin: HMD 14), and the tracking signals comprise a first orientation and a first translation of the first handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), a second orientation and a second translation of the second handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), and a third orientation and a third translation of the head mounted device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEPHEN T REED whose telephone number is (571)272-7234. The examiner can normally be reached M-F: 0800-1800. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ke Xiao can be reached at 571-272-7776. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Stephen T. Reed/
Primary Examiner, Art Unit 2627
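For readers mapping the claim language to an implementation, the block structure the rejection reads onto Brehmer [0110] and claim 19 (a temporal convolution, a fully connected layer projecting a shared timestep embedding into each block, a sigmoid linear unit activation, and a normalization) can be sketched in a few lines. This is an editorial illustration only, not part of the record and not Brehmer's disclosed implementation: the shapes, kernel size, depthwise convolution, and random initialization are all hypothetical, and it uses SiLU and layer normalization as recited in claim 19 rather than the Mish activation and group normalization quoted from Brehmer.

```python
import numpy as np

def silu(x):
    # Sigmoid linear unit: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    # Normalize each feature vector over the channel (last) axis
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def temporal_conv(x, w):
    # Depthwise temporal convolution with "same" padding.
    # x: (T, C) feature sequence, w: (k, C) per-channel kernels
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        out[t] = (xp[t:t + k] * w).sum(axis=0)
    return out

class Block:
    """One hypothetical denoiser block: temporal convolution, plus a
    timestep embedding projected through a fully connected layer and
    SiLU, followed by layer normalization and a SiLU activation."""

    def __init__(self, channels, k=3, rng=None):
        rng = rng or np.random.default_rng(0)
        self.conv_w = rng.normal(scale=0.1, size=(k, channels))
        self.fc_w = rng.normal(scale=0.1, size=(channels, channels))

    def forward(self, x, t_emb):
        h = temporal_conv(x, self.conv_w)
        h = h + silu(t_emb @ self.fc_w)   # timestep embedding via FC + SiLU
        return silu(layer_norm(h))
```

Stacking several such blocks and feeding the same timestep embedding to each one, through each block's own fully connected projection, mirrors the "provide a timestep embedding to each block in the plurality of blocks" limitation.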

Prosecution Timeline

Jan 12, 2024
Application Filed
Feb 19, 2026
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596455
CONTROL METHOD FOR A TOUCHPAD
2y 5m to grant · Granted Apr 07, 2026
Patent 12573253
TOUCHSCREEN FOR ELECTRONIC LOCKS
2y 5m to grant · Granted Mar 10, 2026
Patent 12572443
DIAGNOSIS DEVICE FOR DETERMINING NOISE LEVEL
2y 5m to grant · Granted Mar 10, 2026
Patent 12572248
DETECTING DEVICE
2y 5m to grant · Granted Mar 10, 2026
Patent 12566488
INTERFACE APPARATUS AND BOARD SPORT EXPERIENCE SYSTEM
2y 5m to grant · Granted Mar 03, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
72%
Grant Probability
88%
With Interview (+15.9%)
1y 10m
Median Time to Grant
Low
PTA Risk
Based on 474 resolved cases by this examiner. Grant probability derived from career allow rate.
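As a quick sanity check on the panel's arithmetic (this is an assumption about how the figure is derived, not documentation of the tool), the "With Interview" probability appears to be the examiner's career allow rate plus the interview lift, rounded to the nearest point:

```python
career_allow_rate = 72.0   # examiner's career allow rate, in percent
interview_lift = 15.9      # observed interview lift, in percentage points

with_interview = round(career_allow_rate + interview_lift)
print(with_interview)  # → 88, matching the "With Interview" projection
```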
