DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are currently pending and have been examined.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 12 April 2024, 17 November 2025, and 17 November 2025 were considered by the examiner.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Perlin et al., US PG-Pub 2021/0081031, hereinafter Perlin, in view of Brehmer et al., US PG-Pub 2024/0273261, hereinafter Brehmer.
Regarding Claim 1, Perlin teaches a method for full body motion tracking (system 10), comprising:
receiving tracking signals from a plurality of sensors (left-hand controller 16, right-hand controller 18) associated with an upper body of a person (Fig. 4, and corresponding descriptions, [0033], “The system 10 comprises a left-hand controller 16 to be held by a left hand of the participant that produces position data of the left hand of the participant. The system 10 comprises a right-hand controller 18 to be held by a right hand of the participant that produces position data of the right hand of the participant.”);
based on the tracking signals, determining motion features and joint features (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “There is the step of constructing a full body pose 12 of a participant using only the data from the HMD 14 and the two hand controllers and the trackers.”);
training a machine learning model ([0044], “the present invention employs machine learning (ML), trained on a ground truth reference system 10 that can track the full body pose 12 of exemplar users”);
generating a plurality of inputs to the machine learning model (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”), the plurality of inputs comprising the motion features and the joint features (Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system)”); and
providing the plurality of inputs to the machine learning model to generate a plurality of outputs (Figs. 4-7, and corresponding descriptions; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”, [0097]-[0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”; [0102], providing pressure sensing for lower body positions),
wherein the plurality of outputs comprise sequences of full body poses (Figs. 4-7, and corresponding descriptions; [0091]-[0098], specifically, [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”), and the sequences of full body poses comprise upper body poses and lower body poses (Figs. 4-7, and corresponding descriptions; [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”).
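As a sanity check on the cited passage, the 42 input scalar values described in Perlin at [0036] can be tallied as six scalars (position and orientation) from each of seven sources. The breakdown below is only an arithmetic illustration of the quoted text, not part of the record:

```python
# Tally of the 42 input scalar values described in Perlin [0036]:
# six scalars (position + orientation) per source.
sources = {
    "HMD 14": 6,
    "left hand controller 16": 6,
    "right hand controller 18": 6,
    "IMU 32 of right tracker 20": 6,
    "IMU 32 of left tracker 22": 6,
    "foot pressure insole 38 of right tracker 20": 6,
    "foot pressure insole 38 of left tracker 22": 6,
}
total_inputs = sum(sources.values())
print(total_inputs)  # 42, matching the "42 input scalar values" in [0036]
```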
However, Perlin does not explicitly teach a diffusion model, the diffusion model comprising a multi-layer perceptron (MLP) network.
Brehmer teaches a diffusion model (Brehmer: Figs. 4, 6 and 9B, and corresponding descriptions; [0050]-[0052], “Diffusion models from a training perspective will take an image and will slowly add noise to the image to destroy the information in the image”), the diffusion model comprising a multi-layer perceptron (MLP) network (Brehmer: Fig. 9B, and corresponding descriptions; [0115], “These are then used as inputs to two multilayer perceptrons (MLPs) ϕ and ψ”, [0142], “These are then used as inputs to two MLPs, ϕ and ψ.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the diffusion model taught by Brehmer into the device taught by Perlin in order to predict an original trajectory (Brehmer: [0073]), thereby providing more accurate trajectory optimization for the system.
Regarding Claim 2, Perlin, as modified by Brehmer, teaches the method of claim 1, further comprising generating intermediate features from the motion features and the joint features (Perlin: Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system). In this way, a very large number of specific examples of a mapping from 42 input values to 48 output values can be accumulated.”), wherein the plurality of inputs to the diffusion model comprise the intermediate features (Perlin: Figs. 4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0163], noting the input may be motion data), and the sequences of full body poses are generated based on the intermediate features (Perlin: [0037], “The computer produces a body pose 12 output of the participant from the 48 scalar values.”, [0061]-[0062], “a runtime phase, in which the constructed representation of the mapping is used to efficiently convert new sensor data values to corresponding body pose 12 values.”).
Regarding Claim 3, Perlin, as modified by Brehmer, teaches the method of claim 2, wherein generating the plurality of outputs comprises generating the plurality of outputs from the MLP network based on the intermediate features (Brehmer: Fig. 9B, and corresponding descriptions; [0112]-[0125], specifically, [0115]-[0117], describing how the MLPs are used to map output representations).
Regarding Claim 4, Perlin, as modified by Brehmer, teaches the method of claim 1, wherein the plurality of outputs comprise positions of a lower body of the person (Perlin: Figs. 4-7, and corresponding descriptions, [0035]-[0037], noting how the system tracks the foot pressure of the user), the method further comprising estimating the positions of the lower body based on the sequences of full body poses (Perlin: Figs. 4-7, and corresponding descriptions, [0036], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”, [0102]).
Regarding Claim 5, Perlin, as modified by Brehmer, teaches the method of claim 1, wherein the MLP network comprises a plurality of blocks (Brehmer: Figs. 9A-9B, and corresponding descriptions; [0110], [0127], noting a plurality of blocks), the method further comprising providing a timestep embedding to each block in the plurality of blocks (Perlin: Figs. 4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”).
Regarding Claim 6, Perlin, as modified by Brehmer, teaches the method of claim 5, wherein the timestep embedding is provided to each block in the plurality of blocks through a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block.”) and a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics in neural networks.”).
Regarding Claim 7, Perlin, as modified by Brehmer, teaches the method of claim 5, wherein each block in the plurality of blocks comprises a convolutional layer (Brehmer: [0110], “Each block includes two temporal convolutions”) and a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”).
Regarding Claim 8, Perlin, as modified by Brehmer, teaches the method of claim 7, wherein each block in the plurality of blocks further comprises a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics in neural networks”), and a layer normalization (Brehmer: [0110], “each followed by a group normalization (GN)”).
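The block arrangement recited in claims 5-8 (a convolutional layer and a fully connected layer per block, with a sigmoid linear unit activation, a layer normalization, and a timestep embedding injected through a fully connected layer) can be sketched generically as follows. This is an illustration of the claimed arrangement only, using the claim's SiLU and layer normalization rather than the Mish and group normalization quoted from Brehmer [0110]; all dimensions, weights, and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
F, C, E = 16, 8, 8   # frames, channels, timestep-embedding width (assumed)

def silu(x):
    # Sigmoid linear unit: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def layer_norm(x, eps=1e-5):
    m = x.mean(-1, keepdims=True)
    v = x.var(-1, keepdims=True)
    return (x - m) / np.sqrt(v + eps)

def temporal_conv(x, k):
    # Depthwise 1-D convolution along the frame axis, "same" padding
    pad = len(k) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack(
        [np.convolve(xp[:, c], k, mode="valid") for c in range(C)], axis=1
    )

def block(x, t_emb, Wt, Wf, k):
    cond = silu(t_emb @ Wt)          # timestep embedding via FC layer + SiLU
    h = temporal_conv(x, k) + cond   # convolutional layer + conditioning
    h = layer_norm(silu(h))          # SiLU activation, then layer norm
    return h @ Wf                    # fully connected layer

x = rng.standard_normal((F, C))      # a short feature sequence
t_emb = rng.standard_normal(E)       # a diffusion-timestep embedding
out = block(x, t_emb,
            rng.standard_normal((E, C)),
            rng.standard_normal((C, C)),
            np.array([0.25, 0.5, 0.25]))
assert out.shape == (F, C)           # the block preserves sequence shape
```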
Regarding Claim 9, Perlin, as modified by Brehmer, teaches the method of claim 1, wherein the plurality of sensors are inertial measurement units (IMUs) (Perlin: IMUs 32).
Regarding Claim 10, Perlin, as modified by Brehmer, teaches the method of claim 1, wherein the plurality of sensors consist of a first sensor mounted in a first handheld device (Perlin: left-hand controller 16), a second sensor mounted in a second handheld device (Perlin: right-hand controller 18), and a third sensor mounted in a head mounted device (HMD) (Perlin: HMD 14), and the tracking signals comprise a first orientation and a first translation of the first handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), a second orientation and a second translation of the second handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), and a third orientation and a third translation of the head mounted device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”).
Regarding Claim 11, Perlin teaches a program for full body motion tracking ([0121]) which, when executed by a computer (computer 30), configures the computer to:
receive tracking signals from a plurality of sensors (left-hand controller 16, right-hand controller 18) associated with an upper body of a person (Fig. 4, and corresponding descriptions, [0033], “The system 10 comprises a left-hand controller 16 to be held by a left hand of the participant that produces position data of the left hand of the participant. The system 10 comprises a right-hand controller 18 to be held by a right hand of the participant that produces position data of the right hand of the participant.”);
based on the tracking signals, determine motion features and joint features (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “There is the step of constructing a full body pose 12 of a participant using only the data from the HMD 14 and the two hand controllers and the trackers.”);
train a machine learning model ([0044], “the present invention employs machine learning (ML), trained on a ground truth reference system 10 that can track the full body pose 12 of exemplar users”);
generate a plurality of inputs to the machine learning model (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”), the plurality of inputs comprising the motion features and the joint features (Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system)”); and
provide the plurality of inputs to the machine learning model to generate a plurality of outputs (Figs. 4-7, and corresponding descriptions; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”, [0097]-[0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”; [0102], providing pressure sensing for lower body positions),
wherein the plurality of outputs comprise sequences of full body poses (Figs. 4-7, and corresponding descriptions; [0091]-[0098], specifically, [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”), and the sequences of full body poses comprise upper body poses and lower body poses (Figs. 4-7, and corresponding descriptions; [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”).
However, Perlin does not explicitly teach a non-transitory computer-readable medium storing a program for full body motion tracking; or a diffusion model, the diffusion model comprising a multi-layer perceptron (MLP) network.
Brehmer teaches a non-transitory computer-readable medium storing a program for full body motion tracking (Brehmer: [0166]-[0167]); and
a diffusion model (Brehmer: Figs. 4, 6 and 9B, and corresponding descriptions; [0050]-[0052], “Diffusion models from a training perspective will take an image and will slowly add noise to the image to destroy the information in the image”), the diffusion model comprising a multi-layer perceptron (MLP) network (Brehmer: Fig. 9B, and corresponding descriptions; [0115], “These are then used as inputs to two multilayer perceptrons (MLPs) ϕ and ψ”, [0142], “These are then used as inputs to two MLPs, ϕ and ψ.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the diffusion model taught by Brehmer into the device taught by Perlin in order to predict an original trajectory (Brehmer: [0073]), thereby providing more accurate trajectory optimization for the system.
Regarding Claim 12, Perlin, as modified by Brehmer, teaches the non-transitory computer-readable medium of claim 11, wherein the program, when executed by the computer, further configures the computer to:
generate intermediate features from the motion features and the joint features (Perlin: Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system). In this way, a very large number of specific examples of a mapping from 42 input values to 48 output values can be accumulated.”),
wherein the plurality of inputs to the diffusion model comprise the intermediate features (Perlin: Figs. 4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0163], noting the input may be motion data), and the sequences of full body poses are generated based on the intermediate features (Perlin: [0037], “The computer produces a body pose 12 output of the participant from the 48 scalar values.”, [0061]-[0062], “a runtime phase, in which the constructed representation of the mapping is used to efficiently convert new sensor data values to corresponding body pose 12 values.”), and
wherein generating the plurality of outputs comprises generating the plurality of outputs from the MLP network based on the intermediate features (Brehmer: Fig. 9B, and corresponding descriptions; [0112]-[0125], specifically, [0115]-[0117], describing how the MLPs are used to map output representations).
Regarding Claim 13, Perlin, as modified by Brehmer, teaches the non-transitory computer-readable medium of claim 11, wherein the plurality of outputs comprise positions of a lower body of the person (Perlin: Figs. 4-7, and corresponding descriptions, [0035]-[0037], noting how the system tracks the foot pressure of the user), the MLP network comprises a plurality of blocks (Brehmer: Figs. 9A-9B, and corresponding descriptions; [0110], [0127], noting a plurality of blocks), and the program, when executed by the computer, further configures the computer to:
estimate the positions of the lower body based on the sequences of full body poses (Perlin: Figs. 4-7, and corresponding descriptions, [0036], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”, [0102]); and
provide a timestep embedding to each block in the plurality of blocks (Perlin: Figs. 4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”),
wherein the timestep embedding is provided to each block in the plurality of blocks through a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block.”) and a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics in neural networks.”).
Regarding Claim 14, Perlin, as modified by Brehmer, teaches the non-transitory computer-readable medium of claim 13, wherein each block in the plurality of blocks comprises a convolutional layer (Brehmer: [0110], “Each block includes two temporal convolutions”), a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”), a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics in neural networks”), and a layer normalization (Brehmer: [0110], “each followed by a group normalization (GN)”).
Regarding Claim 15, Perlin, as modified by Brehmer, teaches the non-transitory computer-readable medium of claim 11, wherein the plurality of sensors are inertial measurement units (IMUs) (Perlin: IMUs 32).
Regarding Claim 16, Perlin, as modified by Brehmer, teaches the non-transitory computer-readable medium of claim 11, wherein the plurality of sensors consist of a first sensor mounted in a first handheld device (Perlin: left-hand controller 16), a second sensor mounted in a second handheld device (Perlin: right-hand controller 18), and a third sensor mounted in a head mounted device (HMD) (Perlin: HMD 14), and the tracking signals comprise a first orientation and a first translation of the first handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), a second orientation and a second translation of the second handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), and a third orientation and a third translation of the head mounted device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”).
Regarding Claim 17, Perlin teaches a system for full body motion tracking (system 10), comprising:
a processor (microprocessor 36); and
the processor configured to:
receive tracking signals from a plurality of sensors (left-hand controller 16, right-hand controller 18) associated with an upper body of a person (Fig. 4, and corresponding descriptions, [0033], “The system 10 comprises a left-hand controller 16 to be held by a left hand of the participant that produces position data of the left hand of the participant. The system 10 comprises a right-hand controller 18 to be held by a right hand of the participant that produces position data of the right hand of the participant.”);
based on the tracking signals, determine motion features and joint features (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “There is the step of constructing a full body pose 12 of a participant using only the data from the HMD 14 and the two hand controllers and the trackers.”);
train a machine learning model ([0044], “the present invention employs machine learning (ML), trained on a ground truth reference system 10 that can track the full body pose 12 of exemplar users”);
generate a plurality of inputs to the machine learning model (Figs. 4-7, and corresponding descriptions; [0036]-[0044], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”), the plurality of inputs comprising the motion features and the joint features (Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system)”); and
provide the plurality of inputs to the machine learning model to generate a plurality of outputs (Figs. 4-7, and corresponding descriptions; [0062], “ML is employed in order to learn a mapping from an HMD 14 and two controllers, as well as an IMU 32”, [0097]-[0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”; [0102], providing pressure sensing for lower body positions),
wherein the plurality of outputs comprise sequences of full body poses (Figs. 4-7, and corresponding descriptions; [0091]-[0098], specifically, [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”), and the sequences of full body poses comprise upper body poses and lower body poses (Figs. 4-7, and corresponding descriptions; [0098], “During the run-time, after the 3D pose has been recovered for all users, then the 3D pose for all users is transmitted wirelessly from the server computer 30 to the HMDs of all users, where the pose data is then used to construct, for each user, 3D computer animated representations of the avatars of all users”).
However, Perlin does not explicitly teach a non-transitory computer readable medium storing a set of instructions; or a diffusion model, the diffusion model comprising a multi-layer perceptron (MLP) network.
Brehmer teaches a non-transitory computer readable medium storing a set of instructions (Brehmer: [0166]-[0167]); and
a diffusion model (Brehmer: Figs. 4, 6 and 9B, and corresponding descriptions; [0050]-[0052], “Diffusion models from a training perspective will take an image and will slowly add noise to the image to destroy the information in the image”), the diffusion model comprising a multi-layer perceptron (MLP) network (Brehmer: Fig. 9B, and corresponding descriptions; [0115], “These are then used as inputs to two multilayer perceptrons (MLPs) ϕ and ψ”, [0142], “These are then used as inputs to two MLPs, ϕ and ψ.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the diffusion model taught by Brehmer into the device taught by Perlin in order to predict an original trajectory (Brehmer: [0073]), thereby providing more accurate trajectory optimization for the system.
Regarding Claim 18, Perlin, as modified by Brehmer, teaches the system of claim 17, wherein the instructions, when executed by the processor, further configure the processor to:
generate intermediate features from the motion features and the joint features (Perlin: Figs. 4-7, and corresponding descriptions; [0053], “as the operator moves around while wearing both the input sensors and the full motion capture suit 24, at each time-step the operator generates both 42 scalar input values (from the input sensors) and 48 body pose 12 values (from the motion capture system). In this way, a very large number of specific examples of a mapping from 42 input values to 48 output values can be accumulated.”),
wherein the plurality of inputs to the diffusion model comprise the intermediate features (Perlin: Figs. 4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0163], noting the input may be motion data), and the sequences of full body poses are generated based on the intermediate features (Perlin: [0037], “The computer produces a body pose 12 output of the participant from the 48 scalar values.”, [0061]-[0062], “a runtime phase, in which the constructed representation of the mapping is used to efficiently convert new sensor data values to corresponding body pose 12 values.”), and
wherein generating the plurality of outputs comprises generating the plurality of outputs from the MLP network based on the intermediate features (Brehmer: Fig. 9B, and corresponding descriptions; [0112]-[0125], specifically, [0115]-[0117], describing how the MLPs are used to map output representations).
Regarding Claim 19, Perlin, as modified by Brehmer, teaches the system of claim 17, wherein the plurality of outputs comprise positions of a lower body of the person (Perlin: Figs. 4-7, and corresponding descriptions, [0035]-[0037], noting how the system tracks the foot pressure of the user), the MLP network comprises a plurality of blocks (Brehmer: Figs. 9A-9B, and corresponding descriptions; [0110], [0127], noting a plurality of blocks), and the instructions, when executed by the processor, further configure the processor to:
estimate the positions of the lower body based on the sequences of full body poses (Perlin: Figs. 4-7, and corresponding descriptions, [0036], “The server computer 30 may receive a total number of input scalar values of 42, where six scalar values are for the position and orientation of the HMD 14 and for each of the right hand controller 18 and the left hand controller 16, six from each of the IMU 32 of the right tracker 20 and left tracker 22, and six for each of the foot pressure tracking insole 38 of the right tracker 20 and the left tracker 22, the server computer 30 produces a body pose 12 output of the participant from the 42 input scalar values”, [0102]); and
provide a timestep embedding to each block in the plurality of blocks (Perlin: Figs. 4-7, and corresponding descriptions; [0053], noting how time-step data is detected and used to determine the pose data; Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”),
wherein the timestep embedding is provided to each block in the plurality of blocks through a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block.”) and a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics in neural networks.”),
wherein each block in the plurality of blocks comprises a convolutional layer (Brehmer: [0110], “Each block includes two temporal convolutions”), a fully connected layer (Brehmer: [0110], “Timestep embeddings are produced by a single fully-connected layer 906 and added to the activations of the first temporal convolution within each block”), a sigmoid linear unit activation layer (Brehmer: [0110], “Mish is a self-regularized non-monotonic activation function which can play a role in performance and training dynamics in neural networks”), and a layer normalization (Brehmer: [0110], “each followed by a group normalization (GN)”).
Regarding Claim 20, Perlin, as modified by Brehmer, teaches the system of claim 17, wherein the plurality of sensors are inertial measurement units (IMUs) (Perlin: IMUs 32), the plurality of sensors consist of a first sensor mounted in a first handheld device (Perlin: left-hand controller 16), a second sensor mounted in a second handheld device (Perlin: right-hand controller 18), and a third sensor mounted in a head mounted device (HMD) (Perlin: HMD 14), and the tracking signals comprise a first orientation and a first translation of the first handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), a second orientation and a second translation of the second handheld device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”), and a third orientation and a third translation of the head mounted device (Perlin: [0048], “the sensor input may consist of the position and orientation of each of the HMD 14, the left hand controller 16 and the right hand controller 18”).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEPHEN T REED whose telephone number is (571)272-7234. The examiner can normally be reached M-F: 0800-1800.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ke Xiao, can be reached at 571-272-7776. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Stephen T. Reed/Primary Examiner, Art Unit 2627