Prosecution Insights
Last updated: April 19, 2026
Application No. 18/569,996

ENHANCED TECHNIQUES FOR REAL-TIME MULTI-PERSON THREE-DIMENSIONAL POSE TRACKING USING A SINGLE CAMERA

Status: Non-Final OA (§103)
Filed: Dec 13, 2023
Examiner: ALLEN, KYLA GUAN-PING TI
Art Unit: 2661
Tech Center: 2600 — Communications
Assignee: Intel Corporation
OA Round: 1 (Non-Final)
Grant Probability: 89% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 0m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allowance Rate: 89% (47 granted / 53 resolved; +26.7% vs Tech Center average, above average)
Interview Lift: +17.1% across resolved cases with an interview (a strong lift)
Typical Timeline: 3y 0m average prosecution; 30 applications currently pending
Career History: 83 total applications across all art units

Statute-Specific Performance

§101: 9.9% (-30.1% vs TC avg)
§103: 52.5% (+12.5% vs TC avg)
§102: 19.3% (-20.7% vs TC avg)
§112: 17.4% (-22.6% vs TC avg)

Tech Center averages are estimates. Based on career data from 53 resolved cases.

Office Action (§103)
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-20 are pending in this application.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 12/13/2023 and 08/07/2024 have been considered and are attached.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph: (A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; (B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and (C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
In claim 19: “means for receiving two-dimensional image data from a camera, the two-dimensional image data representing a first person and a second person”, “means for generating, based on the two-dimensional image data, first two-dimensional positions of body parts represented by the first person”, “means for generating, based on the two-dimensional image data, second two-dimensional positions of body parts represented by the second person”, “means for generating, using a deep neural network, based on the first two-dimensional positions, a first root-relative three-dimensional pose regression of the body parts represented by the first person”, “means for generating, using the deep neural network, based on the second two-dimensional positions, a second root-relative three-dimensional pose regression of the body parts represented by the second person”, “means for identifying, based on the first two-dimensional positions and the first root-relative three-dimensional pose regression, contact between a ground plane and a foot of the first person”, “means for identifying, based on the second two-dimensional positions and the second root-relative three-dimensional pose regression, contact between the ground plane and a foot of the second person”, “means for generating a first absolute three-dimensional position of the contact between the ground plane and the foot of the first person”, “means for generating a second absolute three-dimensional position of the contact between the ground plane and the foot of the second person”, “means for generating, based on the first absolute three-dimensional position, a first three-dimensional pose of the body parts represented by the first person”, and “means for generating, based on the second absolute three-dimensional position, a second three-dimensional pose of the body parts represented by the second person”.

In claim 20: “means for identifying two-dimensional positions of the ground plane based on the two-dimensional image data” and “means for generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane”.

After a careful analysis, as disclosed above, and a careful review of the specification, the above limitations in claims 19 and 20 are interpreted as computer-implemented 112(f) limitations. Below are the corresponding structures and algorithms being read into the above limitations:

“means for receiving two-dimensional image data from a camera” (In the specification, page 4, lines 9-11 state that a system may either be on the camera or remote from the camera in order to carry out the operations, which include receiving the 2D image data. The specification further states that the system may be a computer system with multiple processor cores or a single processor core on page 13, line 24 and page 13, line 34 – page 14, line 1, respectively. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for receiving two-dimensional image data from a camera is a processor, in conjunction with the algorithm which can be found on page 11, lines 21-27. See also page 19, lines 7-12)

“means for generating, based on the two-dimensional image data, first two-dimensional positions of body parts represented by the first person” (In the specification, page 11, line 28 – page 12, line 2 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for generating, based on the two-dimensional image data, first two-dimensional positions of body parts represented by the first person is a processor, in conjunction with the algorithm which can be found on page 6, lines 9-23)

“means for generating, based on the two-dimensional image data, second two-dimensional positions of body parts represented by the second person” (In the specification, page 11, line 28 – page 12, line 2 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for generating, based on the two-dimensional image data, second two-dimensional positions of body parts represented by the second person is a processor, in conjunction with the algorithm which can be found on page 6, lines 9-23)

“means for generating, using a deep neural network, based on the first two-dimensional positions, a first root-relative three-dimensional pose regression of the body parts represented by the first person” (In the specification, page 12, lines 3-9 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for generating, using a deep neural network, based on the first two-dimensional positions, a first root-relative three-dimensional pose regression of the body parts represented by the first person is a processor, in conjunction with the algorithm which can be found on page 2, lines 21-20. See also page 6, line 24 – page 7, line 6)

“means for generating, using the deep neural network, based on the second two-dimensional positions, a second root-relative three-dimensional pose regression of the body parts represented by the second person” (In the specification, page 12, lines 3-9 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for generating, using a deep neural network, based on the second two-dimensional positions, a second root-relative three-dimensional pose regression of the body parts represented by the second person is a processor, in conjunction with the algorithm which can be found on page 2, lines 21-20. See also page 6, line 24 – page 7, line 6)

“means for identifying, based on the first two-dimensional positions and the first root-relative three-dimensional pose regression, contact between a ground plane and a foot of the first person” (In the specification, page 12, lines 10-23 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for identifying, based on the first two-dimensional positions and the first root-relative three-dimensional pose regression, contact between a ground plane and a foot of the first person is a system which inherently utilizes a processor, in conjunction with the algorithm which can be found on page 7, lines 7-26)

“means for identifying, based on the second two-dimensional positions and the second root-relative three-dimensional pose regression, contact between the ground plane and a foot of the second person” (In the specification, page 12, lines 10-23 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for identifying, based on the second two-dimensional positions and the second root-relative three-dimensional pose regression, contact between the ground plane and a foot of the second person is a processor, in conjunction with the algorithm which can be found on page 7, lines 7-26)

“means for generating a first absolute three-dimensional position of the contact between the ground plane and the foot of the first person” (In the specification, page 12, lines 24-32 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for generating a first absolute three-dimensional position of the contact between the ground plane and the foot of the first person is a processor, in conjunction with the algorithm which can be found on page 7, line 27 – page 8, line 21)

“means for generating a second absolute three-dimensional position of the contact between the ground plane and the foot of the second person” (In the specification, page 12, lines 24-32 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for generating a second absolute three-dimensional position of the contact between the ground plane and the foot of the second person is a processor, in conjunction with the algorithm which can be found on page 7, line 27 – page 8, line 21)

“means for generating, based on the first absolute three-dimensional position, a first three-dimensional pose of the body parts represented by the first person” (In the specification, page 12, line 3 – page 13, line 7 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for generating, based on the first absolute three-dimensional position, a first three-dimensional pose of the body parts represented by the first person is a processor, in conjunction with the algorithm which can be found on page 7, line 27 – page 8, line 31)

“means for generating, based on the second absolute three-dimensional position, a second three-dimensional pose of the body parts represented by the second person” (In the specification, page 12, line 3 – page 13, line 7 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for generating, based on the second absolute three-dimensional position, a second three-dimensional pose of the body parts represented by the second person is a processor, in conjunction with the algorithm which can be found on page 7, line 27 – page 8, line 31)

“means for identifying two-dimensional positions of the ground plane based on the two-dimensional image data” (In the specification, page 11, line 28 – page 12, line 2 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for identifying two-dimensional positions of the ground plane based on the two-dimensional image data is a processor, in conjunction with the algorithm which can be found on page 6, lines 9-23)

“means for generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane” (In the specification, page 12, lines 26-32 state that a device may be used to generate the above limitation. The specification further states that the system may be implemented as part of the device and can carry out the processes outlined in the figures on page 13, lines 18-32. As a result, this limitation is interpreted as a computer-implemented means-plus-function limitation, wherein the structure being read into the claim is the corresponding structure and algorithm for the means-plus-function limitation. See MPEP 2181. The corresponding structure for the means for generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane is a processor, in conjunction with the algorithm which can be found on page 5, line 23 – page 6, line 5).

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 10, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Shimada et al. (“PhysCap: physically plausible monocular 3D motion capture in real time”), hereinafter Shimada, in view of Huelsdunk et al. (U.S. Publication No. 2021/0192783 A1), hereinafter Huelsdunk.

Regarding claim 1, Shimada teaches a method for real-time three-dimensional human pose tracking using two-dimensional image data (Shimada, see FIG. 3), the method comprising: receiving, by at least one processor of a device, two-dimensional image data from a camera, the two-dimensional image data (Shimada teaches that “the input to PhysCap is a 2D image sequence” wherein the method assumes a perspective camera model in Section 3 Body Modeling and Preliminaries) representing a first person and a second person; generating, by the at least one processor, based on the two-dimensional image data, first two-dimensional positions of body parts represented by the first person (Shimada teaches “first predict[ing] heatmaps of 2D joints” in Section 4.1 Stage I: Kinematic Pose Estimation; the heatmaps are interpreted as 2D positions of the body parts); generating, by the at least one processor, based on the two-dimensional image data, (see above citation); generating, by the at least one processor, using a deep neural network, based on the first two-dimensional positions, a first root-relative three-dimensional pose regression of the body parts represented by the first person (Shimada teaches “first predict[ing] … root-relative location maps of joint positions in 3D with a specially tailored fully convolutional neural network using a ResNet” in Section 4.1 Stage I: Kinematic Pose Estimation; see also Section 4 Method, wherein the 3D location map is described as a 3D location map regression for each body joint); generating, by the at least one processor, using the deep neural network, based on the (see above citation); identifying, by the at least one processor, based on the first two-dimensional positions and the first root-relative three-dimensional pose regression, contact between a ground plane and a foot of the first person (see FIG. 3, which shows that the foot contact is based on the 2D heat maps and 3D location maps; additionally, Shimada teaches “The second stage performs foot contact and motion state detection, which uses 2D joint detections K_t to classify the poses reconstructed so far into stationary and non-stationary” in Section 4 Method); identifying, by the at least one processor, based on the (see above citation); generating, by the at least one processor, a first absolute three-dimensional position of the contact between the ground plane and the foot of the first person (Shimada teaches “J ∈ R^(6Nc×m) is a contact Jacobi matrix which relates the external forces to joint coordinates” in Section 3 Body Model and Preliminaries, and calculating the contact position in the calculation of the Ground Reaction Force Estimation in Section 4.3.4; see also FIG. 4, which depicts a Base of Support, which is “an area on the ground bounded by the foot contact points” as shown in Section 4.2 Stage II: Foot Contact and Motion State Detection; this foot contact position, which is based on a relative relationship between the foot and the ground, is interpreted as equivalent to the absolute position of the contact); generating, by the at least one processor, a (see above citation); generating, by the at least one processor, based on the first absolute three-dimensional position, a first three-dimensional pose of the body parts represented by the first person (Shimada teaches “when contact is detected (Sec. 4.3.3), we integrate the estimated ground reaction force (Sec. 4.3.4) in the equation of motion. In addition, we introduce contact constraints to prevent foot-floor penetration and foot sliding when contacts are detected” … “and stage III returns the n-th output from v) as the final character pose q” in Section 4.3.5 Physics-Based Pose Optimization; here, the final character pose is based on the GRF (which involves the foot (absolute) position) and contact between the foot and the ground plane) (see also Huelsdunk’s teaching of the foot contact position); and generating, by the at least one processor, based on the (see above citation).

Shimada fails to teach the above limitations in the context of a second person. However, the process taught above by Shimada regarding the generation of the three-dimensional pose can be combined with Huelsdunk to teach the above process in the context of determining poses of multiple subjects (Huelsdunk teaches that “if the image represents a scene including plural persons, relative 3D joint locations may be estimated for each of the plural persons in the scene” in para. [0110] (see also para. [0142]), wherein the “intersect[ion of] the ground plane at an absolute 3D location L, which may serve as an estimate of the absolute foot location” as shown in para. [0142]). Shimada and Huelsdunk are both considered to be analogous to the claimed invention because they are in the same field of determining human poses through foot contact positions with a ground plane through image analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada to incorporate the teachings of Huelsdunk and include the above process in the context of determining poses of multiple subjects. The motivation for doing so would have been to “improve[] accuracy by gathering multiple estimates over time”, as suggested by Huelsdunk in para. [0307]. See also para. [0355] and [0149]. Therefore, it would have been obvious to one of ordinary skill at the time the invention was filed to combine Shimada with Huelsdunk to obtain the invention specified in claim 1.
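[Editor's note] The rejection above strings together several technical steps: per-person 2D keypoints, a root-relative 3D regression, foot-contact detection, and a ground-plane constraint that fixes the otherwise unresolved global translation of a monocular pose. The following is a minimal data-flow sketch of that idea under stated assumptions: the two network stages are hypothetical stubs, the ground plane is flat at a known height, and only the vertical alignment is shown. It is not PhysCap's or Huelsdunk's actual implementation.

```python
# Sketch only: stubs and the FOOT_JOINT index are hypothetical, not from
# Shimada or Huelsdunk. Shapes: J joints, (x, y) pixels, (x, y, z) meters.
import numpy as np

FOOT_JOINT = 0  # hypothetical index of an ankle/heel keypoint

def detect_2d_keypoints(image: np.ndarray, person_idx: int) -> np.ndarray:
    """Stub for the 2D heatmap stage: returns (J, 2) pixel positions."""
    raise NotImplementedError  # e.g., any off-the-shelf 2D pose network

def regress_root_relative_3d(kpts_2d: np.ndarray) -> np.ndarray:
    """Stub for the deep-network stage: (J, 2) pixels -> (J, 3) meters,
    expressed relative to a root joint (e.g., the pelvis)."""
    raise NotImplementedError

def track_person(image: np.ndarray, person_idx: int,
                 ground_height: float = 0.0) -> np.ndarray:
    kpts_2d = detect_2d_keypoints(image, person_idx)
    rel_3d = regress_root_relative_3d(kpts_2d)   # root-relative pose
    foot_z = rel_3d[FOOT_JOINT, 2]
    # Foot-ground contact pins the global height: shift the whole pose so
    # the contacting foot lies on the plane z = ground_height. Horizontal
    # placement (from the ground-plane homography) is omitted for brevity.
    return rel_3d + np.array([0.0, 0.0, ground_height - foot_z])
```

The point of the sketch is the last step: once a foot is known to touch the plane, the root-relative pose acquires an absolute position, which is what the claim's “absolute three-dimensional position” language is doing in the mapped pipeline.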
Regarding claim 10, Shimada teaches a system for real-time three-dimensional human pose tracking using two-dimensional image data, the system comprising at least one processor coupled to memory, the at least one processor (Shimada teaches using a processor to carry out the method in Section 5.1 Implementation, wherein a PC is used, which inherently has memory) configured to: receive two-dimensional image data from a camera, the two-dimensional image data (Shimada teaches that “the input to PhysCap is a 2D image sequence” wherein the method assumes a perspective camera model in Section 3 Body Modeling and Preliminaries) representing a first person and a second person; generate, based on the two-dimensional image data, first two-dimensional positions of body parts represented by the first person (Shimada teaches “first predict[ing] heatmaps of 2D joints” in Section 4.1 Stage I: Kinematic Pose Estimation; the heatmaps are interpreted as 2D positions of the body parts); generate, based on the two-dimensional image data, (see above citation); generate, using a deep neural network, based on the first two-dimensional positions, a first root-relative three-dimensional pose regression of the body parts represented by the first person (Shimada teaches “first predict[ing] … root-relative location maps of joint positions in 3D with a specially tailored fully convolutional neural network using a ResNet” in Section 4.1 Stage I: Kinematic Pose Estimation; see also Section 4 Method, wherein the 3D location map is described as a 3D location map regression for each body joint); generate, using the deep neural network, based on the (see above citation); identify, based on the first two-dimensional positions and the first root-relative three-dimensional pose regression, contact between a ground plane and a foot of the first person (see FIG. 3, which shows that the foot contact is based on the 2D heat maps and 3D location maps; additionally, Shimada teaches “The second stage performs foot contact and motion state detection, which uses 2D joint detections K_t to classify the poses reconstructed so far into stationary and non-stationary” in Section 4 Method); identify, based on the (see above citation); generate a first absolute three-dimensional position of the contact between the ground plane and the foot of the first person (Shimada teaches “J ∈ R^(6Nc×m) is a contact Jacobi matrix which relates the external forces to joint coordinates” in Section 3 Body Model and Preliminaries, and calculating the contact position in the calculation of the Ground Reaction Force Estimation in Section 4.3.4; see also FIG. 4, which depicts a Base of Support, which is “an area on the ground bounded by the foot contact points” as shown in Section 4.2 Stage II: Foot Contact and Motion State Detection; this foot contact position, which is based on a relative relationship between the foot and the ground, is interpreted as equivalent to the absolute position of the contact); generate a (see above citation); generate, based on the first absolute three-dimensional position, a first three-dimensional pose of the body parts represented by the first person (Shimada teaches “when contact is detected (Sec. 4.3.3), we integrate the estimated ground reaction force (Sec. 4.3.4) in the equation of motion. In addition, we introduce contact constraints to prevent foot-floor penetration and foot sliding when contacts are detected” … “and stage III returns the n-th output from v) as the final character pose q” in Section 4.3.5 Physics-Based Pose Optimization; here, the final character pose is based on the GRF (which involves the foot (absolute) position) and contact between the foot and the ground plane) (see also Huelsdunk’s teaching of the foot contact position); and generate, based on the (see above citation).

Shimada fails to teach the above limitations in the context of a second person. However, the process taught above by Shimada regarding the generation of the three-dimensional pose can be combined with Huelsdunk to teach the above process in the context of determining poses of multiple subjects (Huelsdunk teaches that “if the image represents a scene including plural persons, relative 3D joint locations may be estimated for each of the plural persons in the scene” in para. [0110] (see also para. [0142]), wherein the “intersect[ion of] the ground plane at an absolute 3D location L, which may serve as an estimate of the absolute foot location” as shown in para. [0142]). Shimada and Huelsdunk are both considered to be analogous to the claimed invention because they are in the same field of determining human poses through foot contact positions with a ground plane through image analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada to incorporate the teachings of Huelsdunk and include the above process in the context of determining poses of multiple subjects. The motivation for doing so would have been to “improve[] accuracy by gathering multiple estimates over time”, as suggested by Huelsdunk in para. [0307]. See also para. [0355] and [0149]. Therefore, it would have been obvious to one of ordinary skill at the time the invention was filed to combine Shimada with Huelsdunk to obtain the invention specified in claim 10.

Regarding claim 15, Shimada and Huelsdunk teach the system of claim 10, wherein the at least one processor is further configured to: determine a difference between the first three-dimensional pose and the first root-relative three-dimensional pose regression (Shimada teaches determining a difference between the 3D joint predictions (equivalent to the first root-relative three-dimensional pose regression) and the 3D pose, and constraining the 3D pose to be close to the 3D joint predictions, as shown in Section 4.1); and generate, based on the difference, a third three-dimensional pose of the body parts represented by the first person (Shimada teaches constraining the 3D pose to be close to the 3D joint predictions in Section 4.1, which is equivalent to generating a third three-dimensional pose of the body parts represented by the first person).

Claims 2, 3, 5, 6, 11, 12, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Shimada et al. (“PhysCap: physically plausible monocular 3D motion capture in real time”), hereinafter Shimada, in view of Huelsdunk et al. (U.S. Publication No. 2021/0192783 A1), hereinafter Huelsdunk, and Wang et al. (U.S. Publication No. 2018/0075593), hereinafter Wang.

Regarding claim 2, Shimada and Huelsdunk teach the method of claim 1, further comprising.
Shimada further teaches identifying two-dimensional positions of the ground plane based on the two-dimensional image data (Shimada teaches “a new CNN that detects heel and forefoot placement on the ground from estimated 2D keypoints in images” in the Introduction; the heel and forefoot placement on the ground is interpreted as equivalent to the 2D positions of the ground plane) and using a Jacobi matrix and a joint space inertia matrix (Section 3 Body Model and Preliminaries) to generate a Ground Reaction Force Estimation (Section 4.3.4 Ground Reaction Force (GRF) Estimation). Huelsdunk further teaches a first and second absolute three-dimensional position (see claim 1).

Shimada and Huelsdunk fail to teach generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane, wherein generating the first absolute three-dimensional position is further based on the homographic matrix, and wherein generating the second absolute three-dimensional position is further based on the homographic matrix. However, Wang teaches generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane (Wang teaches that “the video content analysis system can then use the estimated location of the ground plane to develop a homographic matrix that can be used in a homographic transformation to map 2-D coordinates in a video frame to 3-D points in the real world” in para. [0099]; see also para. [0142]), wherein generating the first absolute three-dimensional position is further based on the homographic matrix, and wherein generating the second absolute three-dimensional position is further based on the homographic matrix (Wang teaches generating real-world coordinates of the bottom (feet) of a person based on the homographic matrix by performing a homographic transform in para. [0148]; these coordinates are interpreted as the absolute three-dimensional position as they are relative to the camera; see para. [0137] and [0116] regarding Wang’s teaching of the above embodiment being applied to multiple persons, therefore generating at least a first and second absolute three-dimensional position).

Shimada, Huelsdunk, and Wang are all considered to be analogous to the claimed invention because they are in the same field of analyzing foot contact positions with a ground plane through image analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada (as modified by Huelsdunk) to incorporate the teachings of Wang and include “generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane, wherein generating the first absolute three-dimensional position is further based on the homographic matrix, and wherein generating the second absolute three-dimensional position is further based on the homographic matrix”. The motivation for doing so would have been to use the homographic matrix “to scale, rotate, translate, skew, or de-skew an image”, as suggested by Wang in para. [0183]. Therefore, it would have been obvious to one of ordinary skill at the time the invention was filed to combine Shimada and Huelsdunk with Wang to obtain the invention specified in claim 2.
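[Editor's note] For readers tracing the homography language in claim 2, the following is a generic sketch of how a 3x3 homographic matrix can be fit from image-to-ground-plane correspondences and used to lift a foot-contact pixel to an absolute 3D point on the plane. It is a standard direct linear transform (DLT) offered only as an illustration of the technique Wang is cited for; the function names are made up.

```python
# Generic DLT sketch, not Wang's procedure. H maps homogeneous image pixels
# (u, v, 1) to world ground-plane coordinates (X, Y, 1) with Z fixed at 0.
import numpy as np

def fit_homography(img_pts: np.ndarray, world_pts: np.ndarray) -> np.ndarray:
    """img_pts, world_pts: (N, 2) arrays, N >= 4 ground-plane correspondences."""
    rows = []
    for (u, v), (X, Y) in zip(img_pts, world_pts):
        rows.append([u, v, 1, 0, 0, 0, -X * u, -X * v, -X])
        rows.append([0, 0, 0, u, v, 1, -Y * u, -Y * v, -Y])
    A = np.asarray(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)          # least-squares null vector of A
    return vt[-1].reshape(3, 3)

def pixel_to_ground_3d(H: np.ndarray, uv) -> np.ndarray:
    """Map a foot-contact pixel (u, v) to an absolute 3D point on Z = 0."""
    X, Y, w = H @ np.array([uv[0], uv[1], 1.0])
    return np.array([X / w, Y / w, 0.0])
```

With one matrix H fit for the scene, the same pixel_to_ground_3d call applies to the first and the second person's foot-contact pixels, which is how a single homography yields the claimed first and second absolute three-dimensional positions.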
Regarding claim 3, Shimada, Huelsdunk, and Wang teach the method of claim 2, further comprising: generating extrinsic parameters of the camera based on the homographic matrix and a focal length of the camera (Wang teaches using the homographic matrix to test the accuracy of the extrinsic parameters in para. [0145], wherein the camera parameters include the focal length of the camera in para. [0092]), wherein generating the first three-dimensional pose is further based on the extrinsic parameters (Applicant’s specification discusses this limitation on page 7, lines 17-25, wherein “the foot-floor contact detection 208 may transform the root-relative 3D joint positions from the camera coordinate system into the real-world system by using the calibrated extrinsic parameter R”. Similarly, Wang teaches “produc[ing] 3-D real-world coordinates (Xb, Yb), of the bottom 1136 (that is, the location of the feet) of the person that is associated with the blob 1102” in para. [0148] based on a homographic matrix which is based on the estimated extrinsic parameters; this process is equivalent to the process outlined in the Specification, and the 3-D coordinates as taught by Wang are further used to determine an estimated pose to compare with the estimated/detected height as shown in para. [0160]); wherein generating the second three-dimensional pose is further based on the extrinsic parameters (Applicant’s specification discusses this limitation on page 7, lines 17-25, wherein “the foot-floor contact detection 208 may transform the root-relative 3D joint positions from the camera coordinate system into the real-world system by using the calibrated extrinsic parameter R”. Similarly, Wang teaches “produc[ing] 3-D real-world coordinates (Xb, Yb), of the bottom 1136 (that is, the location of the feet) of the person that is associated with the blob 1102” in para. [0148] based on a homographic matrix which is based on the estimated extrinsic parameters; this process is equivalent to the process outlined in the Specification, and the 3-D coordinates as taught by Wang are further used to determine an estimated pose to compare with the estimated/detected height as shown in para. [0160]; see para. [0137] and [0116] regarding Wang’s teaching of the above embodiment being applied to multiple persons, therefore generating at least a first and second absolute three-dimensional position); wherein the extrinsic parameters are indicative of a rotation matrix and a translation vector for the camera (Wang teaches the rotation matrix and translation vector in para. [0125]-[0126]; see also para. [0166], which teaches that the extrinsic parameters are indicative of the rotation and translation values).

Shimada, Huelsdunk, and Wang are all considered to be analogous to the claimed invention because they are in the same field of analyzing foot contact positions with a ground plane through image analysis.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada (as modified by Huelsdunk) to incorporate the teachings of Wang and include “generating extrinsic parameters of the camera based on the homographic matrix and a focal length of the camera, wherein generating the first three-dimensional pose is further based on the extrinsic parameters; wherein generating the second three-dimensional pose is further based on the extrinsic parameters, wherein the extrinsic parameters are indicative of a rotation matrix and a translation vector for the camera”. The motivation for doing so would have been to “improve the efficiency of the process 1200 by reducing the number of extrinsic parameters considered each time the process 1200 looks for the lowest cost values” so that “the system can determine cost values for these new extrinsic parameters. At step 1212, the system can again identify the extrinsic parameters with the lowest cost values”, as shown in para. [0168] and para. [0170] of Wang, respectively. Therefore, it would have been obvious to one of ordinary skill at the time the invention was filed to combine Shimada and Huelsdunk with Wang to obtain the invention specified in claim 3.

Regarding claim 5, Shimada and Huelsdunk teach the method of claim 1, further comprising: identifying a two-dimensional position of the contact between the ground plane and the foot of the first person (Shimada teaches “a new CNN that detects heel and forefoot placement on the ground from estimated 2D keypoints in images” in the Introduction; the heel and forefoot placement on the ground is interpreted as equivalent to the 2D positions of the ground plane); and generating the first three-dimensional pose based on an absolute position (Shimada teaches “when contact is detected (Sec. 4.3.3), we integrate the estimated ground reaction force (Sec. 4.3.4) in the equation of motion. In addition, we introduce contact constraints to prevent foot-floor penetration and foot sliding when contacts are detected” … “and stage III returns the n-th output from v) as the final character pose q” in Section 4.3.5 Physics-Based Pose Optimization; here, the final character pose is based on the GRF (which involves the foot (absolute) position) and contact between the foot and the ground plane).

Shimada and Huelsdunk fail to teach identifying two-dimensional positions of the ground plane based on the two-dimensional image data; and generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane, wherein generating the first three-dimensional pose is further based on mapping the two-dimensional position of the contact between the ground plane and the foot of the first person to the first absolute three-dimensional position using the homographic matrix. However, Wang teaches identifying two-dimensional positions of the ground plane based on the two-dimensional image data (Wang teaches “the system can further randomly select points on an assumed ground plane. The system can then obtain 2-D image coordinates for the randomly selected points using the pinhole camera model and the estimated extrinsic parameters” in para. [0055]; see also para. [0107]); and generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane (Wang teaches “determining, using the two-dimensional coordinates and the ground plane, values for a homographic matrix, wherein a homographic transformation using the homographic matrix provides a mapping from two-dimensional coordinates in the video frame to three dimensional real-world points” in para. [0183]), wherein generating the first three-dimensional pose is further based on mapping the two-dimensional position of the contact between the ground plane and the foot of the first person to the first absolute three-dimensional position using the homographic matrix (Wang teaches “produc[ing] 3-D real-world coordinates (Xb, Yb), of the bottom 1136 (that is, the location of the feet) of the person that is associated with the blob 1102” in para. [0148] based on a homographic matrix which is based on the estimated extrinsic parameters; since the 3-D coordinates as taught by Wang are further used to determine an estimated pose to compare with the estimated/detected height as shown in para. [0160], this can be combined with Shimada’s teaching of generating the first three-dimensional pose based on an absolute position as shown above to teach the above limitation).

Shimada, Huelsdunk, and Wang are all considered to be analogous to the claimed invention because they are in the same field of analyzing foot contact positions with a ground plane through image analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada (as modified by Huelsdunk) to incorporate the teachings of Wang and include “identifying two-dimensional positions of the ground plane based on the two-dimensional image data; and generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane, wherein generating the first three-dimensional pose is further based on mapping the two-dimensional position of the contact between the ground plane and the foot of the first person to the first absolute three-dimensional position using the homographic matrix”. The motivation for doing so would have been to use the homographic matrix “to scale, rotate, translate, skew, or de-skew an image”, as suggested by Wang in para. [0183]. Therefore, it would have been obvious to one of ordinary skill at the time the invention was filed to combine Shimada and Huelsdunk with Wang to obtain the invention specified in claim 5.

Regarding claim 6, Shimada, Huelsdunk, and Wang teach the method of claim 5, further comprising: determining a difference between the first three-dimensional pose and the first root-relative three-dimensional pose regression (Shimada teaches determining a difference between the 3D joint predictions (equivalent to the first root-relative three-dimensional pose regression) and the 3D pose, and constraining the 3D pose to be close to the 3D joint predictions, as shown in Section 4.1); and generating, based on the difference, a third three-dimensional pose of the body parts represented by the first person (Shimada teaches constraining the 3D pose to be close to the 3D joint predictions in Section 4.1, which is equivalent to generating a third three-dimensional pose of the body parts represented by the first person).
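[Editor's note] Claim 3 above turns on recovering camera extrinsics from the homography and the focal length. As a worked illustration (the textbook planar-calibration decomposition, not Wang's specific routine): a homography G from the ground plane (Z = 0) to the image is proportional to K[r1 r2 t], where K is the intrinsic matrix built from the focal length, so K⁻¹G yields the first two rotation columns and the translation up to scale.

```python
# Textbook planar-calibration decomposition (assumed illustration, not
# Wang's code). G: 3x3 homography from ground-plane (X, Y, 1) to (u, v, 1).
import numpy as np

def extrinsics_from_homography(G: np.ndarray, fx: float, fy: float,
                               cx: float, cy: float):
    """Returns (R, t) up to the method's usual sign/scale ambiguity."""
    K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=float)
    B = np.linalg.inv(K) @ G                 # proportional to [r1 r2 t]
    lam = 1.0 / np.linalg.norm(B[:, 0])      # scale fixed by ||r1|| = 1
    r1, r2, t = lam * B[:, 0], lam * B[:, 1], lam * B[:, 2]
    r3 = np.cross(r1, r2)                    # complete the rotation basis
    R = np.column_stack([r1, r2, r3])
    # Noise breaks orthogonality, so re-project R onto the nearest rotation.
    u, _, vt = np.linalg.svd(R)
    return u @ vt, t
```

The recovered rotation matrix and translation vector are exactly the "extrinsic parameters ... indicative of a rotation matrix and a translation vector" recited in claim 3, which is why the homography plus focal length suffices as the claimed input.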
Regarding claim 11, Shimada and Huelsdunk teach the system of claim 10, wherein the at least one processor is further configured to: identify two-dimensional positions of the ground plane based on the two-dimensional image data (Shimada teaches “a new CNN that detects heel and forefoot placement on the ground from estimated 2D keypoints in images” in the Introduction; the heel and forefoot placement on the ground is interpreted as equivalent to the 2D positions of the ground plane) and using a Jacobi matrix and a joint space inertia matrix (Section 3 Body Model and Preliminaries) to generate a Ground Reaction Force Estimation (Section 4.3.4 Ground Reaction Force (GRF) Estimation). Huelsdunk further teaches a first and second absolute three-dimensional position (see claim 10).

Shimada and Huelsdunk fail to teach generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane, wherein generating the first absolute three-dimensional position is further based on the homographic matrix, and wherein generating the second absolute three-dimensional position is further based on the homographic matrix. However, Wang teaches generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane (Wang teaches that “the video content analysis system can then use the estimated location of the ground plane to develop a homographic matrix that can be used in a homographic transformation to map 2-D coordinates in a video frame to 3-D points in the real world” in para. [0099]; see also para. [0142]), wherein generating the first absolute three-dimensional position is further based on the homographic matrix, and wherein generating the second absolute three-dimensional position is further based on the homographic matrix (Wang teaches generating real-world coordinates of the bottom (feet) of a person based on the homographic matrix by performing a homographic transform in para. [0148]; these coordinates are interpreted as the absolute three-dimensional position as they are relative to the camera; see para. [0137] and [0116] regarding Wang’s teaching of the above embodiment being applied to multiple persons, therefore generating at least a first and second absolute three-dimensional position).

Shimada, Huelsdunk, and Wang are all considered to be analogous to the claimed invention because they are in the same field of analyzing foot contact positions with a ground plane through image analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada (as modified by Huelsdunk) to incorporate the teachings of Wang and include “generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane, wherein generating the first absolute three-dimensional position is further based on the homographic matrix, and wherein generating the second absolute three-dimensional position is further based on the homographic matrix”. The motivation for doing so would have been to use the homographic matrix “to scale, rotate, translate, skew, or de-skew an image”, as suggested by Wang in para. [0145]. Therefore, it would have been obvious to one of ordinary skill at the time the invention was filed to combine Shimada and Huelsdunk with Wang to obtain the invention specified in claim 11.

Regarding claim 12, Shimada and Huelsdunk teach the system of claim 10. Shimada further teaches the processor (see Section 5.1). Shimada and Huelsdunk fail to teach generating a homographic matrix associated with mapping the first two-dimensional positions to three-dimensional positions, wherein to generate the first absolute three-dimensional position is further based on the homographic matrix, and wherein to generate the second absolute three-dimensional position is further based on the homographic matrix. However, Wang teaches generating a homographic matrix associated with mapping the first two-dimensional positions to three-dimensional positions (Wang teaches that “the video content analysis system can then use the estimated location of the ground plane to develop a homographic matrix that can be used in a homographic transformation to map 2-D coordinates in a video frame to 3-D points in the real world” in para. [0099]; see also para. [0142]), wherein to generate the first absolute three-dimensional position is further based on the homographic matrix (Wang teaches generating real-world coordinates of the bottom (feet) of a person based on the homographic matrix by performing a homographic transform in para. [0148]; these coordinates are interpreted as the absolute three-dimensional position as they are relative to the camera; see para. [0137] and [0116] regarding Wang’s teaching of the above embodiment being applied to multiple persons, therefore generating at least a first and second absolute three-dimensional position), and wherein to generate the second absolute three-dimensional position is further based on the homographic matrix (see the citation of para. [0148], [0137], and [0116] immediately above, which teaches generating at least a first and second absolute three-dimensional position).

Shimada, Huelsdunk, and Wang are all considered to be analogous to the claimed invention because they are in the same field of analyzing foot contact positions with a ground plane through image analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada (as modified by Huelsdunk) to incorporate the teachings of Wang and include to “generate a homographic matrix associated with mapping the first two-dimensional positions to three-dimensional positions, wherein to generate the first absolute three-dimensional position is further based on the homographic matrix, and wherein to generate the second absolute three-dimensional position is further based on the homographic matrix”. The motivation for doing so would have been “to scale, rotate, translate, skew, or de-skew an image”, as suggested by Wang in para. [0183]. Therefore, it would have been obvious to one of ordinary skill at the time the invention was filed to combine Shimada and Huelsdunk with Wang to obtain the invention specified in claim 12.
Regarding claim 14, Shimada and Huelsdunk teach the system of claim 10, wherein the at least one processor is further configured to: identify a two-dimensional position of the contact between the ground plane and the foot of the first person (Shimada teaches “a new CNN that detects heel and forefoot placement on the ground from estimated 2D keypoints in images” in the Introduction; the heel and forefoot placement on the ground is interpreted as equivalent to the 2D positions of the ground plane); and generate the first three-dimensional pose based on an absolute position (Shimada teaches “when contact is detected (Sec. 4.3.3), we integrate the estimated ground reaction force (Sec. 4.3.4) in the equation of motion. In addition, we introduce contact constraints to prevent foot-floor penetration and foot sliding when contacts are detected” … “and stage III returns the 𝑛-th output from v) as the final character pose q” in Section 4.3.5 Physics-Based Pose Optimization; here, the final character pose is based on the GRF (which involves the foot (absolute) position) and contact between the foot and the ground-plane). Shimada and Huelsdunk fail to teach identifying two-dimensional positions of the ground plane based on the two-dimensional image data; and generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane, wherein generating the first three-dimensional pose is further based on mapping the two- dimensional position of the contact between the ground plane and the foot of the first person to the first absolute three-dimensional position using the homographic matrix. However, Wang teaches identifying two-dimensional positions of the ground plane based on the two-dimensional image data (Wang teaches "the system can further randomly select points on an assumed ground plane. The system can then obtain 2-D image coordinates for the randomly selected points using the pinhole camera model and the estimated extrinsic parameters” in para. [0055]; see also para. [0107]); and generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane (Wang teaches “determining, using the two-dimensional coordinates and the ground plane, values for a homographic matrix, wherein a homographic transformation using the homographic matrix provides a mapping from two-dimensional coordinates in the video frame to three dimensional real-world points” in para. [0183]), wherein generating the first three-dimensional pose is further based on mapping the two- dimensional position of the contact between the ground plane and the foot of the first person to the first absolute three-dimensional position using the homographic matrix (Wang teaches “produc[ing] 3-D real-world coordinates (Xb, Yb), of the bottom 1136 (that is, the location of the feet) of the person that is associated with the blob 1102” in para. [0148] based on a homographic matrix which is based on the estimated extrinsic parameters; since the 3-D coordinates as taught by Wang are further used to determine an estimated pose to compare with the estimated/detected height as shown in para. [0160], it can be combined with the teachings of Shimada’s teaching of generating the first three-dimensional pose based on an absolute position as shown above to teach the above limitation). 
Shimada, Huelsdunk, and Wang are all considered analogous to the claimed invention because they are in the same field of analyzing foot contact positions with a ground plane through image analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada (as modified by Huelsdunk) to incorporate the teachings of Wang and include "identifying two-dimensional positions of the ground plane based on the two-dimensional image data; and generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane, wherein generating the first three-dimensional pose is further based on mapping the two-dimensional position of the contact between the ground plane and the foot of the first person to the first absolute three-dimensional position using the homographic matrix". The motivation for doing so would have been to use the homographic matrix "to scale, rotate, translate, skew, or de-skew an image", as suggested by Wang in para. [0183]. Therefore, it would have been obvious to combine Shimada and Huelsdunk with Wang to obtain the invention specified in claim 14.

Claims 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Shimada et al. ("PhysCap: physically plausible monocular 3D motion capture in real time"), hereinafter Shimada, in view of Huelsdunk et al. (U.S. Publication No. 2021/0192783 A1), hereinafter Huelsdunk, and Zhao (CN 113033369 A; see English translation for citations), hereinafter Zhao.

Regarding claim 4, Shimada and Huelsdunk teach the method of claim 1, further comprising: identifying a first two-dimensional position of the contact between the ground plane and the foot of the first person (Shimada teaches a foot contact point (BoS) in Section 4.2 and FIG. 4, which depicts the contact between the ground plane and the foot of a person in 2D) (Huelsdunk teaches "the 2D location may be the left foot's 2D location at the moment a person comes into contact with (e.g. lands on, perhaps following a jump) a known ground plane" in para. [0143]; see para. [0110], in which the above process can occur for each of plural persons in a scene); and identifying a second two-dimensional position of the contact between the ground plane and the foot of the second person (same citations; Huelsdunk's para. [0110], in which the above process can occur for each of plural persons in a scene, implies there exists at least a second 2D position of contact between the ground plane and the foot of a second person). Motivations similar to those applied to claim 1 apply here with regard to the combination of Shimada in view of Huelsdunk.

Shimada further teaches "when the velocity of 3D root is lower than a threshold φv, we classify the pose as stationary, and non-stationary otherwise" in Section 4.2.
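For illustration only: the velocity-threshold test at issue in this rejection — Shimada's stationary/non-stationary classification quoted above, and Zhao's foot-speed threshold applied below — reduces to comparing an estimated keypoint speed against a fixed threshold. A minimal sketch, with a hypothetical track, frame rate, and threshold:

```python
# Hedged sketch of per-frame foot-contact flags from a 2-D keypoint track.
import numpy as np

def foot_in_contact(foot_track_px, fps=30.0, speed_threshold=40.0):
    """Return per-frame contact flags for one foot.

    foot_track_px: (T, 2) array of the foot keypoint's 2-D image positions.
    speed_threshold: pixels/second below which the foot is deemed planted.
    (The function name and parameter values are hypothetical.)
    """
    velocity = np.diff(foot_track_px, axis=0) * fps   # px/s between frames
    speed = np.linalg.norm(velocity, axis=1)
    return speed < speed_threshold

# Hypothetical track: the foot holds still for 10 frames, then swings.
track = np.array([[100, 400]] * 10 +
                 [[100 + 8 * i, 400] for i in range(1, 11)], dtype=float)
print(foot_in_contact(track))  # True while planted, False once it swings
```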
Shimada and Huelsdunk fail to teach determining, based on the first two-dimensional position, that a first velocity of the foot of the first person is below a threshold velocity; and determining, based on the second two-dimensional position, that a second velocity of the foot of the second person is below the threshold velocity, wherein identifying the contact between the ground plane and the foot of the first person is further based on the first velocity of the foot of the first person being below the threshold velocity, and wherein identifying the contact between the ground plane and the foot of the second person is further based on the second velocity of the foot of the second person being below the threshold velocity.

However, Zhao teaches determining, based on the first two-dimensional position, that a first velocity of the foot of the first person is below a threshold velocity (Zhao teaches determining the velocity of a foot and, when "the foot speed of the side foot is smaller than a speed threshold value, determining that the side foot is in contact with the ground" in para. [0155]); and determining, based on the second two-dimensional position, that a second velocity of the foot of the second person is below the threshold velocity (Zhao, para. [0155], as above; Zhao's teaching of determining velocity can be combined with Huelsdunk's teaching of an image that represents a scene including plural persons, wherein "relative 3D joint locations may be estimated for each of the plural persons in the scene" (para. [0110]), to teach the second person), wherein identifying the contact between the ground plane and the foot of the first person is further based on the first velocity of the foot of the first person being below the threshold velocity (Zhao, para. [0155], as above), and wherein identifying the contact between the ground plane and the foot of the second person is further based on the second velocity of the foot of the second person being below the threshold velocity (Zhao, para. [0155], combined with Huelsdunk's para. [0110], as above, to teach the second person; motivations similar to those applied to claim 1 apply here).

Shimada, Huelsdunk, and Zhao are all considered analogous to the claimed invention because they are in the same field of analyzing foot contact positions with a ground plane through image analysis.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada (as modified by Huelsdunk) to incorporate the teachings of Zhao and include "determining, based on the first two-dimensional position, that a first velocity of the foot of the first person is below a threshold velocity; and determining, based on the second two-dimensional position, that a second velocity of the foot of the second person is below the threshold velocity, wherein identifying the contact between the ground plane and the foot of the first person is further based on the first velocity of the foot of the first person being below the threshold velocity, and wherein identifying the contact between the ground plane and the foot of the second person is further based on the second velocity of the foot of the second person being below the threshold velocity". The motivation for doing so would have been "to solve the problem of foot slip of a virtual character and improve the viewing experience of a user", as suggested by Zhao in para. [0079]. Therefore, it would have been obvious to combine Shimada and Huelsdunk with Zhao to obtain the invention specified in claim 4.

Regarding claim 13, Shimada and Huelsdunk teach the system of claim 10, wherein the at least one processor is further configured to: identify a first two-dimensional position of the contact between the ground plane and the foot of the first person (Shimada teaches a foot contact point (BoS) in Section 4.2 and FIG. 4, which depicts the contact between the ground plane and the foot of a person in 2D) (Huelsdunk teaches "the 2D location may be the left foot's 2D location at the moment a person comes into contact with (e.g. lands on, perhaps following a jump) a known ground plane" in para. [0143]; see para. [0110], in which the above process can occur for each of plural persons in a scene); and identify a second two-dimensional position of the contact between the ground plane and the foot of the second person (same citations; Huelsdunk's para. [0110] implies there exists at least a second 2D position of contact between the ground plane and the foot of a second person). Motivations similar to those applied to claim 1 apply here with regard to the combination of Shimada in view of Huelsdunk. Shimada further teaches "when the velocity of 3D root is lower than a threshold φv, we classify the pose as stationary, and non-stationary otherwise" in Section 4.2.
Shimada and Huelsdunk fail to teach determining, based on the first two-dimensional position, that a first velocity of the foot of the first person is below a threshold velocity; and determining, based on the second two-dimensional position, that a second velocity of the foot of the second person is below the threshold velocity, wherein identifying the contact between the ground plane and the foot of the first person is further based on the first velocity of the foot of the first person being below the threshold velocity, and wherein identifying the contact between the ground plane and the foot of the second person is further based on the second velocity of the foot of the second person being below the threshold velocity.

However, Zhao teaches each of these limitations for the reasons given with respect to claim 4 above (Zhao teaches determining the velocity of a foot and, when "the foot speed of the side foot is smaller than a speed threshold value, determining that the side foot is in contact with the ground" in para. [0155]; Zhao's teaching of determining velocity can be combined with Huelsdunk's para. [0110], as above, to teach the second person; motivations similar to those applied to claim 1 apply here).

Shimada, Huelsdunk, and Zhao are all considered analogous to the claimed invention because they are in the same field of analyzing foot contact positions with a ground plane through image analysis.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada (as modified by Huelsdunk) to incorporate the teachings of Zhao and include "determining, based on the first two-dimensional position, that a first velocity of the foot of the first person is below a threshold velocity; and determining, based on the second two-dimensional position, that a second velocity of the foot of the second person is below the threshold velocity, wherein identifying the contact between the ground plane and the foot of the first person is further based on the first velocity of the foot of the first person being below the threshold velocity, and wherein identifying the contact between the ground plane and the foot of the second person is further based on the second velocity of the foot of the second person being below the threshold velocity". The motivation for doing so would have been "to solve the problem of foot slip of a virtual character and improve the viewing experience of a user", as suggested by Zhao in para. [0079]. Therefore, it would have been obvious to combine Shimada and Huelsdunk with Zhao to obtain the invention specified in claim 13.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Shimada et al. ("PhysCap: physically plausible monocular 3D motion capture in real time"), hereinafter Shimada, in view of Huelsdunk et al. (U.S. Publication No. 2021/0192783 A1), hereinafter Huelsdunk, Wang et al. (U.S. Publication No. 2018/0075593), hereinafter Wang, and Guigues et al. (U.S. Patent No. 10,839,203), hereinafter Guigues.

Regarding claim 7, Shimada, Huelsdunk, and Wang teach the method of claim 1. Shimada further teaches the first image comprising two-dimensional image data (see claim 1), a first and a second image of a first person (see FIG. 7), and generating multiple 3D poses of a first person based on minimizing error (see Section 4.3.1, Pose Correction). Huelsdunk additionally teaches a first and a second person in two-dimensional image data (see claim 1), wherein there exist first and second images of a scene, as shown in para. [0038] and [0307].

Shimada, Huelsdunk, and Wang fail to teach determining a difference between a first image comprising the two-dimensional image data and a second image comprising second two-dimensional image data representing the first person and the second person; and generating, based on the difference, a third three-dimensional pose of the body parts represented by the first person.

However, Guigues teaches determining a difference between a first image comprising the two-dimensional image data and a second image comprising second two-dimensional image data representing the first person and the second person (Guigues teaches that a "scene 120 may be any open or enclosed environment or space in which any number of actors (e.g., humans, other animals or machines) may execute one or more poses" in col. 3, lines 14-18, which is interpreted as teaching at least a first and a second person existing in the scene; Guigues additionally teaches "track[ing] one or more actors within a scene, and the poses or gestures executed by such actors, using two-dimensional images captured by two or more imaging devices including all or portions of a scene within a common field of view" in col. 8, lines 46-51; Guigues lastly teaches comparing the two-dimensional image data, wherein "the detection 160-1-2L of a head in the image frame 130-2L captured at time t2 may be probabilistically compared to the detection 160-1-3L of a head in the image frame 130-3L captured at time t3 in order to determine whether the detections 160-1-2L, 160-1-3L correspond to the same head, and the edges between such detections 160-1-2L, 160-1-3L may be contracted accordingly, i.e., by determining that the probabilities corresponding to such edges are sufficiently high" in col. 7, lines 7-40); and generating, based on the difference, a third three-dimensional pose of the body parts represented by the first person (Guigues teaches that "the articulated model 165-M of the actor 180 may be defined as a vector M165(t) that is representative of the smoothed three-dimensional motion of the various body parts P1(t), P2(t) . . . P16(t) that are merged together and best fit the respective two-dimensional detections of the respective body parts by the imaging devices 125-1, 125-2" in col. 8, lines 35-45; Guigues additionally teaches that "the respective probabilities of edges between nodes may be reevaluated based on any information that may be newly obtained, e.g., by the evaluation of subsequently or concurrently captured image frames" in col. 24, lines 25-48; this process inherently involves generating at least a third three-dimensional pose of a first person).

Shimada, Huelsdunk, Wang, and Guigues are all considered analogous to the claimed invention because they are in the same field of analyzing poses of persons through image analysis. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada (as modified by Huelsdunk and Wang) to incorporate the teachings of Guigues and include "determining a difference between a first image comprising the two-dimensional image data and a second image comprising second two-dimensional image data representing the first person and the second person; and generating, based on the difference, a third three-dimensional pose of the body parts represented by the first person". The motivation for doing so would have been "to identify the most accurate probabilities associated with edges extending between pairs of other nodes", as suggested by Guigues in col. 24, lines 39-48. Therefore, it would have been obvious to combine Shimada, Huelsdunk, and Wang with Guigues to obtain the invention specified in claim 7.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Shimada et al. ("PhysCap: physically plausible monocular 3D motion capture in real time"), hereinafter Shimada, in view of Huelsdunk et al. (U.S. Publication No. 2021/0192783 A1), hereinafter Huelsdunk, and Guigues et al. (U.S. Patent No. 10,839,203), hereinafter Guigues.

Regarding claim 16, Shimada and Huelsdunk teach the system of claim 10. Shimada further teaches a processor configured to carry out the method (see Section 5.1, Implementation), the first image comprising two-dimensional image data (see claim 10), a first and a second image of a first person (see FIG. 7), and generating multiple 3D poses of a first person based on minimizing error (see Section 4.3.1, Pose Correction).
Huelsdunk additionally teaches a first and a second person in two-dimensional image data (see claim 10), wherein there exist first and second images of a scene, as shown in para. [0038] and [0307].

Shimada and Huelsdunk fail to teach determining a difference between a first image comprising the two-dimensional image data and a second image comprising second two-dimensional image data representing the first person and the second person; and generating, based on the difference, a third three-dimensional pose of the body parts represented by the first person.

However, Guigues teaches determining a difference between a first image comprising the two-dimensional image data and a second image comprising second two-dimensional image data representing the first person and the second person, and generating, based on the difference, a third three-dimensional pose of the body parts represented by the first person, for the same reasons discussed with respect to claim 7 above (see Guigues at col. 3, lines 14-18; col. 8, lines 46-51; col. 7, lines 7-40; col. 8, lines 35-45; and col. 24, lines 25-48; as noted above, this process inherently involves generating at least a third three-dimensional pose of a first person).

Shimada, Huelsdunk, and Guigues are all considered analogous to the claimed invention because they are in the same field of analyzing poses of persons through image analysis.
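For illustration only: a simplified sketch of the cross-frame association Guigues describes for claims 7 and 16 — detections of the same body part in frames captured at times t2 and t3 are probabilistically compared, and pairs whose score is sufficiently high are linked (their "edges" are "contracted"). This greedy nearest-neighbor matcher, its distance-based score, and the detections are assumptions for the sketch, not Guigues' actual graph algorithm.

```python
# Hedged sketch: link body-part detections across two frames when the
# match probability clears a threshold. All names/values hypothetical.
import numpy as np

def associate(dets_t2, dets_t3, scale_px=50.0, min_prob=0.5):
    """Greedily link detections at t2 to detections at t3.

    dets_t2, dets_t3: (N, 2) / (M, 2) arrays of 2-D detection centers.
    Returns (i, j, prob) links whose probability clears min_prob.
    """
    links, used = [], set()
    for i, p in enumerate(dets_t2):
        dists = np.linalg.norm(dets_t3 - p, axis=1)
        probs = np.exp(-dists / scale_px)  # hypothetical distance-based score
        j = int(np.argmax(probs))
        if probs[j] >= min_prob and j not in used:
            links.append((i, j, float(probs[j])))
            used.add(j)
    return links

heads_t2 = np.array([[210.0, 120.0], [540.0, 130.0]])
heads_t3 = np.array([[545.0, 133.0], [215.0, 124.0]])
print(associate(heads_t2, heads_t3))  # each head linked across frames
```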
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of Shimada (as modified by Huelsdunk) to incorporate the teachings of Guigues and include "determining a difference between a first image comprising the two-dimensional image data and a second image comprising second two-dimensional image data representing the first person and the second person; and generating, based on the difference, a third three-dimensional pose of the body parts represented by the first person". The motivation for doing so would have been "to identify the most accurate probabilities associated with edges extending between pairs of other nodes", as suggested by Guigues in col. 24, lines 39-48. Therefore, it would have been obvious to combine Shimada and Huelsdunk with Guigues to obtain the invention specified in claim 16.

Allowable Subject Matter

Claims 8, 9, 17, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. Claims 19-20 are allowed. The following is a statement of reasons for the indication of allowable subject matter. The best prior art of record is Shimada, Huelsdunk, Wang, Zhao, and Guigues. The prior art of record, applied alone or in combination, fails to anticipate or render obvious claims 8, 9, and 17-20.

Claim 8

Regarding claim 8, Shimada and Huelsdunk teach the method of claim 1, further comprising identifying a two-dimensional position of the contact between the ground plane and the foot of the first person. Shimada further teaches generating a three-dimensional root position of the foot of the first person. Wang further teaches identifying two-dimensional positions of the ground plane based on the two-dimensional image data; generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane; and generating a three-dimensional root position of the foot of the first person based on the homographic matrix. However, neither Shimada, nor Huelsdunk, nor Wang, nor Zhao, nor Guigues, nor any combination thereof teaches determining a difference between the first absolute three-dimensional position and the three-dimensional root position of the foot of the first person, wherein generating the first absolute three-dimensional position is based on the difference. Similar analysis applies to corresponding claim 17.

Claim 9

Regarding claim 9, Shimada and Huelsdunk teach the method of claim 1, wherein a first image and a second image comprise the two-dimensional image data. Wang further teaches identifying two-dimensional positions of the ground plane based on the two-dimensional image data; generating a homographic matrix associated with mapping the two-dimensional positions of the ground plane to three-dimensional positions of the ground plane; and generating extrinsic parameters of the camera based on the homographic matrix and a focal length of the camera.
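For illustration only: the extrinsics-from-homography idea recited in claim 9 can be sketched under standard planar-calibration assumptions. For the Z = 0 ground plane, H ∝ K[r1 r2 t], so given the intrinsics K (which encode the focal length) one can unpack r1, r2, and t, and complete the rotation matrix with r3 = r1 × r2. The matrices below are hypothetical, and the sign and noise handling of a real decomposition is omitted.

```python
# Hedged sketch: recover a rotation matrix and translation vector from a
# ground-plane homography given camera intrinsics. Values hypothetical.
import numpy as np

def extrinsics_from_homography(H, K):
    """Decompose a Z = 0 plane homography into (R, t) given intrinsics K."""
    M = np.linalg.inv(K) @ H                # proportional to [r1 r2 t]
    scale = 1.0 / np.linalg.norm(M[:, 0])   # r1 must be unit length
    r1, r2, t = scale * M[:, 0], scale * M[:, 1], scale * M[:, 2]
    r3 = np.cross(r1, r2)                   # complete the rotation matrix
    return np.column_stack([r1, r2, r3]), t

K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])
R_true, t_true = np.eye(3), np.array([0.2, 1.5, 5.0])
H = K @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R, t = extrinsics_from_homography(H, K)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```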
However, neither Shimada, nor Huelsdunk, nor Wang, nor Zhao, nor Guigues, nor any combination thereof teaches determining, based on the extrinsic parameters, a difference between the first absolute three-dimensional position and the first two-dimensional positions, wherein generating the first absolute three-dimensional position is based on the difference, and wherein the extrinsic parameters are indicative of a rotation matrix and a translation vector for the camera. Similar analysis applies to corresponding claim 18.

Claim 19

Regarding claim 19, Shimada and Huelsdunk teach an apparatus for real-time three-dimensional human pose tracking using two-dimensional image data. Please see the mapping of claim 1 regarding additional teachings of Shimada in view of Huelsdunk in the context of this claim, and the claim interpretation section regarding the sections of the specification being read into the 112(f) limitations of this claim. Neither Shimada, nor Huelsdunk, nor Wang, nor Zhao, nor Guigues, nor any combination thereof teaches the corresponding algorithm for the 112(f) limitations of this claim, which can be found at page 5, line 6 through page 8, line 31 of the applicant's specification. Claim 20 includes allowable subject matter by virtue of being dependent upon claim 19.

Please note that the limitations identified in the claim interpretation section above regarding claims 19 and 20 are interpreted under 112(f) as computer-implemented means-plus-function limitations, and the corresponding algorithm for these limitations can be found at page 5, line 6 through page 8, line 31. See the claim interpretation section regarding the specific structure/algorithm applied to each limitation interpreted under 112(f). In this instance, the structure corresponding to a 35 U.S.C. 112(f) claim limitation for a computer-implemented function must include the algorithm needed to transform the general purpose computer or microprocessor disclosed in the specification into a special purpose computer programmed to perform the disclosed algorithm. See MPEP 2181(II)(B). The specific information in the specification regarding the algorithm associated with the acquisition unit that makes this limitation allowable, when analyzed in conjunction with the rest of the claim elements, includes, but is not limited to, page 5, line 6 through page 8, line 31 of the applicant's specification.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KYLA G. ALLEN, whose telephone number is (703) 756-5315. The examiner can normally be reached M-F, 7:30 am - 4:30 pm EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, John Villecco, can be reached at (571) 272-7319. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/Kyla Guan-Ping Tiao Allen/
Examiner, Art Unit 2661

/JOHN VILLECCO/
Supervisory Patent Examiner, Art Unit 2661

Prosecution Timeline

Dec 13, 2023
Application Filed
Jan 14, 2026
Non-Final Rejection — §103
Apr 13, 2026
Applicant Interview (Telephonic)
Apr 13, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597119
OPERATING METHOD OF ELECTRONIC DEVICE INCLUDING PROCESSOR EXECUTING SEMICONDUCTOR LAYOUT SIMULATION MODULE BASED ON MACHINE LEARNING
2y 5m to grant Granted Apr 07, 2026
Patent 12588594
SYSTEM AND METHOD FOR IDENTIFYING LENGTHS OF PARTICLES
2y 5m to grant Granted Mar 31, 2026
Patent 12591963
SYSTEM AND METHOD FOR ENHANCING DEFECT DETECTION IN OPTICAL CHARACTERIZATION SYSTEMS USING A DIGITAL FILTER
2y 5m to grant Granted Mar 31, 2026
Patent 12548152
INTRACRANIAL ARTERY STENOSIS DETECTION METHOD AND SYSTEM
2y 5m to grant Granted Feb 10, 2026
Patent 12541833
ASSESSING IMAGE/VIDEO QUALITY USING AN ONLINE MODEL TO APPROXIMATE SUBJECTIVE QUALITY VALUES
2y 5m to grant Granted Feb 03, 2026
Based on the examiner's 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
89%
Grant Probability
99%
With Interview (+17.1%)
3y 0m
Median Time to Grant
Low
PTA Risk
Based on 53 resolved cases by this examiner. Grant probability derived from career allow rate.
