Prosecution Insights
Last updated: April 19, 2026
Application No. 18/064,519

HUMAN POSTURE DETECTION

Status: Non-Final OA (§103)

Filed: Dec 12, 2022
Examiner: OSIFADE, IDOWU O
Art Unit: 2675
Tech Center: 2600 — Communications
Assignee: Intel Corporation
OA Round: 1 (Non-Final)

Grant Probability: 81% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 2m
Grant Probability with Interview: 94%

Examiner Intelligence

Career Allow Rate: 81%, above average (545 granted / 671 resolved; +19.2% vs TC avg)
Interview Lift: +12.4% in resolved cases with an interview (a moderate lift)
Avg Prosecution: 2y 2m (a fast prosecutor); 18 applications currently pending
Career History: 689 total applications across all art units
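The headline figures above follow from simple arithmetic. The sketch below is illustrative only; it assumes the with-interview number is the career allow rate plus the interview lift, which matches the displayed values but is an assumption about the tool's methodology rather than a stated formula.

```python
# Illustrative only: assumes the "with interview" figure is the career allow
# rate plus the interview lift. That matches the displayed values but is an
# assumption about how the tool computes it.

granted = 545           # career grants
resolved = 671          # career resolved cases
interview_lift = 12.4   # percentage points, resolved cases with an interview

allow_rate = 100 * granted / resolved            # ~81.2%, shown as 81%
with_interview = allow_rate + interview_lift     # ~93.6%, shown as 94%

print(f"Career allow rate: {allow_rate:.1f}%")
print(f"Grant probability with interview: {with_interview:.1f}%")
```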

Statute-Specific Performance

§101: 11.7% (-28.3% vs TC avg)
§102: 11.8% (-28.2% vs TC avg)
§103: 59.9% (+19.9% vs TC avg)
§112: 14.0% (-26.0% vs TC avg)
Tech Center averages are estimates • Based on career data from 671 resolved cases
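Each examiner rate and its delta together imply the Tech Center baseline it was compared against. Running the arithmetic below suggests a common baseline of roughly 40% for every statute; this is an inference from the displayed figures, not a value stated by the tool.

```python
# Back out the implied Tech Center baseline from each examiner rate and its
# "vs TC avg" delta (all values in percentage points, taken from the list above).
examiner_rate = {"§101": 11.7, "§102": 11.8, "§103": 59.9, "§112": 14.0}
delta_vs_tc = {"§101": -28.3, "§102": -28.2, "§103": 19.9, "§112": -26.0}

for statute, rate in examiner_rate.items():
    tc_avg = rate - delta_vs_tc[statute]   # e.g. 59.9 - 19.9 = 40.0 for §103
    print(f"{statute}: examiner {rate:.1f}% vs TC average ~{tc_avg:.1f}%")
```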

Office Action

§103
Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . DETAILED ACTION Claims 1 – 21 are pending in this application. Claims 1, 13 and 16 are independent. Claim Rejections - 35 USC § 103 In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claim(s) 1 – 21 are rejected under 35 U.S.C. 103 as being unpatentable over Asikainen, Sami (US-20220296966-A1, hereinafter simply referred to as Sami). Regarding independent claim 1, Sami teaches: A non-transitory machine-readable storage medium (e.g., non-transitory computer-usable (e.g., readable, writeable) device of Sami) with instructions stored thereon, the instructions executable by the machine to cause the machine to: receive image data generated by a camera with a first resolution (e.g., camera (e.g., webcam) of Sami) (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"), wherein the camera is provided on a user computing device to capture an image of a user of the user computing device (See at least Sami, ¶ [0051]; FIG. 1B; "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, an infrared sensor, or a combination of one or more of the foregoing sensors. The sensor(s) 109 comprising of one or more cameras may provide a wider field of view (e.g., field of view >120 degrees) for capturing the video of the scene in which user 106 is performing the exercise movement and acquiring depth information (R, G, B, X, Y, Z) from the scene…"); execute a first machine learning model trained to determine a feature set associated with posture of the user from the image data (See at least Sami, ¶ [0077]; FIG. 
1B; "…the machine learning engine 206 may train the one or more machine learning models 226 for a variety of machine learning tasks including estimating a pose (e.g., 3D pose (x, y, z) coordinates of keypoints), detecting an object (e.g., barbell, registered user)…detecting a technique or form of the user in performing the exercise movement within acceptable thresholds…detecting a risk of injury, etc…"); receive depth data generated by a time of flight (ToF) sensor (e.g., a time-of-flight camera of Sami) provided on the user computing device (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"), wherein the depth data has a second resolution lower than the first resolution (e.g., depth data having a resolution lower than the first resolution – from a camera (e.g., webcam) is well-known in the art) and is generated contemporaneously with generation of the image data (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"); provide the first feature set (e.g., keypoints of Sami) as a first input and the depth data as a second input to a second machine learning model to generate a second feature set as an output of the second machine learning model (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"); and determine a posture of the user from the second feature set (e.g., keypoints of Sami) (See at least Sami, ¶ [0083]; FIG. 
1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"). Sami teaches the subject matter of the claimed inventive concept as expressed in the rejections above. However, the teachings are taught in separate embodiments. Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sami taught in separate embodiments for the desirable and advantageous purpose of enabling a low latency streaming of data to the personal training backend server 120 for requesting analysis and receiving feedback on the user performing the exercise movement, as discussed in Sami (See ¶ [0074]); thereby, achieving the predictable result of improving the overall efficiency and speed of the system with a reasonable expectation of success while enabling others skilled in the art to best utilize the invention along with various implementations and modifications as are suited to the particular use contemplated. Regarding independent claim 13, Sami teaches: A method comprising: receiving two-dimensional image data generated by a camera (e.g., camera (e.g., webcam) of Sami) of a user computing device (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"), wherein the image data comprises an image of a user using the user computing device (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"); applying a first machine learning model to the image data to generate a first feature set (e.g., keypoints of Sami) (See at least Sami, ¶ [0077]; FIG. 
1B; "…the machine learning engine 206 may train the one or more machine learning models 226 for a variety of machine learning tasks including estimating a pose (e.g., 3D pose (x, y, z) coordinates of keypoints), detecting an object (e.g., barbell, registered user)…detecting a technique or form of the user in performing the exercise movement within acceptable thresholds…detecting a risk of injury, etc…"), wherein the first feature set identifies features of a pose of the user from the image data (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"); receiving depth data generated by a depth sensor of the user computing device (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"), wherein the depth data comprises a grid of depth pixels and is generated contemporaneously with the image data (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. 
The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"); providing the first feature set as a first input and the depth data as a second input to a second machine learning model to generate a second feature set (e.g., keypoints of Sami) as an output of the second model (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"); and determining a posture of the user from the second feature set (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"). Sami teaches the subject matter of the claimed inventive concept as expressed in the rejections above. However, the teachings are taught in separate embodiments. Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sami taught in separate embodiments for the desirable and advantageous purpose of enabling a low latency streaming of data to the personal training backend server 120 for requesting analysis and receiving feedback on the user performing the exercise movement, as discussed in Sami (See ¶ [0074]); thereby, achieving the predictable result of improving the overall efficiency and speed of the system with a reasonable expectation of success while enabling others skilled in the art to best utilize the invention along with various implementations and modifications as are suited to the particular use contemplated. Regarding independent claim 16, Sami teaches: An apparatus (e.g., interactive personal training devices 108a (FIG. 
1) of Sami) comprising: a processor (e.g., processor of Sami); a memory (e.g., memory of Sami); a display (e.g., display of Sami); a camera sensor (e.g., camera (e.g., webcam) of Sami) oriented to face a human viewer of the display (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"); a depth sensor oriented to face the human viewer of the display (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"); a posture detection engine executable by the processor to: receive two-dimensional image data generated by the camera (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"), wherein the image data comprises an image of the human viewer (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"); provide the image data as an input to a first machine learning model to determine a first feature set (e.g., keypoints of Sami) (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. 
The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"), wherein the first machine learning model is trained to determine a post of a human from two-dimensional images (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"); receive depth data generated by the depth sensor contemporaneously with generation of the image data (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"), wherein the depth data comprises one or more depth measurements of the human viewer (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. 
The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"); provide the first feature set as a first input and the depth data as a second input to a second machine learning model to generate a second feature set (e.g., keypoints of Sami) as an output of the second machine learning model (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"); determine a posture of the human viewer from the second feature set (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"); and determine quality of the posture of the human viewer based on the second feature set (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. 
The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"). Sami teaches the subject matter of the claimed inventive concept as expressed in the rejections above. However, the teachings are taught in separate embodiments. Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Sami taught in separate embodiments for the desirable and advantageous purpose of enabling a low latency streaming of data to the personal training backend server 120 for requesting analysis and receiving feedback on the user performing the exercise movement, as discussed in Sami (See ¶ [0074]); thereby, achieving the predictable result of improving the overall efficiency and speed of the system with a reasonable expectation of success while enabling others skilled in the art to best utilize the invention along with various implementations and modifications as are suited to the particular use contemplated. Regarding dependent claim 2, Sami teaches: wherein the image data comprises two-dimensional red-green-blue (RGB) image data (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"). Regarding dependent claim 3, Sami teaches: wherein dimensions of the first features set are lower than dimensions of the image data (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. 
The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…" The Examiner notes that the depth map may provide depth information at a low resolution (e.g., exponentially lower than the (e.g., megapixel level) resolution of the camera/webcam. Thus, the first features set having lower dimensions than dimensions of the image data as claimed). Regarding dependent claim 4, Sami teaches: wherein the instructions are further executable to cause the machine to: provide a first version of the image data to a person detection model to detect that a view of the user occupies a subarea of the image data (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"); generate a cropped version of the image data (See at least Sami, ¶ [0074]; FIG. 1B; "…the data processing engine 204 receives image frames of a scene from a depth sensing camera on the interactive personal training device 108, removes (or crops) non-moving parts in the image frames (e.g., background), and sends the depth information calculated for the foreground object to the personal training backend server 120 for analysis…"), wherein the cropped version of the image data comprises the subarea, wherein the cropped version of the image data is provided as an input to the first machine learning model (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…"). Regarding dependent claim 5, Sami teaches: wherein the instructions are further executable to cause the machine to: determine a subset of depth pixels of the depth data corresponding to the subarea (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. 
The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…" Also, see at least ¶ [0047, 0074]); crop the depth data to generate a cropped version of the depth data to comprise the subset of depth pixels, wherein the cropped version of the depth data is provided as the second input to the second machine learning model. Regarding dependent claim 6, Sami teaches: wherein the first machine learning model comprises a convolutional neural network (See at least Sami, ¶ [0083]; FIG. 1B; "…the pose estimator 302 receives the RGB image and associated depth map, inputs the received data into a trained convolutional neural network for pose estimation, and generates 3D pose coordinates for one or more keypoints associated with a user. The pose estimator 302 generates a heatmap predicting the probability of the keypoint occurring at each pixel…the pose estimator 302 detects and tracks a static pose in a number of continuous image frames…The pose estimator 302 determines a position, an angle, a distance, and an orientation of the keypoints based on the estimated pose…determines an initial position, a final position, and a relative position of a joint in a sequence of a threshold number of frames. The pose estimator 302 passes the 3D pose data including the determined position, angle, distance, and orientation of the keypoints to other components 304, 306, 308, 310, 312, and 314 in the feedback engine 208 for further analysis…" Also, see at least ¶ [0047, 0074]). Regarding dependent claim 7, Sami teaches: wherein the first feature set and the second feature set each define a set of features associated with whether a body part of the user is angled toward or away from the user computing device (See at least Sami, ¶ [0047]; FIG. 1B; "…the personal training application 110a may provide for user interaction, receive a stream of sensor data input in association with a user performing an exercise movement, present information (e.g., an overlay of an exercise movement performed by a personal trainer) to the user via a display (e.g., see FIG. 2)…may be operable to allow users to record their exercise movements in a workout session, share their performance statistics with other users in a leaderboard, compete on the functional fitness challenges with other users, etc…may include software and/or logic for analyzing the stream of sensor data input using trained machine learning algorithms, and providing feedback and recommendation in association with the user performing the exercise movement on the interactive personal training device 108…" Also, see at least ¶ [0074, 0089]). Regarding dependent claim 8, Sami teaches: wherein the set of features in the second feature set are more accurate than the set of features in the first feature set (See at least Sami, ¶ [0105]; FIG. 
1B; "…The gamification engine 212 cooperates with the pose estimator 302 to generate a 3D body scan for accurately visualizing the body transformations of users including body rotations over time and enables sharing of the body transformations on a social network…"). Regarding dependent claim 9, Sami teaches: wherein the body part comprises a torso of a user (See at least Sami, ¶ [0080]; FIG. 1B; "…the machine learning engine 206 trains a Human Activity Recognition (HAR)-CNN model to identify PPG in torso, arms, and head…In another example, the machine learning engine 206 trains a Region-based CNN (R-CNN) model to infer 3D pose coordinates for keypoints, such as elbows, knees, wrists, hips, shoulder joints, etc.…"). Regarding dependent claim 10, Sami teaches: wherein the body part comprises a limb of a user (See at least Sami, ¶ [0080]; FIG. 1B; "…the machine learning engine 206 trains a Human Activity Recognition (HAR)-CNN model to identify PPG in torso, arms, and head…In another example, the machine learning engine 206 trains a Region-based CNN (R-CNN) model to infer 3D pose coordinates for keypoints, such as elbows, knees, wrists, hips, shoulder joints, etc.…" Also, see at least ¶ [0074, 0089]). Regarding dependent claim 11, Sami teaches: wherein the camera comprises a webcam integrated into the user computing device and the ToF sensor comprises a low-resolution ToF sensor integrated into the user computing device (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"). Regarding dependent claims 12 and 19, Sami teaches: wherein the user computing device comprises a laptop computer (See at least Sami, ¶ [0038]; FIG. 1B; "…Examples of client devices 130 may include laptops…"). Regarding dependent claim 14, Sami teaches: determining whether the posture of the user is correct or incorrect based on the second feature set (See at least Sami, ¶ [0089]; FIG. 1B; "…The movement adherence monitor 310 receives data for determining whether the user performance of one or more repetitions of the exercise movement adhere to predefined conditions or thresholds for correctly performing the exercise movement…the movement adherence monitor 310 may use a CNN model on a dataset containing repetitions of an exercise movement to determine the conditions for a proper form…the movement adherence monitor 310 compares whether the user performance of the exercise movement in view of body mechanics associated with correctly performing the exercise movement falls within acceptable range or threshold for human joint positions and movements…the movement adherence monitor 310 uses a machine learning model, such as a convolutional neural network trained on a large set of ideal or correct repetitions of an exercise movement to determine a score or a quality of the exercise movement performed by the user based at least on the estimated 3D pose data and the consecutive repetitions of the exercise movement. 
For example, the score (e.g., 85%) may indicate the adherence to predefined conditions for correctly performing the exercise movement…"). Regarding dependent claim 15, Sami teaches: generating feedback data for presentation to the user, wherein the feedback data identifies whether the posture of the user is correct or incorrect (See at least Sami, ¶ [0089]; FIG. 1B; "…The movement adherence monitor 310 receives data for determining whether the user performance of one or more repetitions of the exercise movement adhere to predefined conditions or thresholds for correctly performing the exercise movement…the movement adherence monitor 310 may use a CNN model on a dataset containing repetitions of an exercise movement to determine the conditions for a proper form…the movement adherence monitor 310 compares whether the user performance of the exercise movement in view of body mechanics associated with correctly performing the exercise movement falls within acceptable range or threshold for human joint positions and movements…the movement adherence monitor 310 uses a machine learning model, such as a convolutional neural network trained on a large set of ideal or correct repetitions of an exercise movement to determine a score or a quality of the exercise movement performed by the user based at least on the estimated 3D pose data and the consecutive repetitions of the exercise movement. For example, the score (e.g., 85%) may indicate the adherence to predefined conditions for correctly performing the exercise movement…"). Regarding dependent claim 17, Sami teaches: a central processing unit (CPU), wherein the processor is separate from the CPU, and logic implementing primary functionality of a user computing device is executed using the CPU (See at least Sami, ¶ [0053]; FIGS. 1 – 7; "…The processor 235 may be physical and/or virtual, and may include a single processing unit or a plurality of processing units and/or cores. In some implementations, the processor 235 may be capable of generating and providing electronic display signals to a display device 239, supporting the display of images, capturing and transmitting images, and performing complex tasks including various types of feature extraction and sampling…"). Regarding dependent claim 18, Sami teaches: wherein the apparatus comprises a user computing device, and the user computing device comprises the processor, the display, the camera, the depth sensor, and the posture detection engine (See at least Sami, ¶ [0053]; FIGS. 1 – 7; "…The processor 235 may be physical and/or virtual, and may include a single processing unit or a plurality of processing units and/or cores. In some implementations, the processor 235 may be capable of generating and providing electronic display signals to a display device 239, supporting the display of images, capturing and transmitting images, and performing complex tasks including various types of feature extraction and sampling…" Also, see at least ¶ [0054 – 0057]). Regarding dependent claim 20, Sami teaches: wherein the camera and the depth sensor are embedded in a bezel, wherein the bezel at least partially frames the display (See at least Sami, ¶ [0053]; FIGS. 1 – 7; "…The processor 235 may be physical and/or virtual, and may include a single processing unit or a plurality of processing units and/or cores. 
In some implementations, the processor 235 may be capable of generating and providing electronic display signals to a display device 239, supporting the display of images, capturing and transmitting images, and performing complex tasks including various types of feature extraction and sampling…" Also, see at least ¶ [0054 – 0057]). Regarding dependent claim 21, Sami teaches: wherein the camera comprises a high-resolution RGB camera and the depth sensor comprises a low resolution time of flight sensor (See at least Sami, ¶ [0038, 0051, 0059]; FIG. 1B; "…client device 130 may include a camera (e.g., webcam), sensors…", "…the sensor(s) 109 may comprise one or more of a high definition (HD) camera, a regular 2D camera, a RGB camera, a time-of-flight 3D camera, or a combination of one or more of the foregoing sensors…", "…The capture device 245 may be operable to capture an image (e.g., an RGB image, a depth map), a video or data digitally of an object of interest…the capture device 245 may be a high definition (HD) camera, a regular 2D camera, a time-of-flight 3D camera…"). Conclusion The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure: See the Notice of References Cited (PTO–892) Any inquiry concerning this communication or earlier communications from the examiner should be directed to IDOWU O OSIFADE whose telephone number is (571)272-0864. The Examiner can normally be reached on Monday-Friday 8:00am-5:00pm EST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the Examiner’s Supervisor, ANDREW MOYER can be reached on (571) 272 – 9523. The fax phone number for the organization where this application or proceeding is assigned is (571) 273 – 8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at (866) 217 – 9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call (800) 786 – 9199 (IN USA OR CANADA) or (571) 272 – 1000. /IDOWU O OSIFADE/Primary Examiner, Art Unit 2675
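Independent claims 1, 13, and 16 all recite the same data flow: a first model extracts a pose feature set from the high-resolution camera image, a second model fuses that feature set with lower-resolution ToF depth data to produce a second feature set, and posture is determined from the second feature set. The sketch below only illustrates that claimed flow; every model, keypoint count, resolution, and threshold is a hypothetical placeholder and comes neither from the application nor from Sami.

```python
# Hypothetical illustration of the two-stage flow recited in independent claims
# 1, 13, and 16. Only the data flow mirrors the claim language; every model,
# shape, and threshold here is a placeholder.
import numpy as np

def first_model(rgb_image: np.ndarray) -> np.ndarray:
    """First ML model: 2D image -> first feature set (here, 17 2D keypoints)."""
    num_keypoints = 17                                    # placeholder count
    h, w, _ = rgb_image.shape
    return np.random.rand(num_keypoints, 2) * [w, h]      # (x, y) per keypoint

def second_model(first_features: np.ndarray,
                 depth_map: np.ndarray,
                 rgb_shape: tuple) -> np.ndarray:
    """Second ML model: first feature set + low-res depth -> second feature set."""
    dh, dw = depth_map.shape
    h, w, _ = rgb_shape
    # Scale keypoints into the coarser depth grid and sample a depth per keypoint.
    scaled = first_features * np.array([dw / w, dh / h])
    idx = np.clip(scaled.astype(int), 0, [dw - 1, dh - 1])
    z = depth_map[idx[:, 1], idx[:, 0]]
    return np.column_stack([first_features, z])           # (x, y, z) per keypoint

def determine_posture(second_features: np.ndarray) -> str:
    """Toy posture decision from the second feature set (placeholder heuristic)."""
    head_z, hip_z = second_features[0, 2], second_features[11, 2]
    return "slouching" if head_z < hip_z - 0.05 else "upright"

rgb = np.zeros((1080, 1920, 3), dtype=np.uint8)   # first-resolution camera frame
depth = np.random.rand(240, 320)                  # contemporaneous lower-resolution ToF grid
posture = determine_posture(second_model(first_model(rgb), depth, rgb.shape))
print("Detected posture:", posture)
```

Laying the claim out this way can make it easier to check, element by element, which cited Sami paragraph the Examiner maps to each step when drafting the response.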

Prosecution Timeline

Dec 12, 2022: Application Filed
Jan 23, 2023: Response after Non-Final Action
Jan 24, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications with similar technology granted by the same examiner

Patent 12604780
RADIO FREQUENCY MODULE AND COMMUNICATION DEVICE
2y 5m to grant • Granted Apr 14, 2026
Patent 12597265
OCCLUSION RESOLVING GATED MECHANISM FOR SENSOR FUSION
2y 5m to grant • Granted Apr 07, 2026
Patent 12592083
SYSTEMS AND METHODS FOR CONTROLLING A VEHICLE BY DETECTING AND TRACKING OBJECTS THROUGH ASSOCIATED DETECTIONS
2y 5m to grant • Granted Mar 31, 2026
Patent 12587837
Secure Broadcast From One To Many Devices
2y 5m to grant • Granted Mar 24, 2026
Patent 12587936
CONDITIONAL HANDOVER
2y 5m to grant • Granted Mar 24, 2026
Study what changed to get past this examiner, based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 81%
With Interview: 94% (+12.4%)
Median Time to Grant: 2y 2m
PTA Risk: Low
Based on 671 resolved cases by this examiner. Grant probability derived from career allow rate.
