Prosecution Insights
Last updated: April 19, 2026
Application No. 18/429,089

Fall Detection and Prevention System for Alzheimer's, Dementia, and Diabetes

Status: Non-Final OA (§103)
Filed: Jan 31, 2024
Examiner: DOTTIN, DARRYL V
Art Unit: 2683
Tech Center: 2600 — Communications
Assignee: Arizona Board of Regents
OA Round: 1 (Non-Final)
Grant Probability: 79% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 1m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 79% (411 granted / 521 resolved; +16.9% vs TC avg, above average)
Interview Lift: +13.3% among resolved cases with interview (moderate)
Avg Prosecution: 2y 1m (fast prosecutor; 20 applications currently pending)
Total Applications: 541 across all art units (career history)

Statute-Specific Performance

§101: 7.4% (-32.6% vs TC avg)
§103: 49.5% (+9.5% vs TC avg)
§102: 29.1% (-10.9% vs TC avg)
§112: 12.7% (-27.3% vs TC avg)
Tech Center averages are estimates. Based on career data from 521 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
2. Claims 1-20 are pending in this application.

Oath/Declaration
3. The receipt of the Oath/Declaration is acknowledged.

Drawings
4. The receipt of the Drawings is acknowledged.

Allowable Subject Matter
5. Claims 10-13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

6. The following is a statement of reasons for the indication of allowable subject matter:

Regarding Claim 10: None of the prior art(s) searched, cited and/or of record disclose(s) or suggest(s) the teaching(s) of the method of claim 1, wherein determining whether the captured human contour is indicative of an unstable pose comprises: identifying a base of support of the user; estimating a center of mass of the user; identifying a gravity midline extending perpendicular to the gravitational field from the estimated center of mass of the user; and determining whether the gravity midline is within the base of support of the user.

Regarding Claim 11: None of the prior art(s) searched, cited and/or of record disclose(s) or suggest(s) the teaching(s) of the method of claim 10, wherein estimating the center of mass of the user comprises: storing health information of the user; estimating, based on the health information of the user, the density of one or more body parts of the user; and estimating the center of mass of the user based on the captured human contour and the estimated density of each of the one or more body parts of the user.

Regarding Claim 12: None of the prior art(s) searched, cited and/or of record disclose(s) or suggest(s) the teaching(s) of the method of claim 11, wherein the health information includes height and weight and the density of the one or more body parts of the user is estimated based on the height and weight of the user.

Regarding Claim 13: None of the prior art(s) searched, cited and/or of record disclose(s) or suggest(s) the teaching(s) of the method of claim 11, wherein estimating the center of mass of the user comprises: assigning geometric shapes to a wireframe indicative of the pose of the user; estimating the density of each geometric shape based on the health information of the user; and estimating the center of mass of the geometric shapes indicative of the pose of the user.
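Claims 10-13 describe a classical static-stability test: estimate a density-weighted center of mass from the captured contour, drop a gravity line through it, and check whether that line lands inside the base of support. The sketch below is a minimal, hypothetical Python illustration of that test; the Segment type, its area and density fields, treating the gravity line as vertical in image coordinates, and representing the base of support as an (x_min, x_max) foot span are all illustrative assumptions, not anything disclosed in the application or the cited art.

```python
# Hypothetical sketch of the claims 10-13 stability test; all names and
# representations are illustrative assumptions, not the applicant's code.
from dataclasses import dataclass

@dataclass
class Segment:
    centroid: tuple[float, float]  # (x, y) centroid of one body segment
    area: float                    # segment area from the captured contour
    density: float                 # estimated from stored height/weight (claim 12)

def center_of_mass(segments: list[Segment]) -> tuple[float, float]:
    # Mass-weighted average of segment centroids (claims 11 and 13).
    masses = [s.area * s.density for s in segments]
    total = sum(masses)
    x = sum(m * s.centroid[0] for m, s in zip(masses, segments)) / total
    y = sum(m * s.centroid[1] for m, s in zip(masses, segments)) / total
    return (x, y)

def is_unstable(segments: list[Segment],
                base_of_support: tuple[float, float]) -> bool:
    # Claim 10: flag the pose when the vertical gravity line through the
    # estimated center of mass falls outside the (x_min, x_max) foot span.
    com_x, _ = center_of_mass(segments)
    x_min, x_max = base_of_support
    return not (x_min <= com_x <= x_max)
```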
Claim Rejections - 35 U.S.C. § 103

7. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

8. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

9. The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

10. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

11. Claims 1-9 and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sung (US PG. Pub. 2022/0366653 A1) in view of Kim (US PG. Pub. 2021/0067684 A1).

Referring to Claim 1, Sung teaches a fall prevention method (See Sung, Figs. 4-9, Methods 400, 500, 600 and 900, Full Body VR Capture and 3D Model Representation), comprising: receiving (See Sung, Fig. 4, Step 410, Sect. [0080] lines 1-3, at step 410, receiving a full-body image 405 of the user in his physical environment, wherein the image is captured by an RGB camera.), via a local area network by a local controller (See Sung, Sect. [0172], the hardware may include an interface to one or more networks (e.g., a local area network (LAN) or other Internet networks) to permit communication of information with computing device 220 coupled to the networks) in an environment of a user (See Sung, Fig. 3B, Sect. [0077] lines 12-15, as the user moves through space in his own physical environment (e.g., stand, jump, squat), his 3D virtual representation 340 moves in the virtual environment accordingly.), video images of the user from each of a plurality of image capture systems in the environment of the user (See Sung, Sect. [0007] and [0066], generating image and video models of users in a virtual environment using a computing device 220 having one or more cameras for video capture; the RGB or RGB-D camera may comprise a plural number of lenses or individual cameras that collectively produce a single photo via computational photography. For example, the latest IPHONE models comprise a three-lens camera system having a telephoto lens, a wide lens, and an ultra-wide lens, as well as a Light Detection and Ranging (LiDAR) sensor for creating a depth map of the surroundings.); using the one or more pre-trained machine learning models, by the local controller (See Sung, Fig. 5, Step 520, Sect. [0084] lines 1-4, at step 520, a body bounding box associated with the user may be detected or determined from the RGB image, using a trained machine learning technique, such as a trained neural network.), to capture at least one human contour indicative of the pose of the user based on the video images received from each of the plurality of image capture systems (See Sung, Fig. 4, Step 410, Sect. [0080] lines 1-5, at step 410, the system as disclosed herein receives or retrieves a full-body image 405 of the user in his physical environment, wherein the image is captured by a single RGB camera having plural lenses with three-lens camera systems.); determining, for each captured human contour, whether the captured human contour is indicative of an unstable pose (See Sung, Fig. 9, Sect. [0103] lines 21-31, a trained neural network may process RGB data to determine user posture and pose information, and a 3D virtual representation may be generated from a user posture, by using the user posture as a skeleton to generate 3D body avatars. Exemplary pose estimation modules based on convolution neural networks are discussed with reference to FIGS. 15 to 19B. At step 940, the disclosed systems may position the generated 3D model of the user inside the virtual environment. Finally, at step 950, the disclosed systems may generate an output for display, or display the virtual environment with the 3D model of the user.); and outputting audible or haptic feedback to the user in response to a determination that a captured human contour is indicative of an unstable pose (See Sung, Sect. [0131], audio generated by a user computing device and/or audio generated by one or more users may be used to facilitate an interactive full body VR session and direct users to particular positions or poses, with further audio feedback to help the users locate themselves more accurately, to provide feedback to the user to inform them if they are making a wrong move or motion or action that a user needs to do as part of a VR application, and to facilitate the session by allowing users to set options, correct mistakes, or start or stop the session.).

Sung fails to explicitly teach storing, by the local controller, one or more pre-trained machine learning models for estimating a pose of the user. However, Kim teaches storing, by the local controller, one or more pre-trained machine learning models for estimating a pose of the user (See Kim, Fig. 2, Memory 170, Sect. [0265], stored in the memory 170 of the user equipment 100 are artificial intelligence models 173, which include trained image data to recognize the bilaterally symmetrical positions of the crown of the head and the jaw among human body parts, and the joints located in a bilaterally symmetrical position of the human body, by using machine learning, and may be completed by undergoing a learning process and an evaluation process in the server 200, which is the learning device 200.).

Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to incorporate storing, by the local controller, one or more pre-trained machine learning models for estimating a pose of the user. The motivation for doing so would have been to provide equipment and a method for utilizing human recognition in which a highlight part of a moving image stored in the equipment may be consecutively played with respect to each person appearing in the moving image through a search for a moving image, or an edited moving image may be generated (See Sect. [0011] of the Kim reference). Therefore, it would have been obvious to combine Sung and Kim to obtain the invention as specified in claim 1.
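As mapped above, claim 1 reduces to a capture-infer-alert loop: video frames arrive over the LAN from multiple cameras, a stored pre-trained model extracts a human contour per view, and audible or haptic feedback fires when any contour looks unstable. A minimal sketch of that control flow follows; every callable here (cameras, extract_contour, is_unstable, alert_user) is a hypothetical stand-in, not an API from Sung or Kim.

```python
# Hedged sketch of the claim 1 method as a control loop. All callables
# are injected stand-ins; nothing here comes from the cited references.
def monitor(cameras, extract_contour, is_unstable, alert_user):
    """cameras: objects whose .read() returns one frame per view over the LAN;
    extract_contour: stored pre-trained pose model (the 'storing'/'using' steps);
    is_unstable: per-contour stability test; alert_user: audible/haptic output."""
    while True:
        for cam in cameras:                        # one frame from each view
            contour = extract_contour(cam.read())  # pre-trained model inference
            if contour is not None and is_unstable(contour):
                alert_user()                       # audible or haptic feedback
                break                              # one alert per pass suffices
```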
Referring to Claim 2, the combination of Sung in view of Kim teaches the method of claim 1 (See Sung, Figs. 4-9, Methods 400, 500, 600 and 900, Full Body VR Capture and 3D Model Representation), wherein capturing at least one human contour based on the video images received from each of the plurality of image capture systems comprises capturing, for each of the plurality of image capture systems, a two-dimensional human contour indicative of the pose of the user from the point-of-view of the image capture system (See Sung, Sect. [0015] lines 4-9, receiving an image of the user captured using an RGB camera; detecting a body bounding box associated with the user from the image using a first trained neural network; determining a segmentation map of the user, based on the body bounding box; determining a two-dimensional (2D) contour of the user from the segmentation map).

Referring to Claim 3, the combination of Sung in view of Kim teaches the method of claim 2 (See Sung, Figs. 4-9, Methods 400, 500, 600 and 900, Full Body VR Capture and 3D Model Representation), further comprising: receiving depth information from each image capture system (See Sung, Fig. 3B, Body Center Depth Pixels, Sect. [0077] lines 5-7, diagrams 358, 368, and 378 of Fig. 3B correspond to captured body center depth pixels, body bounding boxes, and user contours used in 3D model generation.); and identifying the depth of each pixel of each captured two-dimensional human contour (See Sung, Fig. 3A, Sect. [0076], a body center depth pixel 326 and a body bounding box 324 of the user 210's two-dimensional (2D) contour 322 are extracted after a user segmentation process.).

Referring to Claim 4, the combination of Sung in view of Kim teaches the method of claim 3 (See Sung, Figs. 4-9, Methods 400, 500, 600 and 900, Full Body VR Capture and 3D Model Representation), wherein each image capture system comprises a depth camera or light detection and ranging (LiDAR) scanner (See Sung, Sect. [0066] lines 15-22, the system may comprise any number of lenses or individual cameras that collectively produce a single photo via computational photography, or the latest IPHONE models comprising a three-lens camera system having a telephoto lens, a wide lens, and an ultra-wide lens, as well as a Light Detection and Ranging (LiDAR) sensor for creating a depth map of the surroundings.).

Referring to Claim 5, the combination of Sung in view of Kim teaches the method of claim 2 (See Sung, Figs. 4-9, Methods 400, 500, 600 and 900, Full Body VR Capture and 3D Model Representation), wherein audible or haptic feedback is output in response to a determination that any two-dimensional human contour from the point-of-view of any of the image capture systems is indicative of an unstable pose (See Sung, Sect. [0131] lines 9-18, (iii) provide feedback to the user (e.g., to inform them if the users are making a wrong move, running out of time, have successfully completed a given movement, or achieved a particular score), or (iv) report on the progress of the session (statistics, leaderboard, and the like) to facilitate the session by allowing users to set options, correct mistakes, or start or stop the session.).

Referring to Claim 6, the combination of Sung in view of Kim teaches the method of claim 1 (See Sung, Figs. 4-9, Methods 400, 500, 600 and 900, Full Body VR Capture and 3D Model Representation), wherein capturing at least one human contour based on the video images received from each of the plurality of image capture systems comprises reconstructing a three-dimensional human contour indicative of the three-dimensional pose of the user based on the video images received from the plurality of image capture systems (See Sung, Sect. [0025], constructing a three-dimensional (3D) model of a user in a virtual environment, comprising a processor and a non-transitory physical storage medium for storing program code accessible by the processor. The program code when executed by the processor causes the processor to: receive an image of the user captured using an RGB camera; detect a body bounding box associated with the user from the image using a first trained neural network and form a 3D extrusion model of the user by extruding the 2D contour; and construct the 3D model of the user in the virtual environment by applying a geometric transformation to the 3D extrusion model to position the 3D model of the user at a target location and at a target scale factor in the virtual environment.).

Referring to Claim 7, the combination of Sung in view of Kim teaches the method of claim 1 (See Sung, Figs. 4-9, Methods 400, 500, 600 and 900, Full Body VR Capture and 3D Model Representation), wherein capturing the at least one human contour using the one or more pre-trained machine learning models comprises (See Sung, Sect. [0032] lines 7-13, capturing an image of the user's body from an RGB camera using a first trained neural network based on a two-dimensional (2D) contour of the user from the segmentation map and forming a 3D extrusion model of the user by extruding the 2D contour.): using a pre-trained pose detection model to infer landmarks indicative of joints of the user (See Sung, Sect. [0084] lines 11-18, by a trained machine learning algorithm, a body bounding box is a bounding box that outlines the user's full body, or outlines one or more body parts (e.g., upper torso, upper body, lower body); the bounding box 422 in FIG. 4 is a full body bounding box that encloses the user's entire body, and body bounding box 324 in FIG. 3A encloses the user's upper torso and upper body only.); and using a pre-trained image segmentation model to infer a segmentation mask indicative of the pose of the user (See Sung, Sect. [0086], a segmentation map of the player's body may be estimated with a trained segmentation neural network from the RGB data inside the full body bounding box; the body bounding box may enclose the user's full body pose.).
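Claim 7 pairs two pre-trained models: a pose detector that infers joint landmarks and a segmentation model that infers a person mask. A minimal sketch of that pairing follows, where pose_model and seg_model are generic callables standing in for any pre-trained human pose and person-segmentation networks; nothing here is an API from the cited art.

```python
# Illustrative pairing of the two claim 7 models; pose_model and
# seg_model are hypothetical callables, not APIs from Sung or Kim.
import numpy as np

def capture_human_contour(frame: np.ndarray, pose_model, seg_model):
    landmarks = pose_model(frame)  # e.g., list of (x, y, confidence) joints
    mask = seg_model(frame)        # HxW array: 1 = person, 0 = background
    if landmarks is None or mask.sum() == 0:
        return None                # no person visible in this view
    return {"landmarks": landmarks, "mask": mask.astype(bool)}
```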
Referring to Claim 8, the combination of Sung in view of Kim teaches the method of claim 1 (See Sung, Figs. 4-9, Methods 400, 500, 600 and 900, Full Body VR Capture and 3D Model Representation), wherein capturing the at least one human contour using the one or more pre-trained machine learning models comprises (See Sung, Sect. [0032] lines 7-13, capturing an image of the user's body from an RGB camera using a first trained neural network based on a two-dimensional (2D) contour of the user from the segmentation map and forming a 3D extrusion model of the user by extruding the 2D contour.): training a background subtraction model to identify image data depicting the environment (See Sung, Fig. 4, Step 450, Sect. [0080] lines 10-13, at step 450, a geometric transformation is performed to project the camera representation 442 to the VR environment representation 340, with appropriate scales, perspectives, offsets, and/or other rendering parameters.); using a pre-trained body identification model to identify a bounding box surrounding image data depicting the user (See Sung, Sect. [0084] lines 1-5, at step 520, a body bounding box associated with the user may be detected or determined from the RGB image, using a trained machine learning technique, such as a trained neural network. Bounding boxes are commonly used in computer vision and machine learning.); and using the trained background subtraction model to subtract image data depicting the environment from the image data within the bounding box (See Sung, Sect. [0080] lines 5-7, at step 420, user segmentation may be performed on RGB data or depth data to extract the user from the image background).

Referring to Claim 9, the combination of Sung in view of Kim teaches the method of claim 7 (See Sung, Figs. 4-9, Methods 400, 500, 600 and 900, Full Body VR Capture and 3D Model Representation), wherein the bounding box has a height and a width and the determination of whether the captured human contour is indicative of an unstable pose is based on a comparison of the height and the width of the bounding box (See Sung, Sect. [0094] lines 7-10, a second partial body bounding box 614 focuses on the user's upper torso or upper body and may configure the number and size of body bounding boxes.).
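Claim 9's determination is a compact heuristic: compare the height and width of the body bounding box, since a standing person yields a box taller than wide while a falling or fallen person tends to yield one wider than tall. A toy version follows; the 1.0 ratio threshold is an illustrative assumption, not a disclosed value.

```python
# Toy claim 9 heuristic: a bounding box wider than tall suggests a
# horizontal (falling or fallen) posture. The threshold is assumed.
def box_suggests_instability(x0: float, y0: float,
                             x1: float, y1: float,
                             ratio_threshold: float = 1.0) -> bool:
    width, height = abs(x1 - x0), abs(y1 - y0)
    return height == 0 or (width / height) > ratio_threshold
```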
Referring to Claim 14, the combination of Sung in view of Kim teaches the method of claim 1 (See Sung, Figs. 4-9, Methods 400, 500, 600 and 900, Full Body VR Capture and 3D Model Representation), wherein determining whether the captured human contour is indicative of an unstable pose comprises (See Sung, Sect. [0103] lines 21-25, determining user posture and pose information with a trained neural network processing captured RGB data, and generating a 3D virtual representation from a user posture used as a skeleton to generate 3D body avatars): identifying a base of support of the user (See Sung, Sect. [0075], user 210 is standing in front of a window in an indoor space. Embodiments of the present invention may be used in indoor or outdoor settings, under varying lighting conditions. Embodiments of the present invention may also be capable of supporting flexible placement of the mobile device (e.g., on the floor, on a table, on a tripod, on the wall), and are resilient to vibration or accidental movements.); identifying a center of area of the captured human contour (See Sung, Fig. 6, Step 630, Sect. [0096] lines 1-4, at step 630, a single, optimal body center pixel may be determined, representing a body center or centroid for the user, as defined by some optimization conditions); identifying a geometric midline extending from the center of the base of support of the user through the center of area of the captured human contour (See Sung, Sect. [0097], after the body center pixel is found, the depth data may be segmented, based on the body center depth pixel and an offset. For instance, in the illustrative example shown in FIG. 6, at step 640, a thresholding technique may be used to convert the depth data into a binary map indicating whether each pixel is closer to the camera than the optimal body center depth pixel plus the offset. For example, with a body center depth pixel depth of 1, and a chosen offset of 1, the depth map may be converted into binary map 642, with or without appropriate cropping of the depth map based on the body bounding boxes.); and determining whether the captured human contour is indicative of an unstable pose based on an angle of the geometric midline (See Sung, Fig. 18A, Sect. [0152], FIG. 18A is a block diagram 1800 of an exemplary neural network for pose estimation, with the neural network layers or blocks drawn with thickened lines. A two-branch CNN efficiently detects poses of multiple people in an input image by predicting part confidence maps for body parts, and part affinity fields for body part-to-body part association, effectively decoupling the detection of a body part such as an arm or leg, and the assignment of the detected body part to an individual person. A part affinity field (PAF) is a 2D vector field that encodes the location and orientation of body parts including limbs over the image domain. A PAF encodes the association between body parts, where body parts belonging to the same person are linked.).

Referring to Claim 15, the combination of Sung in view of Kim teaches the method of claim 1 (See Sung, Figs. 4-9, Methods 400, 500, 600 and 900, Full Body VR Capture and 3D Model Representation), wherein determining whether the captured human contour is indicative of an unstable pose comprises (See Sung, Sect. [0103] lines 21-25, determining user posture and pose information with a trained neural network processing captured RGB data, and generating a 3D virtual representation from a user posture used as a skeleton to generate 3D body avatars): identifying a center of area of the captured human contour (See Sung, Fig. 6, Step 630, Sect. [0090] lines 1-4, at step 630, a single, optimal body center pixel may be determined, representing a body center or centroid for the user, as defined by some optimization conditions.); estimating a center of mass of the user (See Sung, Sect. [0096] lines 4-7, a body center depth pixel from depth map 616 may be found by minimizing both a difference to the estimated user depth and a distance from a center of a chosen body bounding box.); and determining whether the captured human contour is indicative of an unstable pose based on a distance between the center of area of the captured human contour and the estimated center of mass of the user (See Sung, Sect. [0096] lines 7-16, if (x, y) is the coordinate of the body center depth pixel to be determined, D(x, y) is the depth at (x, y), (b_x, b_y) is the coordinate of the center of the body bounding box, and d is the previously estimated user depth, a cost function C(x, y) = (x - b_x)^2 + (y - b_y)^2 + (D(x, y) - d)^2 may be minimized to find an optimal body center depth pixel location. In the example shown in FIG. 6, the body center depth pixel coincides with the geometric center of the smaller body bounding box, and has a depth of 1.).
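Sung's [0096] cost function, quoted in the claim 15 mapping above, translates directly into a vectorized search over the depth map: pick the pixel minimizing C(x, y) = (x - b_x)^2 + (y - b_y)^2 + (D(x, y) - d)^2. A sketch, assuming depth is a dense 2D array in the same units as the estimated user depth d:

```python
# Vectorized minimization of the cost function quoted from Sung [0096]:
# C(x, y) = (x - b_x)^2 + (y - b_y)^2 + (D(x, y) - d)^2.
import numpy as np

def body_center_pixel(depth: np.ndarray, bx: float, by: float,
                      d: float) -> tuple[int, int]:
    ys, xs = np.indices(depth.shape)                      # pixel coordinate grids
    cost = (xs - bx) ** 2 + (ys - by) ** 2 + (depth - d) ** 2
    y, x = np.unravel_index(np.argmin(cost), cost.shape)  # argmin of C over all pixels
    return int(x), int(y)
```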
Referring to Claim 16, Sung teaches a fall prevention system (See Sung, Fig. 2, Full Body VR System 200), comprising: a plurality of image capture systems in an environment of a user (See Sung, Sect. [0007] and [0066], generating image and video models of users in a virtual environment using a computing device 220 having one or more cameras for video capture; the RGB or RGB-D camera may comprise a plural number of lenses or individual cameras that collectively produce a single photo via computational photography. For example, the latest IPHONE models comprise a three-lens camera system having a telephoto lens, a wide lens, and an ultra-wide lens, as well as a Light Detection and Ranging (LiDAR) sensor for creating a depth map of the surroundings.); a local controller (See Fig. 2, Computing Device 220), in communication with the plurality of image capture systems (See Sung, Fig. 2, RGB Cameras on Computing Device 220, Sect. [0066] and [0074] lines 4-7, the RGB/RGB-D camera with plural lenses and individual cameras comprising a three-lens camera system having a telephoto lens, a wide lens, and an ultra-wide lens is in communication with computing device 220) via a local area network (See Sung, Sect. [0172], the hardware may include an interface to one or more networks (e.g., a local area network (LAN) or other Internet networks) to permit communication of information with computing device 220 coupled to the networks), that: receives video images of the user from each of the plurality of image capture systems (See Sung, Fig. 4, Step 410, Sect. [0080] lines 1-5, at step 410, the system as disclosed herein receives or retrieves a full-body image 405 of the user in his physical environment, wherein the image is captured by a single RGB camera. In some embodiments, the RGB camera may be an RGB-D camera.); uses the one or more pre-trained machine learning models (See Sung, Fig. 5, Step 520, Sect. [0084] lines 1-4, at step 520, a body bounding box associated with the user may be detected or determined from the RGB image, using a trained machine learning technique, such as a trained neural network.) to capture at least one human contour indicative of the pose of the user based on the video images received from each of the plurality of image capture systems (See Sung, Fig. 4, Step 410, Sect. [0080] lines 1-5, at step 410, the system as disclosed herein receives or retrieves a full-body image 405 of the user in his physical environment, wherein the image is captured by an RGB camera with three-lens camera systems.); and determines, for each captured human contour, whether the captured human contour is indicative of an unstable pose (See Sung, Fig. 9, Sect. [0103] lines 21-31, a trained neural network may process RGB data to determine user posture and pose information, and a 3D virtual representation may be generated from a user posture, by using the user posture as a skeleton to generate 3D body avatars. Exemplary pose estimation modules based on convolution neural networks are discussed with reference to FIGS. 15 to 19B. At step 940, the disclosed systems may position the generated 3D model of the user inside the virtual environment. Finally, at step 950, the disclosed systems may generate an output for display, or display the virtual environment with the 3D model of the user.); and a feedback device that outputs audible or haptic feedback to the user in response to a determination that a captured human contour is indicative of an unstable pose (See Sung, Sect. [0131], audio generated by a user computing device and/or audio generated by one or more users may be used to facilitate an interactive full body VR session and direct users to particular positions or poses, with further audio feedback to help the users locate themselves more accurately, to provide feedback to the user to inform them if they are making a wrong move or motion or action that a user needs to do as part of a VR application, and to facilitate the session by allowing users to set options, correct mistakes, or start or stop the session.).

Sung fails to explicitly teach a local controller that stores one or more pre-trained machine learning models for estimating a pose of the user. However, Kim teaches a controller that stores one or more pre-trained machine learning models for estimating a pose of the user (See Kim, Fig. 2, Memory 170, Sect. [0265], stored in the memory 170 of the user equipment 100 are artificial intelligence models 173, which include trained image data to recognize the bilaterally symmetrical positions of the crown of the head and the jaw among human body parts, and the joints located in a bilaterally symmetrical position of the human body, by using machine learning, and may be completed by undergoing a learning process and an evaluation process in the server 200, which is the learning device 200.).

Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to incorporate a local controller that stores one or more pre-trained machine learning models for estimating a pose of the user. The motivation for doing so would have been to provide equipment and a method for utilizing human recognition in which a highlight part of a moving image stored in the equipment may be consecutively played with respect to each person appearing in the moving image through a search for a moving image, or an edited moving image may be generated (See Sect. [0011] of the Kim reference). Therefore, it would have been obvious to combine Sung and Kim to obtain the invention as specified in claim 16.

Referring to Claim 17, the combination of Sung in view of Kim teaches the system of claim 16 (See Sung, Fig. 2, Full Body VR System 200), wherein, for each of the plurality of image capture systems (See Sung, Fig. 2, RGB Cameras on Computing Device 220, Sect. [0066] and [0074] lines 4-7, the RGB/RGB-D camera with plural lenses and individual cameras comprising a three-lens camera system having a telephoto lens, a wide lens, and an ultra-wide lens is in communication with computing device 220), the local controller captures a two-dimensional human contour indicative of the pose of the user from the point-of-view of the image capture system (See Sung, Sect. [0015] lines 4-9, receiving an image of the user captured using an RGB camera; detecting a body bounding box associated with the user from the image using a first trained neural network; determining a segmentation map of the user, based on the body bounding box; determining a two-dimensional (2D) contour of the user from the segmentation map).

Referring to Claim 18, the combination of Sung in view of Kim teaches the system of claim 17 (See Sung, Fig. 2, Full Body VR System 200), wherein the feedback device outputs feedback in response to a determination that any two-dimensional human contour from the point-of-view of any of the image capture systems is indicative of an unstable pose (See Sung, Sect. [0131] lines 9-18, (iii) provide feedback to the user (e.g., to inform them if the users are making a wrong move, running out of time, have successfully completed a given movement, or achieved a particular score), or (iv) report on the progress of the session (statistics, leaderboard, and the like) to facilitate the session by allowing users to set options, correct mistakes, or start or stop the session.).

Referring to Claim 19, the combination of Sung in view of Kim teaches the system of claim 16 (See Sung, Fig. 2, Full Body VR System 200), wherein the local controller captures the at least one human contour by (See Sung, Fig. 2, Sect. [0074], an RGB or RGB-D camera on computing device 220 captures arm, leg, upper body, or lower body movements of user 210 for use in constructing a 3D body model for a VR environment; any image of user 210 as captured by mobile computing device 220 comprises at least one of the user's upper body (e.g., head, neck, shoulders, upper torso, waist, upper arms, elbows, lower arms, and/or hands) and the user's lower body (e.g., waist, hips, upper legs, knees, lower legs, ankles, and/or feet).): using a pre-trained pose detection model to infer landmarks indicative of joints of the user (See Sung, Sect. [0084] lines 11-18, by a trained machine learning algorithm, a body bounding box is a bounding box that outlines the user's full body, or outlines one or more body parts (e.g., upper torso, upper body, lower body); the bounding box 422 in FIG. 4 is a full body bounding box that encloses the user's entire body, and body bounding box 324 in FIG. 3A encloses the user's upper torso and upper body only.); and using a pre-trained image segmentation model to infer a segmentation mask indicative of the pose of the user (See Sung, Sect. [0086], a segmentation map of the player's body may be estimated with a trained segmentation neural network from the RGB data inside the full body bounding box; the body bounding box may enclose the user's full body pose.).

Referring to Claim 20, the combination of Sung in view of Kim teaches the system of claim 16 (See Sung, Fig. 2, Full Body VR System 200), wherein the local controller captures the at least one human contour by (See Sung, Fig. 2, Sect. [0074], an RGB or RGB-D camera on computing device 220 captures arm, leg, upper body, or lower body movements of user 210 for use in constructing a 3D body model for a VR environment; any image of user 210 as captured by mobile computing device 220 comprises at least one of the user's upper body (e.g., head, neck, shoulders, upper torso, waist, upper arms, elbows, lower arms, and/or hands) and the user's lower body (e.g., waist, hips, upper legs, knees, lower legs, ankles, and/or feet).): using a pre-trained body identification model to identify a bounding box surrounding image data depicting the user (See Sung, Fig. 5, Step 520, Sect. [0084] lines 1-12, at step 520, a body bounding box associated with the user may be detected or determined from the RGB image, using a trained machine learning technique, such as a trained neural network, as used in computer vision and machine learning. Bounding boxes are rectangular-shaped boxes that localize or define the spatial location of an object within an image. A bounding box outlines a detected target item in a box with border coordinates, and may be determined manually (e.g., by a human annotator during training data generation for a machine learning system) or automatically (e.g., by a trained machine learning algorithm).); and using a background subtraction model that has been trained to identify image data depicting the environment to subtract image data depicting the environment from the image data within the bounding box (See Sung, Fig. 4, Steps 410 and 420, Sect. [0080] lines 1-7, at step 410, the system as disclosed herein receives or retrieves a full-body image 405 of the user in his physical environment, wherein the image is captured by a single RGB camera. In some embodiments, the RGB camera may be an RGB-D camera. At step 420, user segmentation may be performed on RGB data or depth data to extract the user from the image background.).
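Claims 8 and 20 recite a trained background subtraction model applied to image data inside a detected bounding box. The references do not name a specific subtractor, so the sketch below substitutes OpenCV's MOG2 model as one plausible stand-in; the (x, y, w, h) box format is also an assumption.

```python
# Sketch of the claim 8 / claim 20 flow with OpenCV's MOG2 subtractor
# standing in for the "trained background subtraction model".
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2()  # learns the static environment

def user_pixels_in_box(frame, box):
    """box = (x, y, w, h) from a body-detection model; returns the
    foreground (user) mask restricted to that bounding box."""
    fg_mask = subtractor.apply(frame)  # 255 = foreground, 0 = background (127 marks shadows)
    x, y, w, h = box
    return fg_mask[y:y + h, x:x + w]
```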
Cited Art
12. The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Wang et al. (US PG. Pub. No. 2023/0015717 A1) discloses a method for visualizing and targeting anatomical structures inside a patient utilizing a handheld screen device. The method may include grasping the handheld screen device and manipulating a position of the handheld screen device relative to the patient. The handheld screen device may include a camera and a display. The method may also include orienting the camera on the handheld screen device relative to an anatomical feature of the patient by manipulating the position of the handheld screen device relative to the patient, capturing first image data of light reflecting from a surface of the anatomical feature with the camera on the handheld screen device, and comparing the first image data with a pre-operative 3-D image of the patient to determine a location of an anatomical structure located inside the patient and positioned relative to the anatomical feature of the patient.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DARRYL V DOTTIN whose telephone number is (571) 270-5471. The examiner can normally be reached M-F 9am-5pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Abderrahim Merouan, can be reached at 571-270-5254. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/DARRYL V DOTTIN/
Primary Examiner, Art Unit 2683

Prosecution Timeline

Jan 31, 2024
Application Filed
Dec 30, 2025
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602618
ARTIFICIAL VISION PARAMETER LEARNING AND AUTOMATING METHOD FOR IMPROVING VISUAL PROSTHETIC SYSTEMS
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12602425
INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12586181
FUNCTIONAL IMAGING FEATURES FROM COMPUTED TOMOGRAPHY IMAGES
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12586150
EFFICIENT BI-DIRECTIONAL IMAGE SCALING
Granted Mar 24, 2026 (2y 5m to grant)
Patent 12585416
IMAGE PROCESSING APPARATUS, CONTROL METHOD OF IMAGE PROCESSING APPARATUS, AND STORAGE MEDIUM
Granted Mar 24, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 79%
With Interview: 92% (+13.3%)
Median Time to Grant: 2y 1m
PTA Risk: Low
Based on 521 resolved cases by this examiner. Grant probability derived from career allow rate.

Free tier: 3 strategy analyses per month