Prosecution Insights
Last updated: April 19, 2026
Application No. 19/098,069

VIDEO STREAMING METHOD AND DEVICE OF USER-CONTEXT-INFORMATION-PREDICTION-BASED EXTENDED REALITY DEVICE

Current Status: Non-Final Office Action (§103)
Filed: Apr 02, 2025
Examiner: ILUYOMADE, IFEDAYO B
Art Unit: 2624
Tech Center: 2600 (Communications)
Assignee: Korea Electronics Technology Institute
OA Round: 1 (Non-Final)
Grant Probability: 74% (Favorable)
Expected OA Rounds: 1-2
Estimated Time to Grant: 2y 10m
Grant Probability With Interview: 83%

Examiner Intelligence

Career Allow Rate: 74% (464 granted / 630 resolved), +11.7% vs Tech Center average (above average)
Interview Lift: +9.2% (moderate), measured across resolved cases with an interview
Typical Timeline: 2y 10m average prosecution; 27 applications currently pending
Career History: 657 total applications across all art units
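
The headline figures above follow from simple arithmetic on the displayed counts. Below is a minimal sketch, assuming the allow rate is just granted divided by resolved and that the interview lift and Tech Center delta are additive percentage-point adjustments; the tool's actual methodology is not stated on this page.

```python
# Sketch of how the headline figures above can be reproduced (assumptions noted).
granted, resolved = 464, 630
career_allow_rate = granted / resolved                 # ~0.7365, shown as 74%

tc_average = career_allow_rate - 0.117                 # "+11.7% vs TC avg" implies ~62% baseline
with_interview = career_allow_rate + 0.092             # "+9.2% interview lift" -> ~83%

print(f"Career allow rate:            {career_allow_rate:.0%}")
print(f"Implied Tech Center average:  {tc_average:.0%}")
print(f"Grant probability w/interview: {with_interview:.0%}")
```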

Statute-Specific Performance

§101: 1.4% (-38.6% vs TC avg)
§103: 56.8% (+16.8% vs TC avg)
§102: 29.7% (-10.3% vs TC avg)
§112: 6.1% (-33.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 630 resolved cases.

Office Action

§103
DETAILED ACTION Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA . Priority Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55. Claim Rejections - 35 USC § 103 The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claim(s) 1-10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Wan et al (US Patent 11823498) in view of Fu et al (US Pub. 20220254157). Regarding claim 1, Wan discloses: A video streaming method of an extended reality device, (at least refer to fig. 1-2 and column 4, line 3-6. Describes the video stream be processed by the systems and methods described herein in real-time—e.g., as it is captured (e.g., by a camera used in an augmented reality system) and/or received via a transmission. Column 13, line 27-32, describes: for example, a virtual reality, an augmented reality, a mixed reality, a hybrid reality, or some combination and/or derivative thereof. Artificial-reality content may include completely computer-generated content or computer-generated content combined with captured (e.g., real-world) content) the video streaming method comprising: Receiving situation information including user location information and pose information at a current point in time from the extended reality device, (at least refer to fig. 1-2 and column 2, line 55-65. Describes System 101 may include an intake module 104 that is configured to access a frame of a video stream (e.g., a current frame for which to determine hand pose information) that depicts a hand. By way of example, intake module 104 may receive a frame 152 of a video stream 150. As pictured, frame 152 may be the latest frame of a video stream being received (and, in some examples, generated) in real-time. Column 4, line 3-6. Describes the video stream be processed by the systems and methods described herein in real-time—e.g., as it is captured (e.g., by a camera used in an augmented reality system) and/or received via a transmission. Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions); Predicting changes in user location information and pose information at a preset next point in time using pre-learned artificial intelligence receiving, as input, situation information at the current point in time and situation information at a preset previous point in time, (at least refer to fig. 2-3 and column 2-3, line 65-3. 
Describes systems described herein may sequentially analyze a pre-existing video (e.g., from a file and/or database) for hand pose information and frame 152 may represent the current frame being analyzed, even if frame 152 is not the latest available frame in video stream 150. Column 8, line 58-67. Describes furthermore, gesture predictor 330 may operate on the assumption of rigidity of the hand between the most recently determined hand pose and the current hand pose (based, e.g., on the classification from rigid motion classifier 320). Thus, for example, gesture predictor may represent a machine learning classifier trained on temporally sequenced hand pose information as input and producing rigid body movement (e.g., translation and/or rotation of the rigid hand as a whole) as output. Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions); Rendering an image texture of a video based on the predicted changes in user location information and pose information at the next point in time, (at least refer to fig. 2-3 and column 9, line 1-9. Describes the systems described herein (such as system 300) may iteratively estimate a hand pose for each frame of a video stream (e.g., in real time, as each frame is generated and/or made available). However, gesture predictor 330 may estimate hand poses much more efficiently (when, e.g., rigidity can be assumed). Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions); and Transmitting image data with the image texture rendered at the next point in time to the extended reality device, (at least refer to fig. 2-3, 8 and column 17, line 49-52. Describes the display devices may refract the projected light toward a user's pupil and may enable a user to simultaneously view both artificial-reality content and the real world. Column 19, line 35-39, describes: For example, one or more of the modules recited herein may receive a video stream to be transformed, transform the video stream into hand pose estimation data, output a result of the transformation to control a user interface) Wherein the situation information at the current point in time and the situation information at the present previous point in time configure consecutive frames, (at least refer to fig. 2-3, 8 and column 3, line 25-30. Describes systems described herein may sequentially analyze a pre-existing video (e.g., from a file and/or database) for hand pose information and frame 154 may represent the frame previous to the current frame being analyzed (and, e.g., the frame that was most recently analyzed for hand pose information). Column 4, line 46-50, describes: For example, systems described herein may receive a video stream and extract the present frame from the video stream (e.g., by extracting the latest frame in the video stream and/or by extracting a current frame for analysis in the video stream)). Wan does not explicitly discloses: situation information including user location information Fu teaches: situation information including user location information, (at least refer to fig. 2 and paragraph 35. 
Describes the method 220, generates 222 indications of the joint and limb locations in the current frame by refining the initial predictions of the joint and limb locations based on indications of respective joint and limb locations from a previous frame. According to an embodiment, locations are x-y coordinates in the image. Further, in an embodiment, the unit of the locations, e.g., coordinates, are in pixels. Moreover, in an example embodiment, the previous frame is adjacent in time to the current frame in the video). The two references are analogous art because they both relate with the same field of invention of computer-generated content system. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate location coordinate in an image as taught by Fu with the pose detection generation system as disclosed by Wan. The motivation to combine the Fu reference is to establish a robust and stable multi-person pose estimation which can be deployed on many applications that require human pose input. Regarding claim 6, Wan discloses: A video streaming apparatus of an extended reality device, (at least refer to fig. 1-2 and column 4, line 3-6. Describes the video stream be processed by the systems and methods described herein in real-time—e.g., as it is captured (e.g., by a camera used in an augmented reality system) and/or received via a transmission. Column 13, line 27-32, describes: for example, a virtual reality, an augmented reality, a mixed reality, a hybrid reality, or some combination and/or derivative thereof. Artificial-reality content may include completely computer-generated content or computer-generated content combined with captured (e.g., real-world) content) the video streaming apparatus comprising: A reception unit configured to receive situation information including user location information and pose information at a current point in time from the extended reality device, (at least refer to fig. 1-2 and column 2, line 55-65. Describes System 101 may include an intake module 104 that is configured to access a frame of a video stream (e.g., a current frame for which to determine hand pose information) that depicts a hand. By way of example, intake module 104 may receive a frame 152 of a video stream 150. As pictured, frame 152 may be the latest frame of a video stream being received (and, in some examples, generated) in real-time. Column 4, line 3-6. Describes the video stream be processed by the systems and methods described herein in real-time—e.g., as it is captured (e.g., by a camera used in an augmented reality system) and/or received via a transmission. Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions); A prediction unit configured to predict changes in user location information and pose information at a preset next point in time using pre-learned artificial intelligence receiving, as input, situation information at the current point in time and situation information at a preset previous point in time, (at least refer to fig. 2-3 and column 2-3, line 65-3. 
Describes systems described herein may sequentially analyze a pre-existing video (e.g., from a file and/or database) for hand pose information and frame 152 may represent the current frame being analyzed, even if frame 152 is not the latest available frame in video stream 150. Column 8, line 58-67. Describes furthermore, gesture predictor 330 may operate on the assumption of rigidity of the hand between the most recently determined hand pose and the current hand pose (based, e.g., on the classification from rigid motion classifier 320). Thus, for example, gesture predictor may represent a machine learning classifier trained on temporally sequenced hand pose information as input and producing rigid body movement (e.g., translation and/or rotation of the rigid hand as a whole) as output. Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions); A rendering unit configured to render an image texture of a video based on the predicted changes in user location information and pose information at the next point in time, (at least refer to fig. 2-3 and column 9, line 1-9. Describes the systems described herein (such as system 300) may iteratively estimate a hand pose for each frame of a video stream (e.g., in real time, as each frame is generated and/or made available). However, gesture predictor 330 may estimate hand poses much more efficiently (when, e.g., rigidity can be assumed). Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions); and A transmission unit configured to transmit image data with the image texture rendered at the next point in time to the extended reality device, (at least refer to fig. 2-3, 8 and column 17, line 49-52. Describes the display devices may refract the projected light toward a user's pupil and may enable a user to simultaneously view both artificial-reality content and the real world. Column 19, line 35-39, describes: For example, one or more of the modules recited herein may receive a video stream to be transformed, transform the video stream into hand pose estimation data, output a result of the transformation to control a user interface) Wherein the situation information at the current point in time and the situation information at the present previous point in time configure consecutive frames, (at least refer to fig. 2-3, 8 and column 3, line 25-30. Describes systems described herein may sequentially analyze a pre-existing video (e.g., from a file and/or database) for hand pose information and frame 154 may represent the frame previous to the current frame being analyzed (and, e.g., the frame that was most recently analyzed for hand pose information). Column 4, line 46-50, describes: For example, systems described herein may receive a video stream and extract the present frame from the video stream (e.g., by extracting the latest frame in the video stream and/or by extracting a current frame for analysis in the video stream)). Wan does not explicitly discloses: situation information including user location information Fu teaches: situation information including user location information, (at least refer to fig. 
2 and paragraph 35. Describes the method 220, generates 222 indications of the joint and limb locations in the current frame by refining the initial predictions of the joint and limb locations based on indications of respective joint and limb locations from a previous frame. According to an embodiment, locations are x-y coordinates in the image. Further, in an embodiment, the unit of the locations, e.g., coordinates, are in pixels. Moreover, in an example embodiment, the previous frame is adjacent in time to the current frame in the video). Regarding the rejection of claim 6, refer to the motivation of claim 1. Regarding claim 10, Wan discloses: A video streaming system, (at least refer to fig. 1-2 and column 4, line 3-6. Describes the video stream be processed by the systems and methods described herein in real-time—e.g., as it is captured (e.g., by a camera used in an augmented reality system) and/or received via a transmission) comprising: A rendering server, and an extended reality device, (at least refer to fig. 1-2 and column 4, line 3-6. Describes the video stream be processed by the systems and methods described herein in real-time—e.g., as it is captured (e.g., by a camera used in an augmented reality system) and/or received via a transmission. Column 13, line 27-32, describes: for example, a virtual reality, an augmented reality, a mixed reality, a hybrid reality, or some combination and/or derivative thereof. Artificial-reality content may include completely computer-generated content or computer-generated content combined with captured (e.g., real-world) content) Wherein the extended reality device acquires situation information including user location information and pose information of the extended reality device at a current point in time to the rendering server, (at least refer to fig. 1-2 and column 2, line 55-65. Describes System 101 may include an intake module 104 that is configured to access a frame of a video stream (e.g., a current frame for which to determine hand pose information) that depicts a hand. By way of example, intake module 104 may receive a frame 152 of a video stream 150. As pictured, frame 152 may be the latest frame of a video stream being received (and, in some examples, generated) in real-time. Column 4, line 3-6. Describes the video stream be processed by the systems and methods described herein in real-time—e.g., as it is captured (e.g., by a camera used in an augmented reality system) and/or received via a transmission. Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions) Wherein the rendering server: receives situation information at the current point in time from the extended reality device, (at least refer to fig. 1-2 and column 2, line 55-65. Describes System 101 may include an intake module 104 that is configured to access a frame of a video stream (e.g., a current frame for which to determine hand pose information) that depicts a hand. By way of example, intake module 104 may receive a frame 152 of a video stream 150. As pictured, frame 152 may be the latest frame of a video stream being received (and, in some examples, generated) in real-time. Column 4, line 3-6. 
Describes the video stream be processed by the systems and methods described herein in real-time—e.g., as it is captured (e.g., by a camera used in an augmented reality system) and/or received via a transmission); Predicts changes in user location information and pose information at a preset next point in time using pre-learned artificial intelligence receiving, as input, situation information at the current point in time and situation information at a preset previous point in time, (at least refer to fig. 2-3 and column 2-3, line 65-3. Describes systems described herein may sequentially analyze a pre-existing video (e.g., from a file and/or database) for hand pose information and frame 152 may represent the current frame being analyzed, even if frame 152 is not the latest available frame in video stream 150. Column 8, line 58-67. Describes furthermore, gesture predictor 330 may operate on the assumption of rigidity of the hand between the most recently determined hand pose and the current hand pose (based, e.g., on the classification from rigid motion classifier 320). Thus, for example, gesture predictor may represent a machine learning classifier trained on temporally sequenced hand pose information as input and producing rigid body movement (e.g., translation and/or rotation of the rigid hand as a whole) as output. Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions); Rendering an image texture of a video based on the predicted changes in user location information and pose information at the next point in time, (at least refer to fig. 2-3 and column 9, line 1-9. Describes the systems described herein (such as system 300) may iteratively estimate a hand pose for each frame of a video stream (e.g., in real time, as each frame is generated and/or made available). However, gesture predictor 330 may estimate hand poses much more efficiently (when, e.g., rigidity can be assumed). Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions); and Transmitting image data with the image texture rendered at the next point in time to the extended reality device, (at least refer to fig. 2-3, 8 and column 17, line 49-52. Describes the display devices may refract the projected light toward a user's pupil and may enable a user to simultaneously view both artificial-reality content and the real world. Column 19, line 35-39, describes: For example, one or more of the modules recited herein may receive a video stream to be transformed, transform the video stream into hand pose estimation data, output a result of the transformation to control a user interface) Wherein the situation information at the current point in time and the situation information at the present previous point in time configure consecutive frames, (at least refer to fig. 2-3, 8 and column 3, line 25-30. 
Describes systems described herein may sequentially analyze a pre-existing video (e.g., from a file and/or database) for hand pose information and frame 154 may represent the frame previous to the current frame being analyzed (and, e.g., the frame that was most recently analyzed for hand pose information). Column 4, line 46-50, describes: For example, systems described herein may receive a video stream and extract the present frame from the video stream (e.g., by extracting the latest frame in the video stream and/or by extracting a current frame for analysis in the video stream)). Wan does not explicitly discloses: situation information including user location information Fu teaches: situation information including user location information, (at least refer to fig. 2 and paragraph 35. Describes the method 220, generates 222 indications of the joint and limb locations in the current frame by refining the initial predictions of the joint and limb locations based on indications of respective joint and limb locations from a previous frame. According to an embodiment, locations are x-y coordinates in the image. Further, in an embodiment, the unit of the locations, e.g., coordinates, are in pixels. Moreover, in an example embodiment, the previous frame is adjacent in time to the current frame in the video). Regarding the rejection of claim 10, refer to the motivation of claim 1. Regarding claim 2, Wan discloses: Wherein the receiving comprises receiving inertial measurement unit (IMU) information from the extended reality device as the pose information, (at least refer to fig. 2-3, 9 and column 16, line 44-47. Describes augmented-reality system 900 includes an inertial measurement unit, controller 925 may compute all inertial and spatial calculations from the IMU located on eyewear device 902). Regarding claim 3, Wan discloses: Wherein the receiving comprises receiving an image captured by a camera of the extended reality device and acquiring the pose information through analysis of the received image, (at least refer to fig. 1-2 and column 2, line 55-65. Describes System 101 may include an intake module 104 that is configured to access a frame of a video stream (e.g., a current frame for which to determine hand pose information) that depicts a hand. By way of example, intake module 104 may receive a frame 152 of a video stream 150. As pictured, frame 152 may be the latest frame of a video stream being received (and, in some examples, generated) in real-time. Column 4, line 3-6. Describes the video stream be processed by the systems and methods described herein in real-time—e.g., as it is captured (e.g., by a camera used in an augmented reality system) and/or received via a transmission). Regarding claim 4, Wan discloses: Wherein the predicting comprises predicting changes in user location information and pose information at the next point in time based on a difference in the user location information and pose information between the current point in time and the previous point in time in the artificial intelligence, (at least refer to fig. 2-3 and column 2-3, line 65-3. Describes systems described herein may sequentially analyze a pre-existing video (e.g., from a file and/or database) for hand pose information and frame 152 may represent the current frame being analyzed, even if frame 152 is not the latest available frame in video stream 150. Column 8, line 58-67. 
Describes furthermore, gesture predictor 330 may operate on the assumption of rigidity of the hand between the most recently determined hand pose and the current hand pose (based, e.g., on the classification from rigid motion classifier 320). Thus, for example, gesture predictor may represent a machine learning classifier trained on temporally sequenced hand pose information as input and producing rigid body movement (e.g., translation and/or rotation of the rigid hand as a whole) as output. Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions. Column 5-6, line 63-4, describes: Accordingly, the previous depiction of the multi-segment articulated body system provided in the previous frame may represent a relatively small difference from the present depiction of the multi-segment articulated body system in the present frame. As will be explained in greater detail below, systems described herein may classify this difference to determine whether the difference nevertheless demonstrates rigidity across the previous and present frames). Wan does not explicitly discloses: situation information including user location information Fu teaches: situation information including user location information, (at least refer to fig. 2 and paragraph 35. Describes the method 220, generates 222 indications of the joint and limb locations in the current frame by refining the initial predictions of the joint and limb locations based on indications of respective joint and limb locations from a previous frame. According to an embodiment, locations are x-y coordinates in the image. Further, in an embodiment, the unit of the locations, e.g., coordinates, are in pixels. Moreover, in an example embodiment, the previous frame is adjacent in time to the current frame in the video). Regarding the rejection of claim 4, refer to the motivation of claim 1. Regarding claim 5, Wan discloses: Wherein the artificial intelligence predicts changes in user location information and pose information at the current point in time based on 6-degree-of-freedom information for each consecutive frame included in the situation information, (at least refer to fig. 2-3 and column 8, line 27-35. Describes such a machine learning model may be trained to estimate the hand pose based on a training corpus of hand images that demonstrate hands in various possible configurations and that collectively demonstrates the freedom of movement provided by the various joints of the human hand. The machine learning model may also be complex enough (including, e.g., involving enough computations) to reliably and accurately estimate hand poses from images. Column 8, line 58-67. Describes furthermore, gesture predictor 330 may operate on the assumption of rigidity of the hand between the most recently determined hand pose and the current hand pose (based, e.g., on the classification from rigid motion classifier 320). Thus, for example, gesture predictor may represent a machine learning classifier trained on temporally sequenced hand pose information as input and producing rigid body movement (e.g., translation and/or rotation of the rigid hand as a whole) as output. Column 18, line 3-7. 
Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions). Wan does not explicitly discloses: situation information including user location information Fu teaches: situation information including user location information, (at least refer to fig. 2 and paragraph 35. Describes the method 220, generates 222 indications of the joint and limb locations in the current frame by refining the initial predictions of the joint and limb locations based on indications of respective joint and limb locations from a previous frame. According to an embodiment, locations are x-y coordinates in the image. Further, in an embodiment, the unit of the locations, e.g., coordinates, are in pixels. Moreover, in an example embodiment, the previous frame is adjacent in time to the current frame in the video). Regarding the rejection of claim 5, refer to the motivation of claim 1. Regarding claim 7, Wan discloses: Wherein the reception unit receives an image captured by a camera of the extended reality device and acquires the pose information through analysis of the received image, (at least refer to fig. 1-2 and column 2, line 55-65. Describes System 101 may include an intake module 104 that is configured to access a frame of a video stream (e.g., a current frame for which to determine hand pose information) that depicts a hand. By way of example, intake module 104 may receive a frame 152 of a video stream 150. As pictured, frame 152 may be the latest frame of a video stream being received (and, in some examples, generated) in real-time. Column 4, line 3-6. Describes the video stream be processed by the systems and methods described herein in real-time—e.g., as it is captured (e.g., by a camera used in an augmented reality system) and/or received via a transmission). Regarding claim 8, Wan discloses: Wherein the prediction unit predicts changes in user location information and pose information at the next point in time based on a difference in the user location information and pose information between the current point in time and the previous point in time in the artificial intelligence(at least refer to fig. 2-3 and column 2-3, line 65-3. Describes systems described herein may sequentially analyze a pre-existing video (e.g., from a file and/or database) for hand pose information and frame 152 may represent the current frame being analyzed, even if frame 152 is not the latest available frame in video stream 150. Column 8, line 58-67. Describes furthermore, gesture predictor 330 may operate on the assumption of rigidity of the hand between the most recently determined hand pose and the current hand pose (based, e.g., on the classification from rigid motion classifier 320). Thus, for example, gesture predictor may represent a machine learning classifier trained on temporally sequenced hand pose information as input and producing rigid body movement (e.g., translation and/or rotation of the rigid hand as a whole) as output. Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions. 
Column 5-6, line 63-4, describes: Accordingly, the previous depiction of the multi-segment articulated body system provided in the previous frame may represent a relatively small difference from the present depiction of the multi-segment articulated body system in the present frame. As will be explained in greater detail below, systems described herein may classify this difference to determine whether the difference nevertheless demonstrates rigidity across the previous and present frames). Wan does not explicitly discloses: situation information including user location information Fu teaches: situation information including user location information, (at least refer to fig. 2 and paragraph 35. Describes the method 220, generates 222 indications of the joint and limb locations in the current frame by refining the initial predictions of the joint and limb locations based on indications of respective joint and limb locations from a previous frame. According to an embodiment, locations are x-y coordinates in the image. Further, in an embodiment, the unit of the locations, e.g., coordinates, are in pixels. Moreover, in an example embodiment, the previous frame is adjacent in time to the current frame in the video). Regarding the rejection of claim 8, refer to the motivation of claim 1. Regarding claim 9, Wan discloses: Wherein the artificial intelligence predicts changes in user location information and pose information at the current point in time based on 6-degree-of-freedom information for each consecutive frame included in the situation information(at least refer to fig. 2-3 and column 8, line 27-35. Describes such a machine learning model may be trained to estimate the hand pose based on a training corpus of hand images that demonstrate hands in various possible configurations and that collectively demonstrates the freedom of movement provided by the various joints of the human hand. The machine learning model may also be complex enough (including, e.g., involving enough computations) to reliably and accurately estimate hand poses from images. Column 8, line 58-67. Describes furthermore, gesture predictor 330 may operate on the assumption of rigidity of the hand between the most recently determined hand pose and the current hand pose (based, e.g., on the classification from rigid motion classifier 320). Thus, for example, gesture predictor may represent a machine learning classifier trained on temporally sequenced hand pose information as input and producing rigid body movement (e.g., translation and/or rotation of the rigid hand as a whole) as output. Column 18, line 3-7. Describes an artificial-reality system may process data from one or more of these sensors to identify a location of a user, to map the real world, to provide a user with context about real-world surroundings, and/or to perform a variety of other functions). Wan does not explicitly discloses: situation information including user location information Fu teaches: situation information including user location information, (at least refer to fig. 2 and paragraph 35. Describes the method 220, generates 222 indications of the joint and limb locations in the current frame by refining the initial predictions of the joint and limb locations based on indications of respective joint and limb locations from a previous frame. According to an embodiment, locations are x-y coordinates in the image. Further, in an embodiment, the unit of the locations, e.g., coordinates, are in pixels. 
Moreover, in an example embodiment, the previous frame is adjacent in time to the current frame in the video). Regarding the rejection of claim 9, refer to the motivation of claim 1. Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to IFEDAYO B ILUYOMADE whose telephone number is (571)270-7118. The examiner can normally be reached Monday-Friday. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Eason can be reached at 5712707230. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /IFEDAYO B ILUYOMADE/Primary Examiner, Art Unit 2624 01/10/2026
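
For orientation, the pipeline recited in claim 1 (and mirrored in claims 6 and 10) runs: receive user location and pose at the current point in time, predict both at a preset next point in time using a pre-learned model fed the current and previous situation information, render the image texture for the predicted situation, and transmit the rendered frame to the extended reality device. The sketch below only illustrates that flow; the Situation dataclass, the linear-extrapolation stand-in for the claimed AI model, and the renderer/transmit callables are hypothetical, not the applicant's or the cited references' implementations.

```python
# Illustrative sketch of the prediction-based streaming loop summarized from claim 1.
# The claim requires a pre-learned AI model; a frame-to-frame delta extrapolation is
# used here only so the sketch runs end to end.
from dataclasses import dataclass

@dataclass
class Situation:
    location: tuple   # e.g., (x, y, z) user position
    pose: tuple       # e.g., IMU-derived orientation (roll, pitch, yaw)

def predict_next(prev: Situation, curr: Situation) -> Situation:
    """Stand-in for the pre-learned model: extrapolate the previous-to-current delta."""
    loc = tuple(c + (c - p) for p, c in zip(prev.location, curr.location))
    pose = tuple(c + (c - p) for p, c in zip(prev.pose, curr.pose))
    return Situation(loc, pose)

def streaming_step(prev: Situation, curr: Situation, renderer, transmit) -> Situation:
    nxt = predict_next(prev, curr)   # predict situation at the preset next point in time
    frame = renderer(nxt)            # render the image texture for the predicted situation
    transmit(frame)                  # send the pre-rendered frame to the XR device
    return nxt

# Example run with a pass-through "renderer" and print as "transmit".
prev = Situation((0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
curr = Situation((0.1, 0.0, 0.0), (0.0, 1.0, 0.0))
streaming_step(prev, curr, renderer=lambda s: s, transmit=print)
```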

Prosecution Timeline

Apr 02, 2025: Application Filed
Jan 10, 2026: Non-Final Rejection under §103 (current)

Precedent Cases

Applications with similar technology granted by this same examiner

Patent 12596432
Methods Circuits Devices Systems Applications and Functionally Associated Machine Executable Code for Digital Device Display Adjustment
2y 5m to grant • Granted Apr 07, 2026
Patent 12586513
DISPLAY DEVICE
2y 5m to grant • Granted Mar 24, 2026
Patent 12561026
DISPLAY DEVICE
2y 5m to grant • Granted Feb 24, 2026
Patent 12562095
DISPLAY DEVICE
2y 5m to grant • Granted Feb 24, 2026
Patent 12554129
EYE IMAGING IN HEAD WORN COMPUTING
2y 5m to grant • Granted Feb 17, 2026
Study what changed to get these applications past this examiner, based on the 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 74%
Grant Probability With Interview: 83% (+9.2%)
Median Time to Grant: 2y 10m
PTA Risk: Low
Based on 630 resolved cases by this examiner; grant probability is derived from the career allow rate.

Free tier: 3 strategy analyses per month