DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings are objected to under 37 CFR 1.84(o): “Legends. Suitable descriptive legends may be used subject to approval by the Office, or may be required by the examiner where necessary for understanding of the drawing. They should contain as few words as possible.” The drawings consist of boxes with numbers in them, which do not assist in understanding the invention; descriptive legends are required for understanding of the drawings.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “imaging module” in claim 1, which is interpreted to be a camera (see page 6, column 13-17 of the specification).
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 8, 9, and 15 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Powers (US 20240221216 A1).
Re claim 1, Powers discloses an Extended Reality (XR) device adapted for tracking an object for XR applications (see abstract), the XR device comprising: an imaging module (see paragraph 60; note that multiple cameras may be used) configured to capture image data of an environment containing the object (see figure 1; note that the cameras capture images of handheld controllers; see also paragraph 16: “the headset device may include one or more image components or devices to capture image data of the physical environment including image data of each of the hand-held controllers currently being operated by the user”);
a processor (see paragraphs 65-67; note that a processor is used to perform the recited functions) configured to:
analyze the image data using a machine learning algorithm to estimate a pose of the object, generating pose estimation data (see paragraph 21: “the headset device may apply a pixel regressor and/or classification model, such as random forest, non-linear regression, neural network (CNN), or other machine learned models, to identify image points (e.g., points in the captured image that may correspond to a point within the constellation on the hand-held controller). The headset device may then determine the pose of the hand-held controller based at least in part on the image points and model points correspondences over the series of frames”; note that machine learning is used to determine the pose of a hand-held controller);
obtain inertial data corresponding to movements and/or orientations of the object from an Inertial Measurement Unit (IMU) affixed to the object (see paragraph 22);
fuse the pose estimation data and the inertial data to generate combined tracking data for the object (see paragraph 22: “Accordingly, the hand-held devices may capture or generate IMU data during use. In some examples, each of the hand-held controllers may provide or send the IMU data captured during use to the headset device and the headset device may disambiguate between the pair of controllers based at least in part on the image data of the constellations and/or the IMU data. For instance, the headset device may reduce noise (e.g., eliminate candidate poses) and/or implement constraints on the candidate poses based at least in part on the IMU data”; note that the IMU data may also be used to determine the pose);
and render one or more of position, movement, and orientation of the object within the XR application based on the combined tracking data (see paragraph 55: “For instance, in one specific example, the user 102 may point one or more of the controllers 106 at an object (e.g., a virtual object). The headset device 104 may first identify, based on detecting the pose of the controllers 106 and the visual data displayed to the user 104, that the user 104 is pointed at the object within the virtual environment. The headset device 104 may then perform an operation such as selecting, grabbing, moving, highlighting, or the like the object in response to determining that the user 102 has pointed the controllers 106 at the object”; note that the pose of the object may affect the images rendered and displayed; see also paragraph 128 and figure 11, noting that pose tracking of the controllers may be visualized);
and a display module for projecting the rendered position, movement, and orientation of the object (see paragraph 55: “For instance, in one specific example, the user 102 may point one or more of the controllers 106 at an object (e.g., a virtual object). The headset device 104 may first identify, based on detecting the pose of the controllers 106 and the visual data displayed to the user 104, that the user 104 is pointed at the object within the virtual environment. The headset device 104 may then perform an operation such as selecting, grabbing, moving, highlighting, or the like the object in response to determining that the user 102 has pointed the controllers 106 at the object”; note that the pose of the object may affect the images rendered and displayed; see also paragraph 128 and figure 11, noting that pose tracking of the controllers may be visualized).
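For illustration only (this sketch is not part of the rejection and is not taken from Powers or the application; the function names, the complementary-filter weighting, and all values are assumptions), the following Python sketch shows one generic way an image-based pose estimate could be combined with IMU data to produce combined tracking data of the kind recited in claim 1.

```python
import numpy as np

def fuse_pose_with_imu(camera_position, camera_yaw,
                       imu_gyro_z, imu_accel,
                       prev_position, prev_velocity, prev_yaw,
                       dt, alpha=0.98):
    """Complementary-filter style fusion of a camera pose estimate with IMU data.

    All parameter names are illustrative; alpha weights the IMU-propagated
    prediction against the camera observation.
    """
    # Dead-reckon from the previous fused state using the inertial data.
    predicted_yaw = prev_yaw + imu_gyro_z * dt
    predicted_velocity = prev_velocity + imu_accel * dt
    predicted_position = prev_position + predicted_velocity * dt

    # Blend the prediction with the lower-rate, drift-free camera observation.
    fused_yaw = alpha * predicted_yaw + (1.0 - alpha) * camera_yaw
    fused_position = alpha * predicted_position + (1.0 - alpha) * camera_position
    return fused_position, predicted_velocity, fused_yaw


# Example usage with synthetic values (one 10 ms step).
pos, vel, yaw = fuse_pose_with_imu(
    camera_position=np.array([0.10, 0.00, 0.30]), camera_yaw=0.05,
    imu_gyro_z=0.2, imu_accel=np.zeros(3),
    prev_position=np.array([0.10, 0.00, 0.29]), prev_velocity=np.zeros(3),
    prev_yaw=0.04, dt=0.01)
```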
Re claim 8, Powers discloses wherein the imaging module employs multiple cameras for capturing the image data (see paragraph 60; note that multiple cameras may be used).
Re claim 9, Powers discloses a method for tracking an object for Extended Reality (XR) applications (see abstract), the method comprising: capturing image data of an environment containing the object (see figure 1; note that the cameras capture images of handheld controllers; see also paragraph 16: “the headset device may include one or more image components or devices to capture image data of the physical environment including image data of each of the hand-held controllers currently being operated by the user”);
analyzing the image data using a machine learning algorithm to estimate a pose of the object, generating pose estimation data (see paragraph 21: “the headset device may apply a pixel regressor and/or classification model, such as random forest, non-linear regression, neural network (CNN), or other machine learned models, to identify image points (e.g., points in the captured image that may correspond to a point within the constellation on the hand-held controller). The headset device may then determine the pose of the hand-held controller based at least in part on the image points and model points correspondences over the series of frames”; note that machine learning is used to determine the pose of a hand-held controller);
obtaining inertial data corresponding to movements and/or orientations of the object from an Inertial Measurement Unit (IMU) affixed to the object (see paragraph 22);
fusing the pose estimation data and the inertial data to generate combined tracking data for the object (see paragraph 22: “Accordingly, the hand-held devices may capture or generate IMU data during use. In some examples, each of the hand-held controllers may provide or send the IMU data captured during use to the headset device and the headset device may disambiguate between the pair of controllers based at least in part on the image data of the constellations and/or the IMU data. For instance, the headset device may reduce noise (e.g., eliminate candidate poses) and/or implement constraints on the candidate poses based at least in part on the IMU data”; note that the IMU data may also be used to determine the pose);
and rendering one or more of position, movement, and orientation of the object within the XR application based on the combined tracking data (see paragraph 55: “For instance, in one specific example, the user 102 may point one or more of the controllers 106 at an object (e.g., a virtual object). The headset device 104 may first identify, based on detecting the pose of the controllers 106 and the visual data displayed to the user 104, that the user 104 is pointed at the object within the virtual environment. The headset device 104 may then perform an operation such as selecting, grabbing, moving, highlighting, or the like the object in response to determining that the user 102 has pointed the controllers 106 at the object”; note that the pose of the object may affect the images rendered and displayed; see also paragraph 128 and figure 11, noting that pose tracking of the controllers may be visualized in the display).
Re claim 15, Powers discloses projecting the rendered position, movement, and orientation of the object onto a display of an XR device and rendering one or more of position, movement, and orientation of the object within the XR application based on the combined tracking data (see paragraph 55: “For instance, in one specific example, the user 102 may point one or more of the controllers 106 at an object (e.g., a virtual object). The headset device 104 may first identify, based on detecting the pose of the controllers 106 and the visual data displayed to the user 104, that the user 104 is pointed at the object within the virtual environment. The headset device 104 may then perform an operation such as selecting, grabbing, moving, highlighting, or the like the object in response to determining that the user 102 has pointed the controllers 106 at the object”; note that the pose of the object may affect the images rendered and displayed; see also paragraph 128, figure 11, and figure 2, noting that pose tracking of the controllers may be visualized in the display).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2, 7, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Powers (US 20240221216 A1) in view of Emami (US 20240281071 A1).
Re claim 2, Powers further discloses wherein the object is a handheld controller for use with the XR device (see abstract, paragraph 18, figure 4, and figure 7; note that the object being tracked is a controller used to manipulate a mixed reality environment), wherein the processor is further configured to: determine an initial position of the handheld controller based on the combined tracking data therefor; and utilize the initial position as a reference point for subsequent tracking of the handheld controller (see paragraph 110; note that an initial pose is determined and subsequently tracked; see also figure 7 and paragraph 113). Powers does not expressly disclose wherein the XR device further comprises a proximity sensor configured to detect the presence of a user's hand relative to the handheld controller. Emami discloses wherein the XR device further comprises a proximity sensor configured to detect the presence of a user's hand relative to the handheld controller (see abstract: “Thus, if the user has a controller in her hand and want to touch a UI element, she can also use her finger, where the system identifies the finger not touching the controller (e.g., via capacitance sensors) and that a hand pose is present (e.g., finger extended—from camera) to allow that finger to provide hand input, without the user having to put down the controller.”). The motivation to combine is to allow the finger to provide hand input without the user having to put down the controller. One of ordinary skill in the art could have easily added a capacitive sensor to the controller of Powers to reach the aforementioned advantage. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Emami and Powers to reach the aforementioned advantage.
Re claim 7, Emami discloses wherein the proximity sensor employs one or more of capacitive sensing (see abstract; note that capacitive sensing is used), infrared sensing, or ultrasonic sensing techniques to detect the presence of the user's hand relative to the handheld controller.
Re claim 10, Powers further discloses wherein the object is a handheld controller for use in the XR application (see abstract, paragraph 18, figure 4, and figure 7; note that the object being tracked is a controller used to manipulate a mixed reality environment); determining an initial position of the handheld controller based on the combined tracking data therefor; and utilizing the initial position as a reference point for subsequent tracking of the handheld controller (see paragraph 110; note that an initial pose is determined and subsequently tracked; see also figure 7 and paragraph 113). Powers does not expressly disclose detecting the presence of a user's hand in proximity to the handheld controller. Emami discloses detecting the presence of a user's hand in proximity to the handheld controller (see abstract: “Thus, if the user has a controller in her hand and want to touch a UI element, she can also use her finger, where the system identifies the finger not touching the controller (e.g., via capacitance sensors) and that a hand pose is present (e.g., finger extended—from camera) to allow that finger to provide hand input, without the user having to put down the controller.”). The motivation to combine is to allow the finger to provide hand input without the user having to put down the controller. One of ordinary skill in the art could have easily added a capacitive sensor to the controller of Powers to reach the aforementioned advantage. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Emami and Powers to reach the aforementioned advantage.
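For illustration only (not part of the rejection and not drawn from Emami; the names and the threshold value are assumptions), the following sketch reduces a raw capacitance reading to a hand-present determination of the general kind discussed for claims 2, 7, and 10 above.

```python
def hand_is_near(capacitance_reading, baseline, threshold=5.0):
    """Return True when the capacitance rises sufficiently above its no-hand
    baseline, treated here as the user's hand being near the controller surface."""
    return (capacitance_reading - baseline) > threshold


# Example: a reading well above the idle baseline is treated as "hand present".
print(hand_is_near(capacitance_reading=57.2, baseline=50.0))  # True
```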
Claims 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Powers (US 20240221216 A1) in view of Emami (US 20240281071 A1), and further in view of Kashu (US 20250377724 A1).
Re claim 3, Powers and Emami do not expressly disclose wherein the handheld controller has a predetermined shape and a predetermined button configuration, and wherein the processor is further configured to: correlate one or more of the predetermined shape and the predetermined button configuration of the handheld controller with detected finger positions of the user's hand thereon, generating hand position data; and integrate the hand position data with the combined tracking data, for implementation in tracking of the object.
Kashu discloses wherein the handheld controller has a predetermined shape (see paragraph 39; note that the controller may have a ring shape) and a predetermined button configuration (see paragraph 39 and figure 1; note that the button configuration is shown in the figure), and wherein the processor is further configured to: correlate one or more of the predetermined shape and the predetermined button configuration of the handheld controller with detected finger positions of the user's hand thereon, generating hand position data (see paragraph 34; note that finger tracking of the hand is used to determine whether a finger is close to the position of a button on the controller; see also paragraph 102: “To determine whether the user is about to perform the gesture for pressing the button, it is determined whether a first joint point 911 that is a point representing a fingertip of the finger with which the button press is performed is present in a neighboring area 910 while a position of the controller 120 is set as a center. In FIGS. 9, the finger with which the button operation is performed is assumed as the thumb.”); and integrate the hand position data with the combined tracking data, for implementation in tracking of the object (see paragraph 112; note that the position and orientation of the controller are used in combination with the position and orientation information of the finger: “in S1001, the control unit 211 determines whether the change in at least one of the position or the orientation of the controller is a change while the wearing finger is set as a central axis. A motion set as a target of this determination will be described by using FIG. 9A. In the determination in S1001, an example of the determination target includes a sliding and rotating motion of the controller on the finger, that is, such a motion for the controller 120 to rotate while the controller wearing finger 313 is set as the central axis. Such an unintended rotating motion includes a case where the thumb 314 erroneously comes into contact with the controller 120 and a case where the thumb 314 slips on a surface of the controller 120 at the time of the button operation. In this case, for the fluctuation of the hand, that is, the motion of the finger on which the controller 120 is worn, it is conceivable that the controller 120 indicates a fluctuation that is independent of the fluctuation of the wearing finger. For example, a case where the controller fluctuates in a rotation direction along the finger even though the finger is not rotated is exemplified. The control unit 211 receives fluctuation information of at least one of the position or the orientation of the controller 120 which is output from the communication unit 223 of the controller 120 and estimation information of at least one of the position or the orientation of the hand finger which is output from the control unit 201 of the HMD 100.”). The motivation to combine is to obtain accurate information (see paragraph 112). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Powers and Emami with Kashu to reach the aforementioned advantage.
Re claim 11, Powers and Emami do not expressly disclose correlating one or more of a predetermined shape and a predetermined button configuration of the handheld controller with detected finger positions of the user's hand thereon, generating hand position data; and integrating the hand position data with the combined tracking data, for implementation in tracking of the object.
Kashu discloses correlating one or more of a predetermined shape and a predetermined button configuration of the handheld controller with detected finger positions of the user's hand thereon, generating hand position data (see paragraph 34; note that finger tracking of the hand is used to determine whether a finger is close to the position of a button on the controller; see also paragraph 102: “To determine whether the user is about to perform the gesture for pressing the button, it is determined whether a first joint point 911 that is a point representing a fingertip of the finger with which the button press is performed is present in a neighboring area 910 while a position of the controller 120 is set as a center. In FIGS. 9, the finger with which the button operation is performed is assumed as the thumb.”); and integrating the hand position data with the combined tracking data, for implementation in tracking of the object (see paragraph 112; note that the position and orientation of the controller are used in combination with the position and orientation information of the finger: “in S1001, the control unit 211 determines whether the change in at least one of the position or the orientation of the controller is a change while the wearing finger is set as a central axis. A motion set as a target of this determination will be described by using FIG. 9A. In the determination in S1001, an example of the determination target includes a sliding and rotating motion of the controller on the finger, that is, such a motion for the controller 120 to rotate while the controller wearing finger 313 is set as the central axis. Such an unintended rotating motion includes a case where the thumb 314 erroneously comes into contact with the controller 120 and a case where the thumb 314 slips on a surface of the controller 120 at the time of the button operation. In this case, for the fluctuation of the hand, that is, the motion of the finger on which the controller 120 is worn, it is conceivable that the controller 120 indicates a fluctuation that is independent of the fluctuation of the wearing finger. For example, a case where the controller fluctuates in a rotation direction along the finger even though the finger is not rotated is exemplified. The control unit 211 receives fluctuation information of at least one of the position or the orientation of the controller 120 which is output from the communication unit 223 of the controller 120 and estimation information of at least one of the position or the orientation of the hand finger which is output from the control unit 201 of the HMD 100.”). The motivation to combine is to obtain accurate information (see paragraph 112). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Powers and Emami with Kashu to reach the aforementioned advantage.
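For illustration only (the coordinates, names, and radius below are assumptions and are not taken from Kashu), the following sketch shows one generic way detected finger positions could be correlated with a known button location on a tracked controller, of the kind discussed for claims 3 and 11 above.

```python
import numpy as np

def fingertip_near_button(fingertip_xyz, controller_xyz, button_offset_xyz,
                          radius=0.02):
    """Return True when the tracked fingertip lies within `radius` meters of a
    button whose location is a fixed offset from the tracked controller position
    (controller orientation is ignored here for simplicity)."""
    button_world = np.asarray(controller_xyz) + np.asarray(button_offset_xyz)
    distance = np.linalg.norm(np.asarray(fingertip_xyz) - button_world)
    return distance < radius


# Example: a thumb tip 1 cm from the button position is treated as a likely press.
print(fingertip_near_button([0.10, 0.00, 0.30], [0.09, 0.00, 0.30], [0.0, 0.0, 0.0]))
```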
Claims 4 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Powers (US 20240221216 A1) in view of Fei et al. (US 20180285636 A1).
Re claim 4, Powers discloses all the elements of claim 1. Powers does not expressly disclose wherein the imaging module is further configured to capture additional image data related to a user's hand, and wherein the processor is configured to: analyze the additional image data, using a hand-tracking algorithm, to determine a position and/or an orientation of the user's hand, generating hand tracking data; and integrate the hand tracking data with the combined tracking data, for implementation in tracking of the object. Fei discloses wherein the imaging module is further configured to capture additional image data related to a user's hand, and wherein the processor is configured to: analyze the additional image data, using a hand-tracking algorithm, to determine a position and/or an orientation of the user's hand, generating hand tracking data (see abstract; note that a hand-tracking camera module captures images of the hand and the system tracks the hand); and integrate the hand tracking data with the combined tracking data, for implementation in tracking of the object (see paragraph 58: “Fusion algorithm can (1) determine 6DOF object pose (3D position+3D rotation) by fusing 3D position given by marker tracking with 3D rotation given by IMU data, and (2) fuse 6DOF controller pose with 6DOF hand pose to get more accurate 6DOF pose for both the controller and the hand.”; note that hand tracking data and controller tracking data are fused to get a more accurate pose). The motivation to combine is to get a more accurate 6DOF pose for both the controller and the hand (see paragraph 58). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Powers and Fei to reach the aforementioned advantage.
Re claim 12, Powers discloses all the elements of claim 9. Powers does not expressly disclose capturing additional image data related to a user's hand; analyzing the additional image data, using a hand-tracking algorithm, to determine a position and/or an orientation of the user's hand, generating hand tracking data; and integrating the hand tracking data with the combined tracking data, for implementation in tracking of the object. Fei discloses capturing additional image data related to a user's hand; analyzing the additional image data, using a hand-tracking algorithm, to determine a position and/or an orientation of the user's hand, generating hand tracking data (see abstract; note that a hand-tracking camera module captures images of the hand and the system tracks the hand); and integrating the hand tracking data with the combined tracking data, for implementation in tracking of the object (see paragraph 58: “Fusion algorithm can (1) determine 6DOF object pose (3D position+3D rotation) by fusing 3D position given by marker tracking with 3D rotation given by IMU data, and (2) fuse 6DOF controller pose with 6DOF hand pose to get more accurate 6DOF pose for both the controller and the hand.”; note that hand tracking data and controller tracking data are fused to get a more accurate pose). The motivation to combine is to get a more accurate 6DOF pose for both the controller and the hand (see paragraph 58). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Powers and Fei to reach the aforementioned advantage.
Claims 5, 6, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Powers (US 20240221216 A1) in view of Han et al. (US 20240104749 A1).
Re claim 5, Powers discloses all the features of claim 1. Powers does not expressly disclose wherein the processor is further configured to segment the image data to define a region of interest containing the object in the environment, for analysis and feature extraction by the machine learning algorithm. Han discloses wherein the processor is further configured to segment the image data to define a region of interest containing the object in the environment, for analysis and feature extraction by the machine learning algorithm (see paragraphs 86, 17, and 19; note that a neural network may include a region proposal network and a refinement model for object tracking). The motivation to combine is “a method of object tracking that reduces performance speed reduction and reduces computational burden” (see paragraph 12). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Powers and Han to reach the aforementioned advantage.
Re claim 6, Powers further discloses that the machine learning algorithm utilizes a convolutional neural network and/or pooling operations to analyze pixel intensities in the image data and extract features from the image data, for estimating the pose of the object (see paragraph 21: “For instance, in one example, the headset device may receive a series of frames of a hand-held controller having a constellation with active elements. The headset device may apply a pixel regressor and/or classification model, such as random forest, non-linear regression, neural network (CNN), or other machine learned models, to identify image points (e.g., points in the captured image that may correspond to a point within the constellation on the hand-held controller). The headset device may then determine the pose of the hand-held controller based at least in part on the image points and model points correspondences over the series of frames”; note that features (i.e., points) are determined by the CNN from the image data). Powers does not expressly disclose extracting feature vectors from the image data. Han discloses feature vectors (see paragraph 81 or 94; note that 5D vectors are produced by the network). The motivation to combine is “a method of object tracking that reduces performance speed reduction and reduces computational burden” (see paragraph 12). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Powers and Han to reach the aforementioned advantage.
Re claim 13, Powers discloses all the features of claim 9. Powers does not expressly disclose segmenting the image data to define a region of interest containing the object in the environment, for analysis and feature extraction by the machine learning algorithm. Han discloses segmenting the image data to define a region of interest containing the object in the environment, for analysis and feature extraction by the machine learning algorithm (see paragraphs 86, 17, and 19; note that a neural network may include a region proposal network and a refinement model for object tracking). The motivation to combine is “a method of object tracking that reduces performance speed reduction and reduces computational burden” (see paragraph 12). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Powers and Han to reach the aforementioned advantage.
Re claim 14, Powers further discloses that the machine learning algorithm utilizes a convolutional neural network and/or pooling operations to analyze pixel intensities in the image data and extract features from the image data, for estimating the pose of the object (see paragraph 21: “For instance, in one example, the headset device may receive a series of frames of a hand-held controller having a constellation with active elements. The headset device may apply a pixel regressor and/or classification model, such as random forest, non-linear regression, neural network (CNN), or other machine learned models, to identify image points (e.g., points in the captured image that may correspond to a point within the constellation on the hand-held controller). The headset device may then determine the pose of the hand-held controller based at least in part on the image points and model points correspondences over the series of frames”; note that features (i.e., points) are determined by the CNN from the image data). Powers does not expressly disclose extracting feature vectors from the image data. Han discloses feature vectors (see paragraph 81 or 94; note that 5D vectors are produced by the network). The motivation to combine is “a method of object tracking that reduces performance speed reduction and reduces computational burden” (see paragraph 12). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Powers and Han to reach the aforementioned advantage.
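For illustration only (a generic sketch; the network layout, dimensions, and names are assumptions and do not represent Han's or Powers' model), the following shows a region of interest being cropped from a frame and passed through a small convolutional network with pooling to produce a feature vector, of the general kind discussed for claims 5, 6, 13, and 14 above.

```python
import torch
import torch.nn as nn

class RoiFeatureExtractor(nn.Module):
    """Tiny CNN that turns a cropped region of interest into a feature vector."""
    def __init__(self, feature_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                      # pooling over pixel intensities
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # global pooling, one value per channel
        )
        self.fc = nn.Linear(32, feature_dim)

    def forward(self, roi):
        return self.fc(self.conv(roi).flatten(1))  # one feature vector per ROI


# Example: crop an assumed region of interest from a full frame, then extract features.
frame = torch.rand(1, 3, 480, 640)                 # dummy image, values in [0, 1]
x0, y0, x1, y1 = 200, 100, 328, 228                # assumed ROI containing the object
roi = frame[:, :, y0:y1, x0:x1]
features = RoiFeatureExtractor()(roi)
print(features.shape)                              # torch.Size([1, 32])
```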
Cited Art
The following is a listing of cited art which is relevant but not relied upon in the above rejections:
Katz US 20160364013 A1 discloses: A virtual reality (VR) system tracks the position of a controller. The VR system includes an image tracking system comprising a number of fixed cameras, and a headset worn by the user that includes an imaging device to capture images of a controller operated by the user. The controller includes a set of features disposed on the surface of the controller. The image tracking system provides a first view of the controller. The imaging device mounted on the headset provides a second view of the controller. Each view of the controller (i.e., from the headset and from the image tracking system) provides a distinct set of features observed on the controller. The first and second sets of features are identified from the captured images and a pose of the controller is determined using the first set of features and the second set of features. (See abstract)
Chen US 20160378204 A1 discloses:
A system for tracking a first electronic device, such as a handheld electronic device, in a virtual reality environment generated by a second electronic device, such as a head mounted display may include the fusion of data collected by sensors of the electronic device with data collected by sensors of the head mounted display, together with data collected by a front facing camera of the electronic device related to the front face of the head mounted display. (See abstract)
YE US 20230401723 A1 discloses: A tracking system and method, comprising a processor; and a controller operably coupled to the processor. Two or more light sources are mounted in a known configuration with respect to each other and with respect to the controller body and the two or more light sources are configured to flash a predetermined time sequence. A dynamic vision sensor is configured to output signals corresponding to two or more events at two or more corresponding light-sensitive elements in an array in response to changes in light output from the two or more light sources and corresponding to times of the two or more events and locations of the events in the light-sensitive elements in the array. The processor determines a position and orientation of the controller from the times and location of the two or more events, and the known information. (See abstract)
Canberk US 20240050856 A1 discloses :
Systems and methods are provided for using an external controller with an AR device. The system establishes, by one or more processors of the AR device, a communication with an external client device. The system overlays, by the AR device, a first AR object on a real-world environment being viewed using the AR device. The system receives interaction data from the external client device representing one or more inputs received by the external client device and, in response, modifies the first AR object by the AR device. (See abstract)
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEAN T MOTSINGER whose telephone number is (571)270-1237. The examiner can normally be reached 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chineyere Wills-Burns, can be reached at (571) 272-9752. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SEAN T MOTSINGER/Primary Examiner, Art Unit 2673