Last updated: May 29, 2026
Application No. 18/339,073
Latency Correction for a Camera Image

Final Rejection §103
Filed: Jun 21, 2023
Priority: Aug 15, 2022 (provisional 63/398,024)
Examiner: DICKERSON, CHAD S
Art Unit: 2683
Tech Center: 2600 (Communications)
Assignee: Apple Inc.
OA Round: 2 (Final)
This examiner grants 63% of cases after interview (+23.0% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 600 resolved cases, 2023–2026
Examiner Intelligence

DICKERSON, CHAD S
Career Allowance Rate: grants 63% of resolved cases (376 granted / 600 resolved), +0.7% vs TC avg
Interview Lift: +23.0% (strong) for resolved cases with interview
Typical timeline: 3y 2m avg prosecution; 24 currently pending
Career history: 638 total applications across all art units
Statute-Specific Performance

§101: 0.4% (-39.6% vs TC avg)
§103: 93.9% (+53.9% vs TC avg)
§102: 3.4% (-36.6% vs TC avg)
§112: 1.7% (-38.3% vs TC avg)
Based on career data from 600 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 12/18/2025 have been fully considered but they are not persuasive. The arguments regarding the specification objection are not persuasive.  The title does not capture what is claimed in the independent claims.  A new title is suggested below in order to capture how the invention arrives at the correction.

Applicant’s arguments with respect to claim(s) 1-6, 8-14, 16-22 and 24 have been considered but are moot because the new ground of rejection does not rely on all references applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.  The remarks state that the applied reference does not perform the features of “determining, based on the first and second poses of the electronic device, a position for a representation of the image within a three-dimensional (3D) environment, wherein the position for the representation of the image is configured to cause the representation of the image to overlay a corresponding portion of the 3D environment when viewed from the second pose”.  Pekelny, in combination with the previously applied reference, performs the features of the claims, as explained below.
Regarding the limitation “determining, based on the first and second poses …”, the primary reference is still relied upon to perform this feature.  For example, in figure 8, sensors on both sides of the glasses are used to capture an image of an object in the physical environment.  The system detects the user's positions and poses (position and orientation) relative to the object, and uses them to reconstruct a 3D mesh representation of the object within the 3D environment.  This is taught in ¶ [33]-[39].  However, this reference does not specifically detail an image representation being used to overlay a 3D environment when viewed from another pose.  This deficiency is cured by the Pekelny reference.
Regarding the Pekelny reference, this art discloses a user's HMD sensing a person or object in the physical environment.  This person or object is then overlaid onto a 3D environment, and a gaze, or pose, is tracked in order to overlay the object or person onto the 3D environment in a different manner.  This is taught in ¶ [40]-[43], [47], [48], [117], [162] and [163].  The gaze tracking in this reference, or the position and orientation tracking of the user in the primary reference, combined with the secondary reference, would result in changing an image reconstruction to reflect the user’s pose toward an object once the physical object is overlaid onto the 3D environment.  Therefore, based on the above, the combination of references discloses the limitation of “wherein the position for the representation of the image is configured to cause the representation of the image to overlay a corresponding portion of the 3D environment when viewed from the second pose”.
Thus, based on the above, the features of the claims are disclosed by the combination of references, as set forth below.

Specification
The title of the invention is not descriptive.  A new title is required that is clearly indicative of the invention to which the claims are directed. 
The following title is suggested: 
Latency Correction for a Camera Image COMPRISING DETERMINING A POSITION FOR A REPRESENTATION OF AN IMAGE WITHIN A 3D ENVIRONMENT BASED ON A FIRST POSE SENSED AND PRESENTING A VIEW OF A 3D ENVIRONMENT ON A DISPLAY BASED ON A SECOND POSE.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-5, 9-13, 17-21 is/are rejected under 35 U.S.C. 103 as being unpatentable over Scavezze (US Pub 2017/0230641) in view of Pekelny (US Pub 2020/0026922).

Re claim 1: Scavezze discloses an electronic device comprising: 
one or more sensors; one or more displays; one or more processors; and memory storing instructions configured to be executed by the one or more processors (e.g. the system discloses multiple sensors to capture areas, a head mounted display used to display information and a processor with a memory to execute the invention, which is taught in ¶ [52] and [53].), the instructions for: 

[0052] A further example includes a device operative to perform object scanning using sensor fusion, comprising: an outward-facing image sensor operative to capture images of a scene in a space; a position sensor operative to detect one or more of a position, motion, or orientation of the device within the space; one or more processors; a data storage system, operative to store images from the outward-facing image sensor, and to store position, motion, or orientation data from the position sensor; and a machine-readable memory device operative to store instructions, which when executed cause the one or more processors to capture a plurality of images of the scene from respective positions within the space, detect a position, motion, or orientation of the device within the space simultaneously with the capture of each of the plurality of images of the scene, discard one or more of the plurality of captured images based on the detected position, motion, or orientation of the device at a respective capture location.

[0053] In another example, the outward facing image sensor comprises at least one of a two-dimensional image sensor, a stereoscopic image sensor, and a depth sensor. In another example, the device further comprises a user interface providing at least one of an auditory, visual, or haptic feedback to a user and being responsive to verbal, tactile, or gestural input by the user. In another example, the position sensor comprises one of tracking camera, inertia sensor, magnetic 6-degrees-of-freedom position sensor; a lighthouse-based laser-scanning system, and synchronized photodiodes on the object being tracked. In another example, the device is incorporated in a head mounted display device. In another example, the instructions cause the one or more processors to construct a three-dimensional model using captured images other than the discarded images. In another example, the instructions cause the one or more processors to utilize one or more of the detected position, motion, or orientation of the device as an initial condition for determining a transform of the captured images. In another example, the device further comprises an extended field of view (FOV) image sensor having an FOV that exceeds the image capture sensor in which the extended FOV image sensor is configured to determine poses for the captured images.

capturing, using a first subset of the one or more sensors, an image of a physical environment (e.g. outward facing sensors can be used to capture the physical environment being viewed by a user, which is taught in ¶ [18] and [19].); 

[0018] A virtual reality or mixed reality display device may take any suitable form, including but not limited to near-eye devices such as the HMD device 104 and/or other portable/mobile devices. FIG. 3 shows one particular illustrative example of a see-through, mixed reality display system 300, and FIG. 4 shows a functional block diagram of the system 300. However, it is emphasized that while a see-through display may be used in some implementations, an opaque (i.e., non-see-through) display using a camera-based pass-through or outward facing sensor, for example, may be used in other implementations.

[0019] Display system 300 comprises one or more lenses 302 that form a part of a see-through display subsystem 304, such that images may be displayed using lenses 302 (e.g. using projection onto lenses 302, one or more waveguide systems incorporated into the lenses 302, and/or in any other suitable manner). Display system 300 further comprises one or more outward-facing image sensors 306 configured to acquire images of a background scene and/or physical environment being viewed by a user, and may include one or more microphones 308 configured to detect sounds, such as voice commands from a user. Outward-facing image sensors 306 may include one or more depth sensors and/or one or more two-dimensional image sensors. In alternative arrangements, as noted elsewhere herein, a virtual reality or mixed reality display system, instead of incorporating a see-through display subsystem, may display mixed reality images through a viewfinder mode for an outward-facing image sensor.

obtaining, using a second subset of the one or more sensors, a first pose of the electronic device, wherein the first pose is associated with the capturing of the image of the physical environment (e.g. the system contains motion sensors that are used to detect the user’s head position/orientation, which can be used with the outward-facing image data representative of the 3D environment, which is taught in ¶ [23].); 

[0023] The display system 300 may further include one or more motion sensors 318 (e.g., inertial, multi-axis gyroscopic or acceleration sensors) to detect movement and position/orientation/pose of a user's head when the user is wearing the system as part of an augmented reality HMD device. Motion data may be used, potentially along with eye-tracking glint data and outward-facing image data, for gaze detection, as well as for image stabilization to help correct for blur in images from the outward-facing image sensor(s) 306. The use of motion data may allow changes in gaze location to be tracked even if image data from outward-facing image sensor(s) 306 cannot be resolved.

obtaining, using the second subset of the one or more sensors, a second pose of the electronic device that is different than the first pose (e.g. the sensors capture multiple poses of a user, which can be considered as a second pose.  This is taught in ¶ [33]-[38].); 

[0033] In an illustrative example, scanning is performed with a combination of multiple 2D images of an object in order to form a 3D mesh or other computational model representing the scanned object. For example, identifiable feature points on the object are located in the various views. The change of position of the feature points from one 2D image to another and the change of position between the various feature points within successive 2D images can be used to infer the location of the feature points, and therefore the surface of the object, in three dimensions.

[0034] Positional data describing the location and orientation of the HMD device 104 is used in pose estimation 704. For example, position and orientation data can be derived by sensor package 505, among which can include motion sensor(s) 318, and/or GPS subsystem 316. Furthermore, sensor data such as position data, image data (including 2D and 3D depth image data), can include timestamp metadata. Therefore, sensor data of various types (e.g., image, position, and/or motion) can be correlated in time.

[0035] Data provided by motion sensors 318 may be used to provide hints on how to combine the images. However, data provided by motion sensors 318, for example an IMU, alone is often not robust or accurate, as noted above. In an illustrative implementation, the position, orientation, and rotation data from any of the sensor package 505 components is used as an initial starting point to perform the position integration based on the variety of 2D images, as described above. Accordingly, the entire position computation is completed faster and more efficiently by use of the position information.

[0036] In some implementations, the capture of the 2D images for the 3D computational model can be improved by only capturing images at optimal times, in view of certain motion, position, and/or orientation data. For example, in the case where a 2D image is captured by a rolling shutter camera, a higher quality image is obtained when the camera is not in motion because distortion or blurring is avoided. Additionally, in certain low-light situations, the exposure duration may be longer to achieve adequate brightness of image. In the low-light case as well, there will be less blur in the 2D image when the camera is not moving or is moving more slowly. A maximum threshold of acceptable motion can be set to determine an acceptable image. Alternatively, a threshold can be determined by comparison of motion sensor 318 data contemporaneous to image capture data, which can be used to choose the images with a lowest relative contemporaneous motion among several. The chosen 2D images will tend to be of higher quality, acuity and/or sharpness.
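
The two selection rules described in ¶ [0036] (a fixed maximum motion threshold, or choosing the lowest relative motion among several candidates) can be illustrated with a minimal Python sketch. It assumes each captured frame carries a single scalar motion magnitude sampled at capture time. The `Frame` type, its field names, and the threshold value are hypothetical and do not come from Scavezze.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """Hypothetical record: one captured 2D image plus contemporaneous motion."""
    frame_id: int
    motion: float  # assumed scalar motion magnitude at capture time (e.g. rad/s)


def frames_below_threshold(frames: List[Frame], max_motion: float) -> List[Frame]:
    """Keep frames whose contemporaneous motion does not exceed a fixed maximum."""
    return [f for f in frames if f.motion <= max_motion]


def steadiest_frame(frames: List[Frame]) -> Optional[Frame]:
    """Alternative rule: pick the frame with the lowest motion among several."""
    return min(frames, key=lambda f: f.motion) if frames else None


frames = [Frame(0, 0.40), Frame(1, 0.05), Frame(2, 0.12), Frame(3, 0.90)]
kept = frames_below_threshold(frames, max_motion=0.15)  # hypothetical threshold
best = steadiest_frame(frames)
```

With these illustrative inputs, the threshold rule keeps frames 1 and 2, and the lowest-motion rule picks frame 1.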

[0037] FIG. 8 shows an illustrative example in which the user 102 operates the HMD device 104 to capture a plurality of images of an object 802 in the real world environment 200 of the user 102. The image data can be captured by the sensor package 505, for example, using image sensors 306 and be used as the basis to construct a 3D mesh representation of the object 802 for incorporation and use in the virtual environment 100. Moreover, the HMD device 104 may guide or direct the user 102 how to move in relation to the object 802 in order to capture better input images, for example, through the user interface 630.

[0038] In some implementations, images are selected to use in the 3D model construction based on position and orientation information derived from the sensor package, for example motion sensors 318. More particularly, images that are taken from positions or vantage points, generally 804, or individually 804a, 804b, 804c, etc. can be utilized. The positions 804 of the images used in the model construction are spaced from one another. In some cases, the positions 804 may be evenly spaced around the object 802, or as near to even spacing as can be obtained based on the position metadata accompanying a plurality of images including the object 802. Moreover, the combination of position 804 and orientation of the HMD device 104 with respect to the object 802 is considered a pose, indicated in FIG. 8 by one of arrows 806a, 806b, or 806c. Evenly spaced, regular poses can yield a better quality of synthesis of the resulting 3D mesh that models the object 802, due at least in part to similar error characteristics amongst generated depth maps.

[0039] The scanning process for the object 802 may thus be improved using knowledge of the camera location for each captured image, by a precise position and orientation of the camera. Constructing a 3D model using fewer images, while having those approximately evenly spaced, may result in decreased processing time and memory consumption, which can improve the overall performance of the HMD device 104. Additionally, knowing a previously optimized location and orientation of the camera relative to a specific coordinate frame shared with a subsequent pose, or orientation and location of the camera, provides a starting point for optimizing the relative transform between stereo image pairs. For example, if one minimizes the global error of the system, it can be at the expense of error between individual poses. These poses act as a seed for the optimization problem using just the salient data for 3D object reconstruction. This knowledge may help prevent sparse features causing optimization into spurious local minima. The result can be more precise and accurate image-to-image pose calculations, or even reference for the rejection of outlier data. Accordingly, less iteration is needed to reach the correct minimum.
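
Paragraphs [0038] and [0039] favor approximately evenly spaced vantage points but do not state a selection rule. As a rough illustration only, the Python sketch below picks, for each of a number of evenly spaced target bearings around the object, the not-yet-chosen capture whose bearing is closest. The bearing representation in degrees and the greedy nearest-target rule are assumptions made for this example, not something taken from the reference.

```python
from typing import List


def angular_gap(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def pick_evenly_spaced(bearings: List[float], count: int) -> List[int]:
    """Return indices of captures nearest to `count` evenly spaced target bearings."""
    if count <= 0 or not bearings:
        return []
    step = 360.0 / count
    chosen: List[int] = []
    for k in range(count):
        target = k * step
        candidates = [i for i in range(len(bearings)) if i not in chosen]
        if not candidates:
            break  # fewer captures than requested vantage points
        chosen.append(min(candidates, key=lambda i: angular_gap(bearings[i], target)))
    return chosen


# Hypothetical bearings (degrees) of the device around the scanned object.
bearings = [2.0, 10.0, 88.0, 95.0, 100.0, 178.0, 265.0, 300.0]
selected = pick_evenly_spaced(bearings, count=4)
```

Here the targets are 0, 90, 180 and 270 degrees, so the sketch selects the captures at 2, 88, 178 and 265 degrees.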

determining, based on the first and second poses of the electronic device, a position for a representation of the image within a three-dimensional (3D) environment (e.g. the head position and orientation can be tracked in order to change the view of the environment, which is taught in ¶ [17].  The first and second pose can be used to reconstruct a pose for the 3D environment shown, which is taught in ¶ [33]-[39] above.),  

[0017] Users can explore, navigate, and move within a mixed reality or virtual reality environment rendered by a head mounted display (HMD) device by moving (e.g., through some form of locomotion) within a corresponding real world, physical environment or space. In an illustrative example, as shown in FIG. 1, a user 102 can employ an HMD device 104 to experience a virtual reality environment 100 that is rendered visually in three dimensions (3D) and may include audio and/or tactile/haptic sensations in some implementations. In this particular non-limiting example, an application executing on the HMD device 104 supports a virtual reality environment 100 that includes city streets with various buildings, stores, etc. As the user changes the position or orientation of his head and/or moves within the physical real world environment 200 shown in FIG. 2, his view of the virtual reality environment 100 can change. The field of view (represented by the dashed area 110 in FIG. 1) can be sized and shaped and other characteristics of the device can be controlled to make the HMD device experience visually immersive to provide the user with a strong sense of presence in the virtual world. While a virtual reality environment is shown in FIG. 1 and described herein, the presently described principles can also be applied to mixed reality environments and scenarios.

presenting, using the one or more displays, a view of the 3D environment based on the second pose of the electronic device, wherein the view comprises the representation of the image at the determined position (e.g. when the pose of the user’s head mounted display changes, the view of the environment changes to reflect the different view, which is taught in ¶ [17], [19] and [23] above.).  
However, Scavezze fails to specifically teach the features of wherein the position for the representation of the image is configured to cause the representation of the image to overlay a corresponding portion of the 3D environment when viewed from the second pose.
However, this is well known in the art as evidenced by Pekelny.  Similar to the primary reference, Pekelny discloses displaying a virtual representation onto a 3D environment (same field of endeavor or reasonably pertinent to the problem).     
Pekelny discloses wherein the position for the representation of the image is configured to cause the representation of the image to overlay a corresponding portion of the 3D environment when viewed from the second pose (e.g. the reference discloses capturing an image of an environmental object, such as a person or object, and overlays that person within a 3D virtual environment.  This is seen in figures 1, 2 and 5.  This is explained in ¶ [40]-[43], [47], [48], [162] and [163].  The system detects the gaze of the user to impact the image in front of the user, which is taught in ¶ [117].).

[0040] In the case of FIG. 2, assume that the user 104 has previously indicated that he wishes to be alerted to the existence of other people in the physical environment 102 when the user 104 is immersed in the virtual environment 202. This makes the presence of any person other than the user 104 an object-of-interest. In the scenario of FIG. 1, the physical environment 102 does in fact include a person 108 in front of the user 104. The SPC detects this person 108 and then presents alert information 204 which notifies the user 104 of the existence of the other person 108. In this case, the alert information 204 may include a visual representation of the surface of the other person's body. Without limitation, in one example, the SPC can generate this kind of alert information using any three-dimensional reconstruction algorithm (e.g., the marching cubes algorithm) based on depth sensor readings provided by the VR device 106.

[0041] In one implementation, the alert information 204 that the SPC displays is a direct representation of the appearance of the other person 108. In another implementation, the SPC can display alert information that includes at least some proxy virtual content. Proxy virtual content corresponds to any information presented in the virtual environment 202 that is used to depict a physical object in the physical environment 102, but where that information represents some modification to the actual appearance of the physical object.

[0042] In one case, the SPC can present proxy virtual content that entirely replaces a direct representation of a physical object in the physical environment 102. For example, the SPC can replace a direct representation of the other person 108 with a simplified avatar (such as a skeleton representation of the other person 108 in his current pose), a fanciful avatar (such as a gladiator, wizard, a cartoon figure, etc.), or even a representation of another actual person. The SPC can perform the same operation with respect to any physical object, e.g., by replacing a representation of an actual chair with another chair having a different style, a representation of an actual pet (e.g., a cat) with another kind of animal (e.g., a leopard), and so on.

[0043] Alternatively, or in addition, the SPC may present proxy virtual content which only supplements a direct representation of the other person 108. For example, the SPC can place a virtual hat 206 on the head of the other person 108 or a virtual lei around his neck (not shown). This virtual hat 206 constitutes virtual content because the actual person 108 is not wearing a hat. Or the SPC can replace a detected image associated with the surface of a physical object with a new image, essentially pasting the new image onto a representation of the surface of the physical object. For instance, the SPC can use this effect to change the actual color of the person's shirt to another color. Likewise, the SPC can change an actual single-color interior wall to a cliff face or a wall having a brick veneer. In another example, the SPC can display a virtual object next to the detected object, such as by showing a strobing exclamation point that appears to float in the air in close proximity to any representation of the other person 108.


[0047] In FIG. 5, assume that the user 104 has alternatively specified that computing devices correspond to objects-of-interest. Based on this configuration, the SPC presents alert information (502, 504) that respectively identifies the location of the two computing devices (116, 118) in the physical environment 102.

[0048] Note that FIGS. 2-5 correspond to examples in which the user 104 has only designated one kind of object-of-interest, e.g., by choosing people in FIGS. 2 and 3, walls in FIG. 4, and computing devices in FIG. 5. But the user 104 may alternatively choose two or more kinds of objects-of-interest, e.g., by requesting alert information for both walls and people. Further, as will be described below in Subsection A.2, the user 104 can specify the conditions or circumstance in which the SPC generates alert information. For instance, the user 104 can instruct the SPC to only present the alert information for a person when the user 104 is within 2 meters of that person.

[0117] FIG. 12 shows one implementation of the virtual reality (VR) device 106 introduced above. In this case, the VR device 106 corresponds to a head-mounted display (HMD). The VR device 106 includes one or more environment-sensing devices 712 mentioned above for providing environment input information, including, but not limited to: one or more environment-facing video cameras (described above); an environment-facing depth camera system (described above); a gaze-tracking system; an inertial measurement unit (IMU); one or more microphones (and an associated voice recognition system), etc. In one implementation, the IMU can determine the movement of the VR device 106 in six degrees of freedom. The IMU can include one or more accelerometers, one or more gyroscopes, one or more magnetometers, etc., or any combination thereof. The (optional) gaze-tracking system can determine the position of the user's eyes and/or head. The gaze-tracking system can determine the position of the user's eyes, by projecting light onto the user's eyes, and measuring the resultant glints that are reflected from the user's eyes.

[0162] According to a thirteenth aspect, dependent on the twelfth aspect, the proxy virtual content is an avatar that duplicates a detected pose of a human being in the physical environment.

[0163] According to a fourteenth aspect, the alert-mode information specifies that a point cloud or reconstructed three-dimensional surface associated with a detected object is to be overlaid on the virtual environment.

Therefore, in view of Pekelny, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have the feature of wherein the position for the representation of the image is configured to cause the representation of the image to overlay a corresponding portion of the 3D environment when viewed from the second pose, incorporated in the device of Scavezze, in order to represent an object within a three-dimensional environment to show a physical environment within a virtual application, which reduces the safety hazard of an HMD while providing information to the user without extraneous information (as stated in Pekelny ¶ [01]-[03]).
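
For readers unfamiliar with the geometry behind the disputed "determining, based on the first and second poses" limitation of claim 1, the following Python sketch shows one simplified way a position could be derived from two poses. It is not the applicant's claimed method and is not taught by Scavezze or Pekelny. It assumes a top-down 2D world, yaw-only rotation, and an image anchored at a fixed, hypothetical depth straight ahead of the device at capture time. All names (`Pose`, `world_position`, `in_device_frame`) are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Pose:
    """Hypothetical top-down device pose: position in metres, yaw in radians."""
    x: float
    z: float
    yaw: float  # 0 faces +z; positive yaw turns toward +x


def world_position(capture: Pose, depth: float) -> Tuple[float, float]:
    """World point `depth` metres straight ahead of the device at capture time."""
    return (capture.x + depth * math.sin(capture.yaw),
            capture.z + depth * math.cos(capture.yaw))


def in_device_frame(point: Tuple[float, float], view: Pose) -> Tuple[float, float]:
    """Express a world point as (right, forward) offsets in the viewing pose's frame."""
    dx, dz = point[0] - view.x, point[1] - view.z
    right = dx * math.cos(view.yaw) - dz * math.sin(view.yaw)
    forward = dx * math.sin(view.yaw) + dz * math.cos(view.yaw)
    return (right, forward)


first = Pose(0.0, 0.0, 0.0)             # pose when the image was captured
second = Pose(0.0, 0.0, math.pi / 2.0)  # pose when the view is presented
anchor = world_position(first, depth=2.0)  # assumed depth of 2 m
offset = in_device_frame(anchor, second)
```

In this example the image stays anchored in the world 2 m ahead of the capture pose, so after the device turns 90 degrees to the right the anchor lies about 2 m to the viewer's left rather than straight ahead.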

Re claim 2: Scavezze discloses the electronic device defined in claim 1, wherein the 3D environment is the physical environment (e.g. the environment determined is a 3D environment, which is taught in ¶ [17] above and [30].).  

[0030] As shown in FIG. 6, the sensor package 505 can support various functionalities including surface reconstruction 610. Surface reconstruction may be utilized, for example, in constructing a virtual 3D model of subjects/objects, a physical environment, or portions thereof. Surface reconstruction may also be utilized, in some applications, for world and/or head tracking to determine the 3D (three-dimensional) position and orientation 615 of the user's head within the physical real world environment 200 including head pose so that a view position of the virtual world can be determined. In some cases, surface reconstruction may be utilized for world tracking by supplementing other head tracking techniques which use, for example, inertial sensors. World tracking using surface reconstruction or other camera-based techniques with tracking cameras and similar sensors can be utilized to determine world location and/or world rotation of the HMD device within the physical environment 200 that is utilized as supplemental information. World tracking can also be determined using other sensors, or combination of sensors using fusion in some cases, although inertial sensor data from an inertial measurement unit (IMU) can be inaccurate in some cases when used alone. Non-limiting examples of these include a magnetic 6-degrees-of-freedom position sensor, a lighthouse-based laser-scanning system that sweeps the room, or photodiodes on the tracked object being triggered at specific moments in time, allowing the pose to be calculated.

Re claim 3: Scavezze discloses the electronic device defined in claim 1, wherein the first pose of the electronic device is incorporated as metadata in the image of the physical environment (e.g. the position and the orientation data can be derived from the sensors and included in metadata that can accompany image data, which is taught in ¶ [34], [38], [54] and [58].) and wherein the instructions further comprise instructions for: 

[0054] A further example includes a machine-readable memory device operative to store instructions which, when executed by one or more processors disposed in an electronic device, cause the electronic device to: perform object scanning by capturing a plurality of images of an object from a respective plurality of vantage points using a first camera disposed in the electronic device; determine object poses for the scanning using a second camera disposed in the electronic device that has an extended field of view relative to the first camera; generate world tracking metadata for the electronic device at each vantage point; and utilize the world tracking metadata to combine a subset of the plurality of captured images into a three-dimensional model of the object.

[0055] In another example, the first camera has higher angular resolution or is configured to capture an increased level of detail relative to the second camera and the tracking metadata is generated using one or more of tracking camera or inertia sensor incorporated in the electronic device. In another example, the instructions cause the electronic device to generate depth maps from captured images for each vantage point. In another example, the instructions cause the electronic device to operate the first camera to capture images at evenly-spaced vantage points so as to minimize differences in error characteristics in the generated depth maps.

before determining the position for the representation of the image, extracting the first pose of the electronic device from the metadata in the image of the physical environment (e.g. the metadata capturing the position and orientation of the head mounted device can be used to create a three-dimensional model of the image data, which is taught in ¶ [34], [38], [54] and [58] above.).  

Re claim 4: Scavezze discloses the electronic device defined in claim 1, wherein the instructions further comprise instructions for: storing time-stamped poses of the electronic device for a given duration of time in a buffer (e.g. the sensor data can include timestamp metadata, which is associated with time stamped positional data correlated with pose estimation, which is taught in ¶ [34] above. With different events associated with a timestamp, a duration of time can be represented.).  

Re claim 5: Scavezze discloses the electronic device defined in claim 4, wherein a time stamp for the image is incorporated as metadata in the image of the physical environment (e.g. an image includes a metadata with a timestamp that is associated with an image, which is taught in ¶ [34], [38], [54] and [58] above.) and wherein the instructions further comprise instructions for, before determining the position for the representation of the image: 
extracting the time stamp for the image from the metadata in the image of the physical environment (e.g. the image data contains metadata with time stamp information that can be extracted to generate a three-dimensional environment, which is taught in [34], [38], [54] and [58] above.); and 
extracting the first pose of the electronic device from the buffer using the time stamp (e.g. the pose estimation can be associated with a timestamp metadata that is extracted from the image, which is taught in [34], [38], [54] and [58] above.).  
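As a reading aid, the claimed sequence of storing time-stamped poses in a buffer for a given duration and then extracting the first pose using the time stamp carried in the image metadata might look like the sketch below. The class `PoseBuffer`, the dictionary layout of the image metadata, and the nearest-time-stamp lookup rule are assumptions for illustration and are not taken from the application or from Scavezze.

```python
import bisect
from collections import deque

class PoseBuffer:
    """Keeps (timestamp, pose) pairs for the most recent `duration_s` seconds."""

    def __init__(self, duration_s):
        self.duration_s = duration_s
        self._entries = deque()  # timestamps are assumed to arrive in order

    def push(self, timestamp, pose):
        self._entries.append((timestamp, pose))
        # Drop entries that are older than the retained duration.
        while timestamp - self._entries[0][0] > self.duration_s:
            self._entries.popleft()

    def lookup(self, timestamp):
        """Return the stored pose whose time stamp is nearest to `timestamp`."""
        if not self._entries:
            return None
        times = [t for t, _ in self._entries]
        i = bisect.bisect_left(times, timestamp)
        candidates = [c for c in (i - 1, i) if 0 <= c < len(times)]
        best = min(candidates, key=lambda c: abs(times[c] - timestamp))
        return self._entries[best][1]

buf = PoseBuffer(duration_s=1.0)
buf.push(0.0, "pose_a")
buf.push(0.5, "pose_b")
buf.push(1.2, "pose_c")  # pose_a is now older than 1.0 s and is dropped

# Hypothetical image record carrying its capture time stamp as metadata.
image = {"pixels": None, "metadata": {"timestamp": 0.6}}
first_pose = buf.lookup(image["metadata"]["timestamp"])
```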

Re claim 9: Scavezze discloses a method of operating an electronic device that comprises one or more sensors and one or more displays, the method comprising: 
capturing, using a first subset of the one or more sensors, an image of a physical environment (e.g. outward facing sensors can be used to capture the physical environment being viewed by a user, which is taught in ¶ [18] and [19] above.); 
obtaining, using a second subset of the one or more sensors, a first pose of the electronic device, wherein the first pose is associated with the capturing of the image of the physical environment (e.g. the system contains motion sensors that are used to detect the user’s head position/orientation that can be used with the outward-facing image data representative of the 3D environment, which is taught in ¶ [23] above.); 
obtaining, using the second subset of the one or more sensors, a second pose of the electronic device that is different than the first pose (e.g. the sensors capture multiple poses of a user, which can be considered as a second pose.  This is taught in ¶ [33]-[38] above.); 
determining, based on the first and second poses of the electronic device, a position for a representation of the image within a three-dimensional (3D) environment (e.g. the head position and orientation can be tracked in order to change the view of the environment, which is taught in ¶ [17] above.  The first and second pose can be used to reconstruct a pose for the 3D environment shown, which is taught in ¶ [33]-[39] above.); and 

presenting, using the one or more displays, a view of the 3D environment based on the second pose of the electronic device, wherein the view comprises the representation of the image at the determined position (e.g. when the pose of the user’s head mounted display changes, the view of the environment changes to reflect the different view, which is taught in ¶ [17], [19] and [23] above.).  

However, Scavezze fails to specifically teach the features of wherein the position for the representation of the image is configured to cause the representation of the image to overlay a corresponding portion of the 3D environment when viewed from the second pose.
However, this is well known in the art as evidenced by Pekelny.  Similar to the primary reference, Pekelny discloses displaying a virtual representation onto a 3D environment (same field of endeavor or reasonably pertinent to the problem).     
Pekelny discloses wherein the position for the representation of the image is configured to cause the representation of the image to overlay a corresponding portion of the 3D environment when viewed from the second pose (e.g. the reference discloses capturing an image of an environmental object, such as a person or object, and overlays that person within a 3D virtual environment.  This is seen in figures 1, 2 and 5.  This is explained in ¶ [40]-[43], [47], [48], [162] and [163] above.  The system detects the gaze of the user to impact the image in front of the user, which is taught in ¶ [117] above.).

Therefore, in view of Pekelny, it would have been obvious to one of ordinary skill before the effective filing date of the claimed invention to have the feature of wherein the position for the representation of the image is configured to cause the representation of the image to overlay a corresponding portion of the 3D environment when viewed from the second pose, incorporated in the device of Scavezze, in order to represent an object within a three-dimensional environment to show a physical environment within a virtual application, which reduces the safety hazard of an HMD while providing information to the user without extraneous information given to the user (as stated in Pekelny ¶ [01]-[03]).   

Re claim 10: Scavezze discloses the method defined in claim 9, wherein the 3D environment is the physical environment (e.g. the environment determined is a 3D environment, which is taught in ¶ [17] and [30] above.).  

Re claim 11: Scavezze discloses the method defined in claim 9, wherein the first pose of the electronic device is incorporated as metadata in the image of the physical environment (e.g. the position and the orientation data can be derived from the sensors and included in metadata that can accompany image data, which is taught in ¶ [34], [38], [54] and [58] above.) and wherein the method further comprises: 
before determining the position for the representation of the image, extracting the first pose of the electronic device from the metadata in the image of the physical environment (e.g. the metadata capturing the position and orientation of the head mounted device can be used to create a three-dimensional model of the image data, which is taught in ¶ [34], [38], [54] and [58] above.).  

Re claim 12: Scavezze discloses the method defined in claim 9, further comprising: storing time-stamped poses of the electronic device for a given duration of time in a buffer (e.g. the sensor data can include timestamp metadata, which is associated with time stamped positional data correlated with pose estimation, which is taught in ¶ [34] above. With different events associated with a timestamp, a duration of time can be represented.).  

Re claim 13: Scavezze discloses the method defined in claim 12, wherein a time stamp for the image is incorporated as metadata in the image of the physical environment (e.g. an image includes a metadata with a timestamp that is associated with an image, which is taught in ¶ [34], [38], [54] and [58] above.) and wherein the method further comprises, before determining the position for the representation of the image: 
extracting the time stamp for the image from the metadata in the image of the physical environment (e.g. the image data contains metadata with time stamp information that can be extracted to generate a three-dimensional environment, which is taught in [34], [38], [54] and [58] above.); and 
extracting the first pose of the electronic device from the buffer using the time stamp (e.g. the pose estimation can be associated with a timestamp metadata that is extracted from the image, which is taught in [34], [38], [54] and [58] above.).  

Re claim 17: Scavezze discloses a non-transitory computer-readable storage medium storing one or more programs configured to be executed by one or more processors of an electronic device that comprises one or more sensors and one or more displays (e.g. the system discloses multiple sensors to capture areas, a head mounted display used to display information and a processor with a memory to execute the invention, which is taught in ¶ [52] and [53] above.), the one or more programs including instructions for: 
capturing, using a first subset of the one or more sensors, an image of a physical environment (e.g. outward facing sensors can be used to capture the physical environment being viewed by a user, which is taught in ¶ [18] and [19] above.); 
obtaining, using a second subset of the one or more sensors, a first pose of the electronic device, wherein the first pose is associated with the capturing of the image of the physical environment (e.g. the system contains motion sensors that are used to detect the user’s head position/orientation that can be used with the outward-facing image data representative of the 3D environment, which is taught in ¶ [23] above.); 
obtaining, using the second subset of the one or more sensors, a second pose of the electronic device that is different than the first pose (e.g. the sensors capture multiple poses of a user, which can be considered as a second pose.  This is taught in ¶ [33]-[38] above.); 
determining, based on the first and second poses of the electronic device, a position for a representation of the image within a three-dimensional (3D) environment (e.g. the head position and orientation can be tracked in order to change the view of the environment, which is taught in ¶ [17] above.  The first and second pose can be used to reconstruct a pose for the 3D environment shown, which is taught in ¶ [33]-[39] above.); and 
presenting, using the one or more displays, a view of the 3D environment based on the second pose of the electronic device, wherein the view comprises the representation of the image at the determined position (e.g. when the pose of the user’s head mounted display changes, the view of the environment changes to reflect the different view, which is taught in ¶ [17], [19] and [23] above.).  

However, Scavezze fails to specifically teach the features of wherein the position for the representation of the image is configured to cause the representation of the image to overlay a corresponding portion of the 3D environment when viewed from the second pose.
However, this is well known in the art as evidenced by Pekelny.  Similar to the primary reference, Pekelny discloses displaying a virtual representation onto a 3D environment (same field of endeavor or reasonably pertinent to the problem).     
Pekelny discloses wherein the position for the representation of the image is configured to cause the representation of the image to overlay a corresponding portion of the 3D environment when viewed from the second pose (e.g. the reference discloses capturing an image of an environmental object, such as a person or object, and overlays that person within a 3D virtual environment.  This is seen in figures 1, 2 and 5.  This is explained in ¶ [40]-[43], [47], [48], [162] and [163] above.  The system detects the gaze of the user to impact the image in front of the user, which is taught in ¶ [117] above.).

Therefore, in view of Pekelny, it would have been obvious to one of ordinary skill before the effective filing date of the claimed invention to have the feature of wherein the position for the representation of the image is configured to cause the representation of the image to overlay a corresponding portion of the 3D environment when viewed from the second pose, incorporated in the device of Scavezze, in order to represent an object within a three-dimensional environment to show a physical environment within a virtual application, which reduces the safety hazard of an HMD while providing information to the user without extraneous information given to the user (as stated in Pekelny ¶ [01]-[03]).   

Re claim 18: Scavezze discloses the non-transitory computer-readable storage medium defined in claim 17, wherein the three-dimensional environment is the physical environment (e.g. the environment determined is a 3D environment, which is taught in ¶ [17] and [30] above.).  

Re claim 19: Scavezze discloses the non-transitory computer-readable storage medium defined in claim 17, wherein the first pose of the electronic device is incorporated as metadata in the image of the physical environment (e.g. the position and the orientation data can be derived from the sensors and included in metadata that can accompany image data, which is taught in ¶ [34], [38], [54] and [58] above.) and wherein the instructions further comprise instructions for: 
before determining the position for the representation of the image, extracting the first pose of the electronic device from the metadata in the image of the physical environment (e.g. the metadata capturing the position and orientation of the head mounted device can be used to create a three-dimensional model of the image data, which is taught in ¶ [34], [38], [54] and [58] above.).  

Re claim 20: Scavezze discloses the non-transitory computer-readable storage medium defined in claim 17, wherein the instructions further comprise instructions for: storing time-stamped poses of the electronic device for a given duration of time in a buffer (e.g. the sensor data can include timestamp metadata, which is associated with time stamped positional data correlated with pose estimation, which is taught in ¶ [34] above. With different events associated with a timestamp, a duration of time can be represented.).  

Re claim 21: Scavezze discloses the non-transitory computer-readable storage medium defined in claim 20, wherein a time stamp for the image is incorporated as metadata in the image of the physical environment (e.g. an image includes a metadata with a timestamp that is associated with an image, which is taught in ¶ [34], [38], [54] and [58] above.) and wherein the instructions further comprise instructions for, before determining the position for the representation of the image: 
extracting the time stamp for the image from the metadata in the image of the physical environment (e.g. the image data contains metadata with time stamp information that can be extracted to generate a three-dimensional environment, which is taught in [34], [38], [54] and [58] above.); and 
extracting the first pose of the electronic device from the buffer using the time stamp (e.g. the pose estimation can be associated with a timestamp metadata that is extracted from the image, which is taught in [34], [38], [54] and [58] above.).  

Claim(s) 6, 14 and 22 is/are rejected under 35 U.S.C. 103 as being unpatentable over Scavezze, as modified by Pekelny, as applied to claims 1, 9 and 17 above, and further in view of Brimijoin (USP 11102602).

Re claim 6: Scavezze discloses the electronic device defined in claim 4, wherein the instructions further comprise instructions for: before determining the position for the representation of the image, extracting the first pose of the electronic device from the buffer (e.g. a pose is estimated that is determined from the receipt of raw data in order to determine the position for representation of the depth within maps, which is taught in ¶ [32].).

[0032] FIG. 7 shows an illustrative surface reconstruction data pipeline 700 for obtaining surface reconstruction data for the real world environment 200. It is emphasized that the disclosed technique is illustrative and that other techniques and methodologies may be utilized depending on the requirements of a particular implementation. Raw depth sensor data 702 is input into a 3D (three-dimensional) pose estimate of the sensor (block 704). Sensor pose tracking can be achieved, for example, using ICP (iterative closest point) alignment between the predicted surface and current sensor measurement. Each depth measurement of the sensor can be integrated (block 706) into a volumetric representation using, for example, surfaces encoded as a signed distance field (SDF). Using a loop, the SDF is raycast (block 708) into the estimated frame to provide a dense surface prediction to which the depth map is aligned. Thus, when the user 102 looks around the virtual world, surface reconstruction data associated with the real world environment 200 can be collected and analyzed. One use of the surface reconstruction data may be to determine the user's head position and orientation.

However, Scavezze fails to specifically teach the features of extracting the first pose of the electronic device from the buffer using a stored latency magnitude, wherein extracting the first pose of the electronic device from the buffer using the stored latency magnitude comprises: subtracting the stored latency magnitude from a current time.  
	However, this is well known in the art as evidenced by Brimijoin.  Similar to the primary reference, Brimijoin discloses determining a position based on subtracting a latency (same field of endeavor or reasonably pertinent to the problem).     
Brimijoin discloses extracting the first pose of the electronic device from the buffer using a stored latency magnitude, wherein extracting the first pose of the electronic device from the buffer using the stored latency magnitude comprises: subtracting the stored latency magnitude from a current time (e.g. in order to determine an angle of a head, the system discloses determining a latency.  The system further subtracts the latency from a time stamp to identify a first time stamp to retrieve a first head angle, which is taught in col. 11, ll. 11-30.).

(40) For example, the angle error detector 128 can determine the angle error using a difference between a head angle of a current point in time (e.g., a second head angle) and a previous head angle (e.g., a first head angle) used to generate the audio signal to be outputted at the current point in time. The angle error detector 128 can determine the angle error by comparing the head angles, such as by subtracting one of the head angles from the other (e.g., subtract the first head angle from the second head angle). The angle error detector 128 can use the head angle detector 120 to identify the first head angle using a time stamp (e.g., first time stamp) assigned to the first head angle, such as by causing the head angle detector 120 to determine a latency between when the first head angle was provided to the audio signal generator 124 and when the resulting audio signal is received to be outputted at the current point in time (e.g., subtract the latency from a second time stamp corresponding to the current point in time to identify the first time stamp, and retrieve the head angle corresponding to the first time stamp to use as the first head angle).
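The retrieval step quoted in paragraph (40), subtracting the latency from the current time stamp to identify the first time stamp and then comparing head angles, can be sketched as follows. This is an illustrative sketch under stated assumptions, not Brimijoin's implementation: the function names, the millisecond integer time stamps, the "latest sample at or before" lookup rule, and the wrapping of the error to [-180, 180) degrees are all hypothetical choices.

```python
import bisect

def wrap_degrees(angle):
    """Wrap an angle difference into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def first_head_angle(history, now_ms, latency_ms):
    """history: list of (timestamp_ms, angle_deg) sorted by time stamp.
    Returns the angle recorded at or just before (now - latency)."""
    first_ts = now_ms - latency_ms  # subtract the latency from the current time
    times = [t for t, _ in history]
    i = bisect.bisect_right(times, first_ts) - 1
    if i < 0:
        raise ValueError("no head angle stored at or before the first time stamp")
    return history[i][1]

def angle_error(history, now_ms, latency_ms, current_angle_deg):
    """Second (current) head angle minus first (earlier) head angle."""
    first = first_head_angle(history, now_ms, latency_ms)
    return wrap_degrees(current_angle_deg - first)

history = [(0, 10.0), (50, 12.0), (100, 15.0)]
error = angle_error(history, now_ms=100, latency_ms=50, current_angle_deg=15.0)
```

With a 50 ms latency at time 100 ms, the first time stamp is 50 ms, so the first head angle is 12.0 degrees and the error against the current 15.0 degrees is 3.0 degrees.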

(41) In some implementations, the head angle detector 120 predicts what the head angle is expected to be at the current point in time (e.g., a future point in time at which the audio signal generated using the head angle is expected to be outputted), in order to provide the head angle to the angle error detector 128 for the angle error detector 128 to determine the head angle error. For example, the head angle detector 120 can include a head angle model. The head angle model can be any function, filter, algorithm, or machine learning model (e.g., neural network, regression function, classifier) that receives inputs and outputs predicted head angles responsive to the inputs. The head angle model can include a Kalman filter. The head angle model can receive various inputs, including but not limited to one or more previous head angles, information regarding measured movement of the head (e.g., position, velocity, or acceleration information received from sensors 104), information regarding expected movements of the head (which may be received or determined based on information from simulation generator 144), information regarding latency or other indications of time that will have passed since the audio signal was generated, distributions of expected head angles or movements of the head given a starting head angle (e.g., a histogram or function indicating a likelihood of the head angle being a particular head angle or range of head angles, such as given the starting head angle), or various combinations thereof. As an example, the head angle model can use the first head angle and a rate of angular velocity of the HWD to predict the current head angle.

(42) For example, the head angle detector 120 can determine a predicted head angle (e.g., third head angle) indicating where the HWD is expected to be at the current point in time (which may be a future point in time relative to when the head angle detector 120 predicts the predicted head angle) using (1) the first head angle that is provided to the audio signal generator 124 to generate the audio signal for output at the current point in time and (2) a time difference between when the first head angle is measured or provided to the head angle detector 120 and an expected time at which the audio signal is to be outputted (e.g., between when the first head angle is measured and the current point in time); this time difference can be used by the head angle detector 120 in determining how much the head angle is expected to change given the time difference. The head angle detector 120 can provide the first head angle to the head angle model to determine the predicted head angle.
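Paragraphs (41) and (42) leave the head angle model open (any function, filter, or learned model, including a Kalman filter). As one simple instance of the example given there, using the first head angle and an angular velocity together with the time difference, a constant-velocity extrapolation might be sketched as below. The function name and parameters are hypothetical, and the constant-velocity rule is an assumption substituted for whatever model the reference actually uses.

```python
def predict_head_angle(first_angle_deg, angular_velocity_deg_s,
                       measured_at_s, output_at_s):
    """Extrapolate the head angle to the expected output time,
    assuming the angular velocity stays constant over the interval."""
    time_difference_s = output_at_s - measured_at_s
    return first_angle_deg + angular_velocity_deg_s * time_difference_s

# First head angle of 10 degrees measured at t = 1.0 s, head turning at
# 40 degrees per second, audio expected to be output at t = 1.5 s.
predicted = predict_head_angle(10.0, 40.0, 1.0, 1.5)
```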

Therefore, in view of Brimijoin, it would have been obvious to one of ordinary skill at the time the invention was made to have the feature of extracting the first pose of the electronic device from the buffer using a stored latency magnitude, wherein extracting the first pose of the electronic device from the buffer using the stored latency magnitude comprises: subtracting the stored latency magnitude from a current time, incorporated in the device of Scavezze, in order to determine the latency for pose estimation and subtracting the determined latency, which can aid in reducing the perceptual impact of latencies (as stated in Brimijoin col. 4, ll. 35-53).   

Re claim 14: Scavezze discloses the method defined in claim 12, further comprising: 
before determining the position for the representation of the image, extracting the first pose of the electronic device from the buffer (e.g. a pose is estimated that is determined from the receipt of raw data in order to determine the position for representation of the depth within maps, which is taught in ¶ [32] above.).  
	However, Scavezze fails to specifically teach the features of extracting the first pose of the electronic device from the buffer using a stored latency magnitude, wherein extracting the first pose of the electronic device from the buffer using the stored latency magnitude comprises: subtracting the stored latency magnitude from a current time.
However, this is well known in the art as evidenced by Brimijoin.  Similar to the primary reference, Brimijoin discloses determining a position based on subtracting a latency (same field of endeavor or reasonably pertinent to the problem).     
Brimijoin discloses extracting the first pose of the electronic device from the buffer using a stored latency magnitude, wherein extracting the first pose of the electronic device from the buffer using the stored latency magnitude comprises: subtracting the stored latency magnitude from a current time (e.g. in order to determine an angle of a head, the system discloses determining a latency.  The system further subtracts the latency from a time stamp to identify a first time stamp to retrieve a first head angle, which is taught in col. 11, ll. 11-30 above.).

Therefore, in view of Brimijoin, it would have been obvious to one of ordinary skill at the time the invention was made to have the feature of extracting the first pose of the electronic device from the buffer using a stored latency magnitude, wherein extracting the first pose of the electronic device from the buffer using the stored latency magnitude comprises: subtracting the stored latency magnitude from a current time, incorporated in the device of Scavezze, in order to determine the latency for pose estimation and subtracting the determined latency, which can aid in reducing the perceptual impact of latencies (as stated in Brimijoin col. 4, ll. 35-53).   

Re claim 22: Scavezze discloses the non-transitory computer-readable storage medium defined in claim 20, wherein the instructions further comprise instructions for: before determining the position for the representation of the image, extracting the first pose of the electronic device from the buffer (e.g. a pose is estimated that is determined from the receipt of raw data in order to determine the position for representation of the depth within maps, which is taught in ¶ [32] above.).  
	However, Scavezze fails to specifically teach the features of extracting the first pose of the electronic device from the buffer using a stored latency magnitude, wherein extracting the first pose of the electronic device from the buffer using the stored latency magnitude comprises: subtracting the stored latency magnitude from a current time. 
However, this is well known in the art as evidenced by Brimijoin.  Similar to the primary reference, Brimijoin discloses determining a position based on subtracting a latency (same field of endeavor or reasonably pertinent to the problem).     
Brimijoin discloses extracting the first pose of the electronic device from the buffer using a stored latency magnitude, wherein extracting the first pose of the electronic device from the buffer using the stored latency magnitude comprises: subtracting the stored latency magnitude from a current time (e.g. in order to determine an angle of a head, the system discloses determining a latency.  The system further subtracts the latency from a time stamp to identify a first time stamp to retrieve a first head angle, which is taught in col. 11, ll. 11-30 above.).

Therefore, in view of Brimijoin, it would have been obvious to one of ordinary skill at the time the invention was made to have the feature of extracting the first pose of the electronic device from the buffer using a stored latency magnitude, wherein extracting the first pose of the electronic device from the buffer using the stored latency magnitude comprises: subtracting the stored latency magnitude from a current time, incorporated in the device of Scavezze, in order to determine the latency for pose estimation and subtracting the determined latency, which can aid in reducing the perceptual impact of latencies (as stated in Brimijoin col. 4, ll. 35-53).   

Claim(s) 8, 16 and 24 is/are rejected under 35 U.S.C. 103 as being unpatentable over Scavezze, as modified by Pekelny, as applied to claims 1, 9 and 17 above, and further in view of Lalonde (US Pub 2017/0287215).

Re claim 8: Scavezze discloses the electronic device defined in claim 1, wherein presenting the view of the 3D environment comprises displaying the representation of the image as part of a camera application within the 3D environment, wherein displaying the representation of the image as part of the camera application comprises displaying camera control user interface elements in addition to the representation of the image (e.g. the invention discloses a user interface that is used to guide a user in relation to an object that can be displayed as a 3D mesh representation.  The HMD can display virtual controls that can be used to operate controllable aspects of the device, which is taught in ¶ [31] and [37].).  

[0031] The sensor package can also support gaze tracking 620 in some implementations to ascertain a direction of the user's gaze 625 which may be used along with the head position and orientation data. The HMD device 104 may further be configured to expose a user interface (UI) 630 that can display system messages, prompts, and the like as well as expose controls that the user may manipulate. The controls can be virtual or physical in some cases. The UI 630 may also be configured to operate with sensed gestures and voice using, for example, voice commands or natural language.

[0037] FIG. 8 shows an illustrative example in which the user 102 operates the HMD device 104 to capture a plurality of images of an object 802 in the real world environment 200 of the user 102. The image data can be captured by the sensor package 505, for example, using image sensors 306 and be used as the basis to construct a 3D mesh representation of the object 802 for incorporation and use in the virtual environment 100. Moreover, the HMD device 104 may guide or direct the user 102 how to move in relation to the object 802 in order to capture better input images, for example, through the user interface 630.

However, Scavezze fails to specifically teach the features of and wherein the camera control user interface elements are displayed at a location determined based on the second pose. 
However, this is well known in the art as evidenced by Lalonde.  Similar to the primary reference, Lalonde discloses an HMD showing controls to be used in the system (same field of endeavor or reasonably pertinent to the problem).     
Lalonde discloses and wherein the camera control user interface elements are displayed at a location determined based on the second pose (e.g. the invention discloses shifting user controls in a certain view based on the gaze of the user and may remove the input controls if the gaze is away from a certain region, which is taught in ¶ [40].).

[0040] As shown in FIG. 1, an example of portioning a display region 112 may include adapting a sliver or thin user interface near the top of the display 108 to fit a GUI with image content depicting video (i.e., camera feed) that captures a physical keyboard sitting on a desk below the eye-line of the user. Here, the image content including keyboard image/video 116 may be displayed in display region 112 on the display 106 in response to user 102 looking downward toward a position associated with a location of the physical keyboard. If the user looks away from the display region 112, the systems and methods may remove region 112 from view on the display 106 and begin to display virtual content in place of the content shown in region 112.

Therefore, in view of Lalonde, it would have been obvious to one of ordinary skill at the time the invention was made to have the feature of and wherein the camera control user interface elements are displayed at a location determined based on the second pose, incorporated in the device of Scavezze, in order to move a user interface in a manner associated with a user gaze out of the way of the virtual environment, which ensures the other content is viewed without detracting from the immersive virtual experience (as stated in Lalonde ¶ [31]).   

Re claim 16: Scavezze discloses the method defined in claim 9, wherein presenting the view of the 3D environment comprises displaying the representation of the image as part of a camera application within the 3D environment, wherein displaying the representation of the image as part of the camera application comprises displaying camera control user interface elements in addition to the representation of the image (e.g. the invention discloses a user interface that is used to guide a user in relation to an object that can be displayed as a 3D mesh representation.  The HMD can display virtual controls that can be used to operate controllable aspects of the device, which is taught in ¶ [31] and [37] above.).  

However, Scavezze fails to specifically teach the feature wherein the camera control user interface elements are displayed at a location determined based on the second pose.
However, this is well known in the art, as evidenced by Lalonde. Similar to the primary reference, Lalonde discloses an HMD showing controls to be used in the system (same field of endeavor or reasonably pertinent to the problem).
Lalonde discloses wherein the camera control user interface elements are displayed at a location determined based on the second pose (e.g., the invention discloses shifting user controls in a certain view based on the gaze of the user, and may remove the input controls if the gaze is away from a certain region, as taught in ¶ [40] above).

Therefore, in view of Lalonde, it would have been obvious to one of ordinary skill at the time the invention was made to incorporate into the device of Scavezze the feature wherein the camera control user interface elements are displayed at a location determined based on the second pose, in order to move a user interface, in a manner associated with a user gaze, out of the way of the virtual environment, which ensures that the other content is viewed without detracting from the immersive virtual experience (as stated in Lalonde ¶ [31]).

Re claim 24: Scavezze discloses the non-transitory computer-readable storage medium defined in claim 17, wherein presenting the view of the 3D environment comprises displaying the representation of the image as part of a camera application within the 3D environment, wherein displaying the representation of the image as part of the camera application comprises displaying camera control user interface elements in addition to the representation of the image (e.g., the invention discloses a user interface that is used to guide a user in relation to an object that can be displayed as a 3D mesh representation; the HMD can display virtual controls that can be used to operate controllable aspects of the device, as taught in ¶ [31] and [37] above).

However, Scavezze fails to specifically teach the feature wherein the camera control user interface elements are displayed at a location determined based on the second pose.
However, this is well known in the art, as evidenced by Lalonde. Similar to the primary reference, Lalonde discloses an HMD showing controls to be used in the system (same field of endeavor or reasonably pertinent to the problem).
Lalonde discloses wherein the camera control user interface elements are displayed at a location determined based on the second pose (e.g., the invention discloses shifting user controls in a certain view based on the gaze of the user, and may remove the input controls if the gaze is away from a certain region, as taught in ¶ [40] above).

Therefore, in view of Lalonde, it would have been obvious to one of ordinary skill at the time the invention was made to incorporate into the device of Scavezze the feature wherein the camera control user interface elements are displayed at a location determined based on the second pose, in order to move a user interface, in a manner associated with a user gaze, out of the way of the virtual environment, which ensures that the other content is viewed without detracting from the immersive virtual experience (as stated in Lalonde ¶ [31]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Burns discloses gaze-based placement within a virtual environment.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
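The reply-period rule in the preceding paragraph can be worked through as date arithmetic. The sketch below is illustrative only and rests on stated assumptions: the helper names `add_months` and `reply_deadlines` are hypothetical, "within TWO MONTHS" is treated as on or before the two-month date, month-end days are clamped, the Apr 08, 2026 date shown in the prosecution timeline is taken as the mailing date, and the reply and advisory dates in the second call are invented purely for the example. Weekend and holiday adjustments and extension fees are not modeled, so the result is no substitute for docketing the actual dates.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def add_months(d: date, months: int) -> date:
    """Return the date `months` later, clamping the day to the month's end."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_deadlines(
    mailed: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> Tuple[date, date]:
    """Return (end of shortened statutory period, six-month outer limit)."""
    two_months = add_months(mailed, 2)
    three_months = add_months(mailed, 3)
    six_months = add_months(mailed, 6)

    shortened = three_months
    # A first reply within two months, followed by an advisory action mailed
    # after the three-month date, moves the end of the shortened period to
    # the advisory action's mailing date.
    if (first_reply is not None and advisory_mailed is not None
            and first_reply <= two_months and advisory_mailed > three_months):
        shortened = advisory_mailed

    # The period for reply never runs past six months from mailing.
    return min(shortened, six_months), six_months


# Assumed mailing date taken from the prosecution timeline on this page.
base_shortened, outer_limit = reply_deadlines(date(2026, 4, 8))

# Hypothetical scenario: early reply, late advisory action.
late_advisory, _ = reply_deadlines(
    date(2026, 4, 8),
    first_reply=date(2026, 5, 20),
    advisory_mailed=date(2026, 7, 20),
)
```

Under these assumptions the shortened period ends on Jul 08, 2026 and the outer limit falls on Oct 08, 2026. In the hypothetical second scenario the shortened period instead ends on the advisory action date, Jul 20, 2026.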

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAD S DICKERSON whose telephone number is (571)270-1351. The examiner can normally be reached Monday-Friday 10AM-6PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abderrahim Merouan, can be reached at 571-270-5254. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/CHAD DICKERSON/           Primary Examiner, Art Unit 2682
Prosecution Timeline

Jun 21, 2023: Application Filed
Oct 02, 2025: Non-Final Rejection mailed (§103)
Dec 18, 2025: Response Filed
Apr 08, 2026: Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/177,878 (Patent 12641187): IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND NON-TRANSITORY COMPUTER READABLE MEDIUM. Granted May 26, 2026 (3y 2m to grant).
18/355,594 (Patent 12639850): OBJECT POSE ESTIMATION IN THE CONTEXT OF NEURAL NETWORKS. Granted May 26, 2026 (2y 10m to grant).
17/939,170 (Patent 12620085): IMAGING SYSTEM AND METHOD TO ESTIMATE CONTOUR OF A SCANNED OBJECT. Granted May 05, 2026 (3y 8m to grant).
17/683,166 (Patent 12612046): VEHICLE DRIVABLE AREA DETECTION SYSTEM. Granted Apr 28, 2026 (4y 2m to grant).
18/186,809 (Patent 12614257): SYSTEMS AND METHODS FOR CLUTTER ARTIFACT REMOVAL USING A DATA-DRIVEN MODEL TRAINED ON IMAGES FORMED USING A LOWER CLUTTER IMAGE SEQUENCE AND A HIGHER CLUTTER IMAGE SEQUENCE GENERATED BY AN ARTIFACT OVERLAY ON THE LOWER CLUTTER IMAGE SEQUENCE. Granted Apr 28, 2026 (3y 1m to grant).
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 63%
With Interview (+23.0%): 86%
Median Time to Grant: 3y 2m (~3m remaining)
PTA Risk: Moderate
Based on 600 resolved cases by this examiner. Grant probability derived from career allowance rate.