DETAILED ACTION
Notice of Pre-AIA or AIA Status
Claims 1-20 are pending in this application. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Drawings
The drawings are objected to because suitable legends are required for understanding of Figures 1A-1E and 2A-2B, in accordance with 37 CFR 1.84(o). Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as "amended." If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either "Replacement Sheet" or "New Sheet" pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1 and 15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lemay et al. (US PGPub US 2021/0286502 A1, hereinafter referred to as “Lemay”).
Consider Claims 1 and 15.
Lemay teaches:
1. (currently amended) A device, comprising: one or more processors configured to: / 15. (currently amended) A system, comprising: a device; and a camera device, wherein the camera device comprises one or more cameras configured to capture an immersive image; wherein the device comprises one or more processors configured to: (Lemay: abstract, [0027]-[0035], Figures 7A-7Q and Figures 8-11, [0047]-[0048], [0048] In some embodiments, as shown in FIG. 1, the CGR experience is provided to the user via an operating environment 100 that includes a computing system 101. The computing system 101 includes a controller 110 (e.g., processors of a portable electronic device or a remote server), one or more display generation components 120 (e.g., one or more head-mounted devices (HMD), an HMD with an inner display and an outer display, one or more displays, one or more projectors, one or more touch-screens, etc., enclosed in the same housing and facing different directions, or enclosed in separate housings), one or more input devices 125 (e.g., an eye tracking device 130, a hand tracking device 140, other input devices 150), one or more output devices 155 (e.g., speakers 160, tactile output generators 170, and other output devices 180), one or more sensors 190 (e.g., image sensors, light sensors, depth sensors, tactile sensors, orientation sensors, proximity sensors, temperature sensors, location sensors, motion sensors, velocity sensors, etc.), and optionally one or more peripheral devices 195 (e.g., home appliances, wearable devices, etc.). In some embodiments, one or more of the input devices 125, output devices 155, sensors 190, and peripheral devices 195 are integrated with the display generation component 120 (e.g., in a head-mounted device (e.g., on the housing of the HMD or an outward facing display of the HMD) or a handheld device).)
1. receive an image from a camera device, the image being an immersive image or a preview of the immersive image; / 15. receive an image from the camera device, the image being the immersive image or a preview of the immersive image, (Lemay: [0060] According to some embodiments, at least one of the display generation components 120 provides a CGR experience to the user while the user is virtually and/or physically present within the scene 105. [0061] In some embodiments, the display generation component(s) are worn on a part of the user's body (e.g., on his/her head, on his/her hand, etc.). As such, at least one of the display generation component(s) 120 includes one or more CGR displays provided to display the CGR content. For example, in various embodiments, at least one of the display generation component(s) 120 encloses the field-of-view of the user. In some embodiments, at least one of the display generation component(s) 120 is a handheld device (such as a smartphone or tablet) configured to present CGR content, and the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the scene 105. In some embodiments, the handheld device is optionally placed within an enclosure that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., a tripod) in front of the user. [0076] In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera). In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the user's hand(s) and optionally arm(s) of the user (and may be referred to as a hand-tracking camera). 
In some embodiments, the one or more image sensors 314 are configured to be forward-facing so as to obtain image data that corresponds to the scene as would be viewed by the user if the display generation component(s) 120 were not present (and may be referred to as a scene camera). The one or more optional image sensors 314 can include one or more RGB cameras (e.g., with a complimentary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), one or more infrared (IR) cameras, one or more event-based cameras, and/or the like.)
1. detect one or more objects within the image; / 15. detect one or more objects within the image, (Lemay: [0057] Augmented virtuality: An augmented virtuality (AV) environment refers to a simulated environment in which a virtual or computer-generated environment incorporates one or more sensory inputs from the physical environment. The sensory inputs may be representations of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but people with faces photorealistically reproduced from images taken of physical people. As another example, a virtual object may adopt a shape or color of a physical article imaged by one or more imaging sensors. As a further example, a virtual object may adopt shadows consistent with the position of the sun in the physical environment. [0095], [0100] The following describes several possible use cases for the user's current gaze direction, and is not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environment of the CGR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. 
As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user's eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance.)
1. determine, based on the detected one or more objects, whether a quality of the immersive image can be improved using at least one measure of one or more predefined measures; / 15. determine, based on the detected one or more objects, whether a quality of the immersive image can be improved using at least one measure of one or more predefined measures, (Examiner Note: the rendering of virtual content differently based on the different directions of the user’s gaze is indicative of quality metrics that can be improved using predefined measures; Lemay: [0092] FIG. 4 further includes a schematic representation of a depth map 410 captured by the image sensors 404, in accordance with some embodiments. The depth map, as explained above, comprises a matrix of pixels having respective depth values. The pixels 412 corresponding to the hand 406 have been segmented out from the background and the wrist in this map. The brightness of each pixel within the depth map 410 corresponds inversely to its depth value, i.e., the measured z distance from the image sensors 404, with the shade of gray growing darker with increasing depth. The controller 110 processes these depth values in order to identify and segment a component of the image (i.e., a group of neighboring pixels) having characteristics of a human hand. These characteristics, may include, for example, overall size, shape and motion from frame to frame of the sequence of depth maps. [0095]-[0100], [0096] As shown in FIG. 5, in some embodiments, a gaze tracking device 130 includes at least one eye tracking camera (e.g., infrared (IR) or near-IR (NIR) cameras), and illumination sources (e.g., IR or NIR light sources such as an array or ring of LEDs) that emit light (e.g., IR or NIR light) towards the user's eyes. 
The eye tracking cameras may be pointed towards the user's eyes to receive reflected IR or NIR light from the light sources directly from the eyes, or alternatively may be pointed towards “hot” mirrors located between the user's eyes and the display panels that reflect IR or NIR light from the eyes to the eye tracking cameras while allowing visible light to pass. The gaze tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyze the images to generate gaze tracking information, and communicate the gaze tracking information to the controller 110. In some embodiments, two eyes of the user are separately tracked by respective eye tracking cameras and illumination sources. In some embodiments, only one eye of the user is tracked by a respective eye tracking camera and illumination sources. [0100] The following describes several possible use cases for the user's current gaze direction, and is not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environment of the CGR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. 
As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user's eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance.)
1. and in the case that it is determined that the quality of the immersive image can be improved using at least one measure of the one or more predefined measures, / 15. and in the case that it is determined that the quality of the immersive image can be improved using at least one measure of the one or more predefined measures, (Lemay: [0171] In some embodiments, depending on whether the first display generation component is being worn by the first user (e.g., whether the HMD is strapped or buckled onto the user's head and can remain in front of the user's eyes without the support of the user's hand(s), or merely being held in front of the user's eyes by the user's hand(s) and will fall away without the support of the user's hand(s)) when the first display generation component is placed into the preset configuration relative to the first user (e.g., the display side of the first display generation component is facing toward the user's eyes or face, and/or the within a threshold distance from the user's face, etc.), the computing system optionally displays different types of user interfaces (e.g., a system user interface (e.g., an application launching user interface, a home screen, a multitasking user interface, a configuration user interface, etc.) vs. an application user interface (e.g., a camera user interface, an infra-red scanner user interface (e.g., showing a heat map of the current physical environment), an augmented reality measuring application (e.g., automatically displaying measurements of physical objects in a camera view), etc.)) using the first display generation component. 
In some embodiments, the computing system takes a photo or video of the physical environment captured within the camera view in response to a user input detected via an input device disposed on the housing of the first display generation component (e.g., a touch sensor, a contact intensity sensor, a button, a switch, etc.), when the computing system is displaying the application user interface using the first display generation component. [0243] In some embodiments, while displaying the computer-generated environment via the first display generation component and displaying the status information corresponding to the computing system via the second display generation component, the computing system detects a fifth respective event that is triggered by a third user (e.g., movement, presence, gesture, etc. of the third user) who is in a position to view the status information displayed via the second display generation component. In response to detecting the fifth respective event (e.g., movement toward or away from the user and/or the first display generation component, presence of the third user in the same room as the user of the computing system, a gesture of the third user, etc.), in accordance with a determination that the fifth respective event meets fourth criteria (e.g., the fourth criteria provides a threshold measure of likelihood that interaction between the user of the computing system and the third user is to occur), wherein the fourth criteria require that a preset measure of interaction has increased from below a preset threshold to above the preset threshold as a result of the fifth respective event and that the computer-generated environment is displayed with a third level of immersion (e.g., virtual reality mode) in order for the fourth criteria to be met: the computing system changes a level of immersion of the computer-generated environment displayed via the first display generation component from the third level of immersion (e.g., a virtual 
reality mode) to a second level of immersion (e.g., changing to a less immersive mode (e.g., a mixed reality mode), or changing to a temporary pass-through mode with the virtual content continues to progress), wherein the computer-generated environment displayed with the second level of immersion includes an increased amount of representation of the physical environment than the computer-generated environment displayed with the third level of immersion (e.g., a representation of the third user is displayed via the first display generation component in the computer-generated environment displayed with the second level of immersion, and the representation of the third user is not displayed via the first display generation component in the computer-generated environment displayed with the second level of immersion).)
1. provide control instructions to one or more output devices to control the one or more output devices to output information / 15. provide control instructions to one or more output devices to control the one or more output devices to output information (Lemay: [0235] In some embodiments, while displaying the computer-generated environment via the first display generation component (e.g., while the computer-generated environment is provided with a second level of immersion (e.g., mixed reality mode, or temporary pass-through mode provided during virtual reality mode) or a third level of immersion (e.g., virtual reality mode)), the computing system detects a third user request to activate a parental control mode of the computing device (e.g., detecting a user input that is a fingerprint input on the housing of the HMD, or detecting a user input activating a control in the computer-generated environment that corresponds to a request to activate a parental control mode (e.g., a system request that applies to all applications subsequently displayed on the first display generation component, or an application-specific request that applies to the currently displayed application), or when the user is accessing content that is marked as “controlled” in the computer-generated environment, or when a remote request is received from a controlling device (e.g., a mobile device of the parent), etc.). The parental-control mode requires that the one or more graphical elements (e.g., the overlay that is generated based on and reflects the content of the computer-generated environment currently shown via the first display generation component) displayed via the second display generation component have more than a third threshold visibility (e.g., more than a threshold resolution, brightness, opacity, and/or clarity; less than a threshold amount of blurring; or showing identical content as that shown on the first display generation component). 
In response to detecting the third user request, in accordance with a determination that the third user request is received while visibility of the one or more graphical elements that provide a visual indication of content in the computer-generated environment is less than the third threshold visibility corresponding to the parental control mode, the computing system increases the visibility of the one or more graphical elements on the second display generation component above the third threshold visibility corresponding to the parental-control mode (e.g., increasing fidelity and resolution of the one or more graphical elements that provide the visual indication of the content in the computer-generated environment; or displaying the content of the computer-generated environment in place of the one or more graphical elements). In some embodiments, in response to the third user request, in accordance with a determination that the third user request is received while visibility of the one or more graphical elements that provide a visual indication of content in the computer-generated environment already exceeds the third threshold visibility corresponding to the parental-control mode, the computing system maintains the visibility of the one or more graphical elements above the third threshold visibility corresponding to the parental-control mode. In some embodiments, while the parental-control mode is active, the content of the computer-generated environment are displayed on both the first display generation component and the second display generation component, and continue to change on both the first display generation component and the second display generation component. In some embodiments, the parental control mode is enabled (e.g., either before the computer-generated experience is started or while the computer-generated experience is being displayed) by a person (e.g., a parent, teacher, supervisor, administrator, etc.) 
other than the user in the position to view the content displayed via the first display generation component. The parental control mode allows parents, teaches, supervisors, administrators to monitor the activities occurring on the first display generation component (e.g., the inner display of an HMD) when the display side of the first display generation component is faced away and/or blocked by the physical hardware of the first display generation component and the content is not visible to the outside viewers. Increasing the visibility of the one or more graphical elements above a third threshold visibility corresponding to the parental-control mode in accordance with a determination that the third user request to activate the parental control mode is received while visibility of the one or more graphical elements is less than the third threshold visibility corresponding to the parental control mode, reduces the number of inputs needed to increase the visibility of the one or more graphical elements above the third threshold visibility (e.g., the user does not need to perform a separate input to activate the parental control mode and a separate input to increase the visibility of the one or more graphical elements). Reducing the number of inputs needed to perform an operation enhances the operability of the device, which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the device more quickly and efficiently. [0236])
1. informing a user that the at least one measure can improve the quality when capturing an immersive image./ 15. informing a user that the at least one measure can improve the quality when capturing an immersive image. (Lemay: [0242] In some embodiments, displaying the one or more graphical elements that provide a visual indication of content in the computer-generated environment includes: detecting first movement (e.g., movement toward or away from the user and/or the first display generation component) of a second user who is in a position to view of the status information displayed via the second display generation component relative to the second display generation component; and in response to detecting the first movement of the second user relative to the second display generation component, in accordance with a determination that a distance between the second user and the second display generation component has decreased from above a first threshold distance to below the first threshold distance, updating display of the one or more graphical elements to increase an information density of the visual indication of content in the computer-generated environment that is provided by the one or more graphical elements. In some embodiments, in response to detecting the first movement of the second user relative to the second display generation component: in accordance with a determination that a distance between the second user and the second display generation component has increased from above the first threshold distance to below the first threshold distance, updating display of the one or more graphical elements to decrease the information density of the visual indication of content in the computer-generated environment that is provided by the one or more graphical elements. In some embodiments, three of more levels of information densities are provided by the one or more graphical elements for two or more threshold distances. 
In some embodiments, when the movement of the second user relative to the second display generation component does not cause the distance between the second user and the second display generation component to cross a respective distance threshold, the information density of the visual indication is not changed as a result of the movement of the second user. In some embodiments, information density is determined based on the number of indicator objects present in the overlay, and a reduction of the number of indicator objects corresponds to a reduction of information density. In some embodiments, the information density is determined based on the amount of information details (e.g., details of graphical features, amount of textual characters per unit display area, etc.) provided by the one or more graphical elements, and a reduction of the amount of information details corresponds to a reduction of information density. In some embodiments, information density is determined based on clarity and resolution of the one or more graphical elements, and a reduction of the clarity and resolution of the one or more graphical elements corresponds to a reduction of information density. Updating display of the one or more graphical elements to increase an information density of the visual indication of content provided by the one or more graphical elements, in response to detecting the first movement of the second user relative to the second display generation component, reduces the number of inputs needed to comfortably display the one or more graphical elements (e.g., the user does not need to perform additional inputs to adjust the information density of the visual indication of content when the second user moves relative to the second display generation component). 
Reducing the number of inputs needed to perform an operation enhances the operability of the device, which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the device more quickly and efficiently. [0243] In some embodiments, while displaying the computer-generated environment via the first display generation component and displaying the status information corresponding to the computing system via the second display generation component, the computing system detects a fifth respective event that is triggered by a third user (e.g., movement, presence, gesture, etc. of the third user) who is in a position to view the status information displayed via the second display generation component. In response to detecting the fifth respective event (e.g., movement toward or away from the user and/or the first display generation component, presence of the third user in the same room as the user of the computing system, a gesture of the third user, etc.), in accordance with a determination that the fifth respective event meets fourth criteria (e.g., the fourth criteria provides a threshold measure of likelihood that interaction between the user of the computing system and the third user is to occur), wherein the fourth criteria require that a preset measure of interaction has increased from below a preset threshold to above the preset threshold as a result of the fifth respective event and that the computer-generated environment is displayed with a third level of immersion (e.g., virtual reality mode) in order for the fourth criteria to be met: the computing system changes a level of immersion of the computer-generated environment displayed via the first display generation component from the third level of immersion (e.g., a virtual reality mode) to a second level of immersion (e.g., changing to a less immersive mode (e.g., a mixed reality mode), or changing to a temporary pass-through mode with the virtual content continues 
to progress), wherein the computer-generated environment displayed with the second level of immersion includes an increased amount of representation of the physical environment than the computer-generated environment displayed with the third level of immersion (e.g., a representation of the third user is displayed via the first display generation component in the computer-generated environment displayed with the second level of immersion, and the representation of the third user is not displayed via the first display generation component in the computer-generated environment displayed with the second level of immersion). Changing a level of immersion from the third level of immersion to a second level of immersion that includes an increased amount of representation of the physical environment as compared to the third level of immersion, in response to detecting the fifth respective event and in accordance with a determination that the fifth respective event meets fourth criteria requiring that a preset measure of interaction has increased from below a preset threshold to above the preset threshold as a result of the fifth respective event and that the computer-generated environment is displayed with a third level of immersion, changes the level of immersion when a set of conditions has been met without requiring further user input (e.g., further user input to change the level of immersion, further user input to increase the amount of representations of the physical environment, etc.). Performing an operation when a set of conditions has been met without requiring further user input enhances the operability of the device, which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the device more quickly and efficiently.)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 7-16 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lemay et al. (US PGPub 2021/0286502 A1), hereby referred to as “Lemay”, in view of Wagner et al. (US PGPub 2024/0103615), hereby referred to as “Wagner”.
Consider Claims 1 and 15.
Lemay teaches:
1. (currently amended) A device, comprising: one or more processors configured to: / 15. (currently amended) A system, comprising: a device; and a camera device, wherein the camera device comprises one or more cameras configured to capture an immersive image; wherein the device comprises one or more processors configured to: (Lemay: abstract, [0027]-[0035], Figures 7A-7Q and Figure 8-11, [0047]-[0048], [0048] In some embodiments, as shown in FIG. 1, the CGR experience is provided to the user via an operating environment 100 that includes a computing system 101. The computing system 101 includes a controller 110 (e.g., processors of a portable electronic device or a remote server), one or more display generation components 120 (e.g., one or more head-mounted devices (HMD), an HMD with an inner display and an outer display, one or more displays, one or more projectors, one or more touch-screens, etc., enclosed in the same housing and facing different directions, or enclosed in separate housings), one or more input devices 125 (e.g., an eye tracking device 130, a hand tracking device 140, other input devices 150), one or more output devices 155 (e.g., speakers 160, tactile output generators 170, and other output devices 180), one or more sensors 190 (e.g., image sensors, light sensors, depth sensors, tactile sensors, orientation sensors, proximity sensors, temperature sensors, location sensors, motion sensors, velocity sensors, etc.), and optionally one or more peripheral devices 195 (e.g., home appliances, wearable devices, etc.). In some embodiments, one or more of the input devices 125, output devices 155, sensors 190, and peripheral devices 195 are integrated with the display generation component 120 (e.g., in a head-mounted device (e.g., on the housing of the HMD or an outward facing display of the HMD) or a handheld device).)
1. receive an image from a camera device, the image being an immersive image or a preview of the immersive image; / 15. receive an image from the camera device, the image being the immersive image or a preview of the immersive image, (Lemay: [0060] According to some embodiments, at least one of the display generation components 120 provides a CGR experience to the user while the user is virtually and/or physically present within the scene 105. [0061] In some embodiments, the display generation component(s) are worn on a part of the user's body (e.g., on his/her head, on his/her hand, etc.). As such, at least one of the display generation component(s) 120 includes one or more CGR displays provided to display the CGR content. For example, in various embodiments, at least one of the display generation component(s) 120 encloses the field-of-view of the user. In some embodiments, at least one of the display generation component(s) 120 is a handheld device (such as a smartphone or tablet) configured to present CGR content, and the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the scene 105. In some embodiments, the handheld device is optionally placed within an enclosure that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., a tripod) in front of the user. [0076] In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera). In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the user's hand(s) and optionally arm(s) of the user (and may be referred to as a hand-tracking camera). 
In some embodiments, the one or more image sensors 314 are configured to be forward-facing so as to obtain image data that corresponds to the scene as would be viewed by the user if the display generation component(s) 120 were not present (and may be referred to as a scene camera). The one or more optional image sensors 314 can include one or more RGB cameras (e.g., with a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), one or more infrared (IR) cameras, one or more event-based cameras, and/or the like.)
1. detect one or more objects within the image; / 15. detect one or more objects within the image, (Lemay: [0057] Augmented virtuality: An augmented virtuality (AV) environment refers to a simulated environment in which a virtual or computer-generated environment incorporates one or more sensory inputs from the physical environment. The sensory inputs may be representations of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but people with faces photorealistically reproduced from images taken of physical people. As another example, a virtual object may adopt a shape or color of a physical article imaged by one or more imaging sensors. As a further example, a virtual object may adopt shadows consistent with the position of the sun in the physical environment. [0095], [0100] The following describes several possible use cases for the user's current gaze direction, and is not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environment of the CGR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. 
As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user's eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance.)
1. determine, based on the detected one or more objects, whether a quality of the immersive image can be improved using at least one measure of one or more predefined measures; / 15. determine, based on the detected one or more objects, whether a quality of the immersive image can be improved using at least one measure of one or more predefined measures, (Examiner Note: the rendering of virtual content differently based on the different directions of the user’s gaze is indicative of quality metrics that can be improved using predefined measures; Lemay: [0092] FIG. 4 further includes a schematic representation of a depth map 410 captured by the image sensors 404, in accordance with some embodiments. The depth map, as explained above, comprises a matrix of pixels having respective depth values. The pixels 412 corresponding to the hand 406 have been segmented out from the background and the wrist in this map. The brightness of each pixel within the depth map 410 corresponds inversely to its depth value, i.e., the measured z distance from the image sensors 404, with the shade of gray growing darker with increasing depth. The controller 110 processes these depth values in order to identify and segment a component of the image (i.e., a group of neighboring pixels) having characteristics of a human hand. These characteristics, may include, for example, overall size, shape and motion from frame to frame of the sequence of depth maps. [0095]-[0100], [0096] As shown in FIG. 5, in some embodiments, a gaze tracking device 130 includes at least one eye tracking camera (e.g., infrared (IR) or near-IR (NIR) cameras), and illumination sources (e.g., IR or NIR light sources such as an array or ring of LEDs) that emit light (e.g., IR or NIR light) towards the user's eyes. 
The eye tracking cameras may be pointed towards the user's eyes to receive reflected IR or NIR light from the light sources directly from the eyes, or alternatively may be pointed towards “hot” mirrors located between the user's eyes and the display panels that reflect IR or NIR light from the eyes to the eye tracking cameras while allowing visible light to pass. The gaze tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyzes the images to generate gaze tracking information, and communicates the gaze tracking information to the controller 110. In some embodiments, two eyes of the user are separately tracked by respective eye tracking cameras and illumination sources. In some embodiments, only one eye of the user is tracked by a respective eye tracking camera and illumination sources. [0100] The following describes several possible use cases for the user's current gaze direction, and is not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environment of the CGR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. 
As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user's eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance.)
1. and in the case that it is determined that the quality of the immersive image can be improved using at least one measure of the one or more predefined measures, / 15. and in the case that it is determined that the quality of the immersive image can be improved using at least one measure of the one or more predefined measures, (Lemay: [0171] In some embodiments, depending on whether the first display generation component is being worn by the first user (e.g., whether the HMD is strapped or buckled onto the user's head and can remain in front of the user's eyes without the support of the user's hand(s), or merely being held in front of the user's eyes by the user's hand(s) and will fall away without the support of the user's hand(s)) when the first display generation component is placed into the preset configuration relative to the first user (e.g., the display side of the first display generation component is facing toward the user's eyes or face, and/or within a threshold distance from the user's face, etc.), the computing system optionally displays different types of user interfaces (e.g., a system user interface (e.g., an application launching user interface, a home screen, a multitasking user interface, a configuration user interface, etc.) vs. an application user interface (e.g., a camera user interface, an infra-red scanner user interface (e.g., showing a heat map of the current physical environment), an augmented reality measuring application (e.g., automatically displaying measurements of physical objects in a camera view), etc.)) using the first display generation component. 
In some embodiments, the computing system takes a photo or video of the physical environment captured within the camera view in response to a user input detected via an input device disposed on the housing of the first display generation component (e.g., a touch sensor, a contact intensity sensor, a button, a switch, etc.), when the computing system is displaying the application user interface using the first display generation component. [0243] In some embodiments, while displaying the computer-generated environment via the first display generation component and displaying the status information corresponding to the computing system via the second display generation component, the computing system detects a fifth respective event that is triggered by a third user (e.g., movement, presence, gesture, etc. of the third user) who is in a position to view the status information displayed via the second display generation component. In response to detecting the fifth respective event (e.g., movement toward or away from the user and/or the first display generation component, presence of the third user in the same room as the user of the computing system, a gesture of the third user, etc.), in accordance with a determination that the fifth respective event meets fourth criteria (e.g., the fourth criteria provides a threshold measure of likelihood that interaction between the user of the computing system and the third user is to occur), wherein the fourth criteria require that a preset measure of interaction has increased from below a preset threshold to above the preset threshold as a result of the fifth respective event and that the computer-generated environment is displayed with a third level of immersion (e.g., virtual reality mode) in order for the fourth criteria to be met: the computing system changes a level of immersion of the computer-generated environment displayed via the first display generation component from the third level of immersion (e.g., a virtual 
reality mode) to a second level of immersion (e.g., changing to a less immersive mode (e.g., a mixed reality mode), or changing to a temporary pass-through mode with the virtual content continues to progress), wherein the computer-generated environment displayed with the second level of immersion includes an increased amount of representation of the physical environment than the computer-generated environment displayed with the third level of immersion (e.g., a representation of the third user is displayed via the first display generation component in the computer-generated environment displayed with the second level of immersion, and the representation of the third user is not displayed via the first display generation component in the computer-generated environment displayed with the second level of immersion).)
1. provide control instructions to one or more output devices to control the one or more output devices to output information / 15. provide control instructions to one or more output devices to control the one or more output devices to output information (Lemay: [0235] In some embodiments, while displaying the computer-generated environment via the first display generation component (e.g., while the computer-generated environment is provided with a second level of immersion (e.g., mixed reality mode, or temporary pass-through mode provided during virtual reality mode) or a third level of immersion (e.g., virtual reality mode)), the computing system detects a third user request to activate a parental control mode of the computing device (e.g., detecting a user input that is a fingerprint input on the housing of the HMD, or detecting a user input activating a control in the computer-generated environment that corresponds to a request to activate a parental control mode (e.g., a system request that applies to all applications subsequently displayed on the first display generation component, or an application-specific request that applies to the currently displayed application), or when the user is accessing content that is marked as “controlled” in the computer-generated environment, or when a remote request is received from a controlling device (e.g., a mobile device of the parent), etc.). The parental-control mode requires that the one or more graphical elements (e.g., the overlay that is generated based on and reflects the content of the computer-generated environment currently shown via the first display generation component) displayed via the second display generation component have more than a third threshold visibility (e.g., more than a threshold resolution, brightness, opacity, and/or clarity; less than a threshold amount of blurring; or showing identical content as that shown on the first display generation component). 
In response to detecting the third user request, in accordance with a determination that the third user request is received while visibility of the one or more graphical elements that provide a visual indication of content in the computer-generated environment is less than the third threshold visibility corresponding to the parental control mode, the computing system increases the visibility of the one or more graphical elements on the second display generation component above the third threshold visibility corresponding to the parental-control mode (e.g., increasing fidelity and resolution of the one or more graphical elements that provide the visual indication of the content in the computer-generated environment; or displaying the content of the computer-generated environment in place of the one or more graphical elements). In some embodiments, in response to the third user request, in accordance with a determination that the third user request is received while visibility of the one or more graphical elements that provide a visual indication of content in the computer-generated environment already exceeds the third threshold visibility corresponding to the parental-control mode, the computing system maintains the visibility of the one or more graphical elements above the third threshold visibility corresponding to the parental-control mode. In some embodiments, while the parental-control mode is active, the content of the computer-generated environment are displayed on both the first display generation component and the second display generation component, and continue to change on both the first display generation component and the second display generation component. In some embodiments, the parental control mode is enabled (e.g., either before the computer-generated experience is started or while the computer-generated experience is being displayed) by a person (e.g., a parent, teacher, supervisor, administrator, etc.) 
other than the user in the position to view the content displayed via the first display generation component. The parental control mode allows parents, teachers, supervisors, administrators to monitor the activities occurring on the first display generation component (e.g., the inner display of an HMD) when the display side of the first display generation component is faced away and/or blocked by the physical hardware of the first display generation component and the content is not visible to the outside viewers. Increasing the visibility of the one or more graphical elements above a third threshold visibility corresponding to the parental-control mode in accordance with a determination that the third user request to activate the parental control mode is received while visibility of the one or more graphical elements is less than the third threshold visibility corresponding to the parental control mode, reduces the number of inputs needed to increase the visibility of the one or more graphical elements above the third threshold visibility (e.g., the user does not need to perform a separate input to activate the parental control mode and a separate input to increase the visibility of the one or more graphical elements). Reducing the number of inputs needed to perform an operation enhances the operability of the device, which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the device more quickly and efficiently. [0236])
1. informing a user that the at least one measure can improve the quality when capturing an immersive image./ 15. informing a user that the at least one measure can improve the quality when capturing an immersive image. (Lemay: [0242] In some embodiments, displaying the one or more graphical elements that provide a visual indication of content in the computer-generated environment includes: detecting first movement (e.g., movement toward or away from the user and/or the first display generation component) of a second user who is in a position to view the status information displayed via the second display generation component relative to the second display generation component; and in response to detecting the first movement of the second user relative to the second display generation component, in accordance with a determination that a distance between the second user and the second display generation component has decreased from above a first threshold distance to below the first threshold distance, updating display of the one or more graphical elements to increase an information density of the visual indication of content in the computer-generated environment that is provided by the one or more graphical elements. In some embodiments, in response to detecting the first movement of the second user relative to the second display generation component: in accordance with a determination that a distance between the second user and the second display generation component has increased from below the first threshold distance to above the first threshold distance, updating display of the one or more graphical elements to decrease the information density of the visual indication of content in the computer-generated environment that is provided by the one or more graphical elements. In some embodiments, three or more levels of information densities are provided by the one or more graphical elements for two or more threshold distances. 
In some embodiments, when the movement of the second user relative to the second display generation component does not cause the distance between the second user and the second display generation component to cross a respective distance threshold, the information density of the visual indication is not changed as a result of the movement of the second user. In some embodiments, information density is determined based on the number of indicator objects present in the overlay, and a reduction of the number of indicator objects corresponds to a reduction of information density. In some embodiments, the information density is determined based on the amount of information details (e.g., details of graphical features, amount of textual characters per unit display area, etc.) provided by the one or more graphical elements, and a reduction of the amount of information details corresponds to a reduction of information density. In some embodiments, information density is determined based on clarity and resolution of the one or more graphical elements, and a reduction of the clarity and resolution of the one or more graphical elements corresponds to a reduction of information density. Updating display of the one or more graphical elements to increase an information density of the visual indication of content provided by the one or more graphical elements, in response to detecting the first movement of the second user relative to the second display generation component, reduces the number of inputs needed to comfortably display the one or more graphical elements (e.g., the user does not need to perform additional inputs to adjust the information density of the visual indication of content when the second user moves relative to the second display generation component). 
Reducing the number of inputs needed to perform an operation enhances the operability of the device, which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the device more quickly and efficiently. [0243] In some embodiments, while displaying the computer-generated environment via the first display generation component and displaying the status information corresponding to the computing system via the second display generation component, the computing system detects a fifth respective event that is triggered by a third user (e.g., movement, presence, gesture, etc. of the third user) who is in a position to view the status information displayed via the second display generation component. In response to detecting the fifth respective event (e.g., movement toward or away from the user and/or the first display generation component, presence of the third user in the same room as the user of the computing system, a gesture of the third user, etc.), in accordance with a determination that the fifth respective event meets fourth criteria (e.g., the fourth criteria provides a threshold measure of likelihood that interaction between the user of the computing system and the third user is to occur), wherein the fourth criteria require that a preset measure of interaction has increased from below a preset threshold to above the preset threshold as a result of the fifth respective event and that the computer-generated environment is displayed with a third level of immersion (e.g., virtual reality mode) in order for the fourth criteria to be met: the computing system changes a level of immersion of the computer-generated environment displayed via the first display generation component from the third level of immersion (e.g., a virtual reality mode) to a second level of immersion (e.g., changing to a less immersive mode (e.g., a mixed reality mode), or changing to a temporary pass-through mode with the virtual content continues 
to progress), wherein the computer-generated environment displayed with the second level of immersion includes an increased amount of representation of the physical environment than the computer-generated environment displayed with the third level of immersion (e.g., a representation of the third user is displayed via the first display generation component in the computer-generated environment displayed with the second level of immersion, and the representation of the third user is not displayed via the first display generation component in the computer-generated environment displayed with the second level of immersion). Changing a level of immersion from the third level of immersion to a second level of immersion that includes an increased amount of representation of the physical environment as compared to the third level of immersion, in response to detecting the fifth respective event and in accordance with a determination that the fifth respective event meets fourth criteria requiring that a preset measure of interaction has increased from below a preset threshold to above the preset threshold as a result of the fifth respective event and that the computer-generated environment is displayed with a third level of immersion, changes the level of immersion when a set of conditions has been met without requiring further user input (e.g., further user input to change the level of immersion, further user input to increase the amount of representations of the physical environment, etc.). Performing an operation when a set of conditions has been met without requiring further user input enhances the operability of the device, which, additionally, reduces power usage and improves battery life of the device by enabling the user to use the device more quickly and efficiently.)
Even if Lemay does not teach:
1. determine, based on the detected one or more objects, whether a quality of the immersive image can be improved using at least one measure of one or more predefined measures; / 15. determine, based on the detected one or more objects, whether a quality of the immersive image can be improved using at least one measure of one or more predefined measures,
Wagner teaches:
1. (currently amended) A device, comprising: one or more processors configured to: / 15. (currently amended) A system, comprising: a device; and a camera device, wherein the camera device comprises one or more cameras configured to capture an immersive image; wherein the device comprises one or more processors configured to: (Wagner: abstract, A computer system displays one or more graphical elements that represent a status associated with a user, including: at a first level of immersion, displaying the one or more graphical elements with a first appearance that is based on a first set of one or more visual properties and the status associated with the user; and at a second level of immersion, displaying the one or more graphical elements with a second appearance that is based on a second set of one or more visual properties and the status associated with the user. In response to detecting that first criteria for changing the current level of immersion are met, changing display of one or more virtual elements from the first level of immersion to the second level of immersion and changing display of the one or more graphical elements with the first appearance to the second appearance. [0062]-[0067], [0063] In some embodiments, the controller 110 is configured to manage and coordinate a XR experience for the user. In some embodiments, the controller 110 includes a suitable combination of software, firmware, and/or hardware. The controller 110 is described in greater detail below with respect to FIG. 2 . In some embodiments, the controller 110 is a computing device that is local or remote relative to the scene 105 (e.g., a physical environment). For example, the controller 110 is a local server located within the scene 105. In another example, the controller 110 is a remote server located outside of the scene 105 (e.g., a cloud server, central server, or another type of server). 
In some embodiments, the controller 110 is communicatively coupled with the display generation component 120 (e.g., an HMD, a display, a projector, a touch-screen, or another type of display generation component) via one or more wired or wireless communication channels 144 (e.g., BLUETOOTH, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, or another type of communication channel). In another example, the controller 110 is included within the enclosure (e.g., a physical housing) of the display generation component 120 (e.g., an HMD, or a portable electronic device that includes a display and one or more processors), one or more of the input devices 125, one or more of the output devices 155, one or more of the sensors 190, and/or one or more of the peripheral devices 195, or share the same physical enclosure or support structure with one or more of the above. [0064] In some embodiments, the display generation component 120 is configured to provide the XR experience (e.g., at least a visual component of the XR experience) to the user. In some embodiments, the display generation component 120 includes a suitable combination of software, firmware, and/or hardware. The display generation component 120 is described in greater detail below with respect to FIG. 3 . In some embodiments, the functionalities of the controller 110 are provided by and/or combined with the display generation component 120.)
1. receive an image from a camera device, the image being an immersive image or a preview of the immersive image; / 15. receive an image from the camera device, the image being the immersive image or a preview of the immersive image, (Wagner: [0068] FIGS. 1A-1P illustrate various examples of a computer system that is used to perform the methods and provide audio, visual and/or haptic feedback as part of user interfaces described herein. In some embodiments, the computer system includes one or more display generation components (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b) for displaying virtual elements and/or a representation of a physical environment to a user of the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. User interfaces generated by the computer system are optionally corrected by one or more corrective lenses 11.3.2-216 that are optionally removably attached to one or more of the optical modules to enable the user interfaces to be more easily viewed by users who would otherwise use glasses or contacts to correct their vision. While many user interfaces illustrated herein show a single view of a user interface, user interfaces in a HMD are optionally displayed using two optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b), one for a user's right eye and a different one for a user's left eye, and slightly different images are presented to the two different eyes to generate the illusion of stereoscopic depth, the single view of the user interface would typically be either a right-eye or left-eye view and the depth effect is explained in the text or using other schematic charts or views. 
In some embodiments, the computer system includes one or more external displays (e.g., display assembly 1-108) for displaying status information for the computer system to the user of the computer system (when the computer system is not being worn) and/or to other people who are near the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more audio output components (e.g., electronic component 1-112) for generating audio feedback, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 1I) for detecting information about a physical environment of the device which can be used (optionally in conjunction with one or more illuminators such as the illuminators described in FIG. 1I) to generate a digital passthrough image, capture visual media corresponding to the physical environment (e.g., photos and/or video), or determine a pose (e.g., position and/or orientation) of physical objects and/or surfaces in the physical environment so that virtual objects can be placed based on a detected pose of physical objects and/or surfaces. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting hand position and/or movement (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 1I) that can be used (optionally in conjunction with one or more illuminators such as the illuminators 6-124 described in FIG. 1I) to determine when one or more air gestures have been performed. 
In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting eye movement (e.g., eye tracking and gaze tracking sensors in FIG. 1I) which can be used (optionally in conjunction with one or more lights such as lights 11.3.2-110 in FIG. 1O) to determine attention or gaze position and/or gaze movement which can optionally be used to detect gaze-only inputs based on gaze movement and/or dwell. A combination of the various sensors described above can be used to determine user facial expressions and/or hand movements for use in generating an avatar or representation of the user such as an anthropomorphic avatar or representation for use in a real-time communication session where the avatar has facial expressions, hand movements, and/or body movements that are based on or similar to detected facial expressions, hand movements, and/or body movements of a user of the device. Gaze and/or attention information is, optionally, combined with hand tracking information to determine interactions between the user and one or more user interfaces based on direct and/or indirect inputs such as air gestures or inputs that use one or more hardware input devices such as one or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and/or dial or button 1-328), knobs (e.g., first button 1-128, button 11.1.1-114, and/or dial or button 1-328), digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328), trackpads, touch screens, keyboards, mice and/or other input devices. 
One or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and/or dial or button 1-328) are optionally used to perform system operations such as recentering content in a three-dimensional environment that is visible to a user of the device, displaying a home user interface for launching applications, starting real-time communication sessions, or initiating display of virtual three-dimensional backgrounds. Knobs or digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328) are optionally rotatable to adjust parameters of the visual content such as a level of immersion of a virtual three-dimensional environment (e.g., a degree to which virtual-content occupies the viewport of the user into the three-dimensional environment) or other parameters associated with the three-dimensional environment and the virtual content that is displayed via the optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b). [0069] FIG. 1B illustrates a front, top, perspective view of an example of a head-mountable display (HMD) device 1-100 configured to be donned by a user and provide virtual and altered/mixed reality (VR/AR) experiences. The HMD 1-100 can include a display unit 1-102 or assembly, an electronic strap assembly 1-104 connected to and extending from the display unit 1-102, and a band assembly 1-106 secured at either end to the electronic strap assembly 1-104. The electronic strap assembly 1-104 and the band 1-106 can be part of a retention assembly configured to wrap around a user's head to hold the display unit 1-102 against the face of the user.)
1. detect one or more objects within the image; / 15. detect one or more objects within the image, (Wagner: [0068]-[0069], as quoted in full above with respect to the receiving limitation.)
1. determine, based on the detected one or more objects, whether a quality of the immersive image can be improved using at least one measure of one or more predefined measures; / 15. determine, based on the detected one or more objects, whether a quality of the immersive image can be improved using at least one measure of one or more predefined measures, (Wagner: [0116] Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in FIG. 1L can be included, either alone or in any combination, in any of the other examples of devices, features, components, and parts shown in FIGS. 1I-1K and described herein. Likewise, any of the features, components, and/or parts, including the arrangements and configurations thereof shown and described with reference to FIGS. 1I-1K can be included, either alone or in any combination, in the example of the devices, features, components, and parts shown in FIG. 1L. [0117] FIG. 1M illustrates a rear perspective view of an inter-pupillary distance (IPD) adjustment system 11.1.1-102 including first and second optical modules 11.1.1-104 a-b slidably engaging/coupled to respective guide-rods 11.1.1-108 a-b and motors 11.1.1-110 a-b of left and right adjustment subsystems 11.1.1-106 a-b. The IPD adjustment system 11.1.1-102 can be coupled to a bracket 11.1.1-112 and include a button 11.1.1-114 in electrical communication with the motors 11.1.1-110 a-b. In at least one example, the button 11.1.1-114 can electrically communicate with the first and second motors 11.1.1-110 a-b via a processor or other circuitry components to cause the first and second motors 11.1.1-110 a-b to activate and cause the first and second optical modules 11.1.1-104 a-b, respectively, to change position relative to one another.)
1. provide control instructions to one or more output devices to control the one or more output devices to output information / 15. provide control instructions to one or more output devices to control the one or more output devices to output information (Wagner: [0117] FIG. 1M illustrates a rear perspective view of an inter-pupillary distance (IPD) adjustment system 11.1.1-102 including first and second optical modules 11.1.1-104 a-b slidably engaging/coupled to respective guide-rods 11.1.1-108 a-b and motors 11.1.1-110 a-b of left and right adjustment subsystems 11.1.1-106 a-b. The IPD adjustment system 11.1.1-102 can be coupled to a bracket 11.1.1-112 and include a button 11.1.1-114 in electrical communication with the motors 11.1.1-110 a-b. In at least one example, the button 11.1.1-114 can electrically communicate with the first and second motors 11.1.1-110 a-b via a processor or other circuitry components to cause the first and second motors 11.1.1-110 a-b to activate and cause the first and second optical modules 11.1.1-104 a-b, respectively, to change position relative to one another. [0118] In at least one example, the first and second optical modules 11.1.1-104 a-b can include respective display screens configured to project light toward the user's eyes when donning the HMD 11.1.1-100. In at least one example, the user can manipulate (e.g., depress and/or rotate) the button 11.1.1-114 to activate a positional adjustment of the optical modules 11.1.1-104 a-b to match the inter-pupillary distance of the user's eyes. The optical modules 11.1.1-104 a-b can also include one or more cameras or other sensors/sensor systems for imaging and measuring the IPD of the user such that the optical modules 11.1.1-104 a-b can be adjusted to match the IPD.)
1. informing a user that the at least one measure can improve the quality when capturing an immersive image./ 15. informing a user that the at least one measure can improve the quality when capturing an immersive image. (Wagner: [0119] In one example, the user can manipulate the button 11.1.1-114 to cause an automatic positional adjustment of the first and second optical modules 11.1.1-104 a-b. In one example, the user can manipulate the button 11.1.1-114 to cause a manual adjustment such that the optical modules 11.1.1-104 a-b move further or closer away, for example when the user rotates the button 11.1.1-114 one way or the other, until the user visually matches her/his own IPD. In one example, the manual adjustment is electronically communicated via one or more circuits and power for the movements of the optical modules 11.1.1-104 a-b via the motors 11.1.1-110 a-b is provided by an electrical power source. In one example, the adjustment and movement of the optical modules 11.1.1-104 a-b via a manipulation of the button 11.1.1-114 is mechanically actuated via the movement of the button 11.1.1-114. [0120] Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in FIG. 1M can be included, either alone or in any combination, in any of the other examples of devices, features, components, and parts shown in any other figures shown and described herein. Likewise, any of the features, components, and/or parts, including the arrangements and configurations thereof shown and described with reference to any other figure shown and described herein, either alone or in any combination, in the example of the devices, features, components, and parts shown in FIG. 1M. [0121] FIG. 1N illustrates a front perspective view of a portion of an HMD 11.1.2-100, including an outer structural frame 11.1.2-102 and an inner or intermediate structural frame 11.1.2-104 defining first and second apertures 11.1.2-106 a, 11.1.2-106 b. 
The apertures 11.1.2-106 a-b are shown in dotted lines in FIG. 1N because a view of the apertures 11.1.2-106 a-b can be blocked by one or more other components of the HMD 11.1.2-100 coupled to the inner frame 11.1.2-104 and/or the outer frame 11.1.2-102, as shown. In at least one example, the HMD 11.1.2-100 can include a first mounting bracket 11.1.2-108 coupled to the inner frame 11.1.2-104. In at least one example, the mounting bracket 11.1.2-108 is coupled to the inner frame 11.1.2-104 between the first and second apertures 11.1.2-106 a-b.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Lemay with the teachings of Wagner, as both are directed towards immersive image representations of VR data that incorporate data obtained through multi-sensor data acquisition. The determination of obviousness is predicated upon the following findings: both references are directed to the same field of endeavor, and one skilled in the art would have been motivated to modify Lemay in order to improve the overall accuracy and end-user interaction in an immersive imaging experience. Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and/or programming techniques, without changing a “fundamental” operating principle of Lemay, while the teaching of Wagner continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of a more efficient and intuitive end-user experience for immersive imaging. It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.
Consider Claim 14.
Lemay teaches:
14. (currently amended) A method for user-specifically presenting an immersive image, the method comprising: (Lemay: abstract, [0027]-[0035], Figures 7A-7Q and Figure 8-11, [0047]-[0048], [0048] In some embodiments, as shown in FIG. 1, the CGR experience is provided to the user via an operating environment 100 that includes a computing system 101. The computing system 101 includes a controller 110 (e.g., processors of a portable electronic device or a remote server), one or more display generation components 120 (e.g., one or more head-mounted devices (HMD), an HMD with an inner display and an outer display, one or more displays, one or more projectors, one or more touch-screens, etc., enclosed in the same housing and facing different directions, or enclosed in separate housings), one or more input devices 125 (e.g., an eye tracking device 130, a hand tracking device 140, other input devices 150), one or more output devices 155 (e.g., speakers 160, tactile output generators 170, and other output devices 180), one or more sensors 190 (e.g., image sensors, light sensors, depth sensors, tactile sensors, orientation sensors, proximity sensors, temperature sensors, location sensors, motion sensors, velocity sensors, etc.), and optionally one or more peripheral devices 195 (e.g., home appliances, wearable devices, etc.). In some embodiments, one or more of the input devices 125, output devices 155, sensors 190, and peripheral devices 195 are integrated with the display generation component 120 (e.g., in a head-mounted device (e.g., on the housing of the HMD or an outward facing display of the HMD) or a handheld device).)
14. capturing a first stereoscopic immersive image using at least a first camera and a second camera, wherein the first stereoscopic immersive image is associated with a first distance between the first camera and the second camera; (Lemay: [0061] In some embodiments, the display generation component(s) are worn on a part of the user's body (e.g., on his/her head, on his/her hand, etc.). As such, at least one of the display generation component(s) 120 includes one or more CGR displays provided to display the CGR content. For example, in various embodiments, at least one of the display generation component(s) 120 encloses the field-of-view of the user. In some embodiments, at least one of the display generation component(s) 120 is a handheld device (such as a smartphone or tablet) configured to present CGR content, and the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the scene 105. In some embodiments, the handheld device is optionally placed within an enclosure that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., a tripod) in front of the user. [0076] In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera). In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the user's hand(s) and optionally arm(s) of the user (and may be referred to as a hand-tracking camera). In some embodiments, the one or more image sensors 314 are configured to be forward-facing so as to obtain image data that corresponds to the scene as would be viewed by the user if the display generation component(s) 120 were not present (and may be referred to as a scene camera). 
The one or more optional image sensors 314 can include one or more RGB cameras (e.g., with a complementary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), one or more infrared (IR) cameras, one or more event-based cameras, and/or the like. [0088] It gives the depth coordinates of points in the scene relative to a predetermined reference plane, at a certain distance from the image sensors 404. In the present disclosure, the image sensors 404 are assumed to define an orthogonal set of x, y, z axes, so that depth coordinates of points in the scene correspond to z components measured by the image sensors. Alternatively, the hand tracking device 440 may use other methods of 3D mapping, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors. [0101] In some embodiments, the eye tracking device is part of a head-mounted device that includes a display (e.g., display 510), two eye lenses (e.g., eye lens(es) 520), eye tracking cameras (e.g., eye tracking camera(s) 540), and light sources (e.g., light sources 530 (e.g., IR or NIR LEDs)), mounted in a wearable housing. The light sources emit light (e.g., IR or NIR light) towards the user's eye(s) 592. In some embodiments, the light sources may be arranged in rings or circles around each of the lenses as shown in FIG. 5. In some embodiments, eight light sources 530 (e.g., LEDs) are arranged around each lens 520 as an example. However, more or fewer light sources 530 may be used, and other arrangements and locations of light sources 530 may be used. [0102] In some embodiments, the display 510 emits light in the visible light range and does not emit light in the IR or NIR range, and thus does not introduce noise in the gaze tracking system. Note that the location and angle of eye tracking camera(s) 540 are given by way of example, and are not intended to be limiting. 
In some embodiments, a single eye tracking camera 540 is located on each side of the user's face. In some embodiments, two or more NIR cameras 540 may be used on each side of the user's face. In some embodiments, a camera 540 with a wider field of view (FOV) and a camera 540 with a narrower FOV may be used on each side of the user's face. In some embodiments, a camera 540 that operates at one wavelength (e.g., 850 nm) and a camera 540 that operates at a different wavelength (e.g., 940 nm) may be used on each side of the user's face.)
14. capturing a second stereoscopic immersive image using at least the first camera and the second camera, wherein the second stereoscopic immersive image is associated with a second distance between the first camera and the second camera different from the first distance; (Lemay: [0101]-[0102], as quoted in full above with respect to the first capturing limitation.)
14. determining an interpupillary metric of a user; (Lemay: [0104] FIG. 6 illustrates a glint-assisted gaze tracking pipeline, in accordance with some embodiments. In some embodiments, the gaze tracking pipeline is implemented by a glint-assisted gaze tracking system (e.g., eye tracking device 130 as illustrated in FIGS. 1 and 5). The glint-assisted gaze tracking system may maintain a tracking state. Initially, the tracking state is off or “NO”. When in the tracking state, the glint-assisted gaze tracking system uses prior information from the previous frame when analyzing the current frame to track the pupil contour and glints in the current frame. When not in the tracking state, the glint-assisted gaze tracking system attempts to detect the pupil and glints in the current frame and, if successful, initializes the tracking state to “YES” and continues with the next frame in the tracking state. [0105], [0106])
14. in the case that the first metric is closer to the interpupillary metric of the user, presenting the first stereoscopic immersive image to the user; and (Examiner Note: Lemay does teach the use of the pupil and glints in order to ascertain whether a tracking state is reached and to determine the accuracy of tracking. Lemay: [0106] At 610, for the current captured images, if the tracking state is YES, then the method proceeds to element 640. At 610, if the tracking state is NO, then as indicated at 620 the images are analyzed to detect the user's pupils and glints in the images. At 630, if the pupils and glints are successfully detected, then the method proceeds to element 640. Otherwise, the method returns to element 610 to process next images of the user's eyes. [0107] At 640, if proceeding from element 610, the current frames are analyzed to track the pupils and glints based in part on prior information from the previous frames. At 640, if proceeding from element 630, the tracking state is initialized based on the detected pupils and glints in the current frames. Results of processing at element 640 are checked to verify that the results of tracking or detection can be trusted. For example, results may be checked to determine if the pupil and a sufficient number of glints to perform gaze estimation are successfully tracked or detected in the current frames. At 650, if the results cannot be trusted, then the tracking state is set to NO and the method returns to element 610 to process next images of the user's eyes. At 650, if the results are trusted, then the method proceeds to element 670. At 670, the tracking state is set to YES (if not already YES), and the pupil and glint information is passed to element 680 to estimate the user's point of gaze.)
14. in the case that the second distance is closer to the interpupillary distance of the user, presenting the second stereoscopic immersive image to the user. (Lemay: [0069] In some embodiments, the coordination unit 246 is configured to manage and coordinate the CGR experience presented to the user by at least one of the display generation component(s) 120, and optionally, by one or more of the output devices 155 and/or peripheral devices 195. To that end, in various embodiments, the coordination unit 246 includes instructions and/or logic therefor, and heuristics and metadata therefore. [0070] In some embodiments, the data transmitting unit 248 is configured to transmit data (e.g., presentation data, location data, etc.) to at least one or more of the display generation component(s) 120, and optionally, to one or more of the input devices 125, output devices 155, sensors 190, and/or peripheral devices 195. To that end, in various embodiments, the data transmitting unit 248 includes instructions and/or logic therefor, and heuristics and metadata therefore.)
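For illustration only, the tracking-state flow quoted from Lemay [0104]-[0107] above may be read as a small state machine. The following is a minimal sketch under stated assumptions, not Lemay's implementation: the `Frame` type, the function names, and the `MIN_GLINTS` threshold are hypothetical, since the quoted text only requires "a sufficient number of glints."

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Assumed threshold; the quoted passage only says "a sufficient number of glints".
MIN_GLINTS = 2


@dataclass
class Frame:
    """Hypothetical per-frame summary of what an eye image analysis found."""
    pupil_found: bool
    glint_count: int


def results_trusted(frame: Frame) -> bool:
    """Element 650: can the tracking or detection results be trusted?"""
    return frame.pupil_found and frame.glint_count >= MIN_GLINTS


def run_pipeline(frames: Iterable[Frame]) -> List[Tuple[bool, bool]]:
    """Return (tracking_state, gaze_estimated) after processing each frame."""
    tracking = False  # the tracking state is initially off ("NO")
    log: List[Tuple[bool, bool]] = []
    for frame in frames:
        if not tracking:
            # Elements 620/630: attempt to detect the pupil and glints.
            detected = frame.pupil_found and frame.glint_count > 0
            if not detected:
                log.append((False, False))
                continue  # back to element 610 with the next frame
        # Element 640 (track or initialize), then the trust check at 650.
        if results_trusted(frame):
            tracking = True            # element 670
            log.append((True, True))   # element 680: estimate point of gaze
        else:
            tracking = False           # untrusted results reset the state to "NO"
            log.append((False, False))
    return log
```

Under these assumptions, a frame with too few glints resets the state to "NO", and detection must succeed again before gaze is estimated.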
Lemay does not teach:
14. determining an interpupillary distance of a user;
14. determining, whether the first distance or the second distance is closer to the interpupillary distance of the user;
14. in the case that the first distance is closer to the interpupillary distance of the user, presenting the first stereoscopic immersive image to the user; and in the case that the second distance is closer to the interpupillary distance of the user, presenting the second stereoscopic immersive image to the user.
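For illustration only, the selection recited in the limitations above (present whichever stereoscopic image was captured at a camera distance closer to the user's interpupillary distance) can be sketched as follows. This is a sketch under stated assumptions, not a disclosure of either reference: the `StereoImage` type, the millimetre units, and the tie-breaking rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StereoImage:
    """Hypothetical record of a stereoscopic image and its capture geometry."""
    label: str
    camera_distance_mm: float  # distance between the first and second camera


def select_image(first: StereoImage, second: StereoImage, ipd_mm: float) -> StereoImage:
    """Return the image whose camera distance is closer to the user's IPD.

    Ties are resolved in favor of the first image; the claim language does
    not address ties, so this is an assumption.
    """
    first_error = abs(first.camera_distance_mm - ipd_mm)
    second_error = abs(second.camera_distance_mm - ipd_mm)
    return first if first_error <= second_error else second
```

For example, with camera distances of 60 mm and 68 mm, a user with a 63 mm interpupillary distance would be shown the first image and a user with a 66 mm interpupillary distance the second.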
Wagner teaches:
14. (currently amended) A method for user-specifically presenting an immersive image, the method comprising: (Wagner: abstract, A computer system displays one or more graphical elements that represent a status associated with a user, including: at a first level of immersion, displaying the one or more graphical elements with a first appearance that is based on a first set of one or more visual properties and the status associated with the user; and at a second level of immersion, displaying the one or more graphical elements with a second appearance that is based on a second set of one or more visual properties and the status associated with the user. In response to detecting that first criteria for changing the current level of immersion are met, changing display of one or more virtual elements from the first level of immersion to the second level of immersion and changing display of the one or more graphical elements with the first appearance to the second appearance. [0062]-[0067], [0063] In some embodiments, the controller 110 is configured to manage and coordinate a XR experience for the user. In some embodiments, the controller 110 includes a suitable combination of software, firmware, and/or hardware. The controller 110 is described in greater detail below with respect to FIG. 2 . In some embodiments, the controller 110 is a computing device that is local or remote relative to the scene 105 (e.g., a physical environment). For example, the controller 110 is a local server located within the scene 105. In another example, the controller 110 is a remote server located outside of the scene 105 (e.g., a cloud server, central server, or another type of server). 
In some embodiments, the controller 110 is communicatively coupled with the display generation component 120 (e.g., an HMD, a display, a projector, a touch-screen, or another type of display generation component) via one or more wired or wireless communication channels 144 (e.g., BLUETOOTH, IEEE 802.11x, IEEE 802.16x, IEEE 802.3x, or another type of communication channel). In another example, the controller 110 is included within the enclosure (e.g., a physical housing) of the display generation component 120 (e.g., an HMD, or a portable electronic device that includes a display and one or more processors), one or more of the input devices 125, one or more of the output devices 155, one or more of the sensors 190, and/or one or more of the peripheral devices 195, or share the same physical enclosure or support structure with one or more of the above. [0064] In some embodiments, the display generation component 120 is configured to provide the XR experience (e.g., at least a visual component of the XR experience) to the user. In some embodiments, the display generation component 120 includes a suitable combination of software, firmware, and/or hardware. The display generation component 120 is described in greater detail below with respect to FIG. 3 . In some embodiments, the functionalities of the controller 110 are provided by and/or combined with the display generation component 120.)
14. capturing a first stereoscopic immersive image using at least a first camera and a second camera, wherein the first stereoscopic immersive image is associated with a first distance between the first camera and the second camera; (Wagner: [0068] FIGS. 1A-1P illustrate various examples of a computer system that is used to perform the methods and provide audio, visual and/or haptic feedback as part of user interfaces described herein. In some embodiments, the computer system includes one or more display generation components (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b) for displaying virtual elements and/or a representation of a physical environment to a user of the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. User interfaces generated by the computer system are optionally corrected by one or more corrective lenses 11.3.2-216 that are optionally removably attached to one or more of the optical modules to enable the user interfaces to be more easily viewed by users who would otherwise use glasses or contacts to correct their vision. While many user interfaces illustrated herein show a single view of a user interface, user interfaces in a HMD are optionally displayed using two optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b), one for a user's right eye and a different one for a user's left eye, and slightly different images are presented to the two different eyes to generate the illusion of stereoscopic depth, the single view of the user interface would typically be either a right-eye or left-eye view and the depth effect is explained in the text or using other schematic charts or views. 
In some embodiments, the computer system includes one or more external displays (e.g., display assembly 1-108) for displaying status information for the computer system to the user of the computer system (when the computer system is not being worn) and/or to other people who are near the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more audio output components (e.g., electronic component 1-112) for generating audio feedback, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 1I) for detecting information about a physical environment of the device which can be used (optionally in conjunction with one or more illuminators such as the illuminators described in FIG. 1I) to generate a digital passthrough image, capture visual media corresponding to the physical environment (e.g., photos and/or video), or determine a pose (e.g., position and/or orientation) of physical objects and/or surfaces in the physical environment so that virtual objects can be placed based on a detected pose of physical objects and/or surfaces. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting hand position and/or movement (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 1I) that can be used (optionally in conjunction with one or more illuminators such as the illuminators 6-124 described in FIG. 1I) to determine when one or more air gestures have been performed. 
In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting eye movement (e.g., eye tracking and gaze tracking sensors in FIG. 1I) which can be used (optionally in conjunction with one or more lights such as lights 11.3.2-110 in FIG. 1O) to determine attention or gaze position and/or gaze movement which can optionally be used to detect gaze-only inputs based on gaze movement and/or dwell. A combination of the various sensors described above can be used to determine user facial expressions and/or hand movements for use in generating an avatar or representation of the user such as an anthropomorphic avatar or representation for use in a real-time communication session where the avatar has facial expressions, hand movements, and/or body movements that are based on or similar to detected facial expressions, hand movements, and/or body movements of a user of the device. Gaze and/or attention information is, optionally, combined with hand tracking information to determine interactions between the user and one or more user interfaces based on direct and/or indirect inputs such as air gestures or inputs that use one or more hardware input devices such as one or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and or dial or button 1-328), knobs (e.g., first button 1-128, button 11.1.1-114, and/or dial or button 1-328), digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328), trackpads, touch screens, keyboards, mice and/or other input devices. 
One or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and or dial or button 1-328) are optionally used to perform system operations such as recentering content in three-dimensional environment that is visible to a user of the device, displaying a home user interface for launching applications, starting real-time communication sessions, or initiating display of virtual three-dimensional backgrounds. Knobs or digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328) are optionally rotatable to adjust parameters of the visual content such as a level of immersion of a virtual three-dimensional environment (e.g., a degree to which virtual-content occupies the viewport of the user into the three-dimensional environment) or other parameters associated with the three-dimensional environment and the virtual content that is displayed via the optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b). [0069] FIG. 1B illustrates a front, top, perspective view of an example of a head-mountable display (HMD) device 1-100 configured to be donned by a user and provide virtual and altered/mixed reality (VR/AR) experiences. The HMD 1-100 can include a display unit 1-102 or assembly, an electronic strap assembly 1-104 connected to and extending from the display unit 1-102, and a band assembly 1-106 secured at either end to the electronic strap assembly 1-104. The electronic strap assembly 1-104 and the band 1-106 can be part of a retention assembly configured to wrap around a user's head to hold the display unit 1-102 against the face of the user.)
14. capturing a second stereoscopic immersive image using at least the first camera and the second camera, wherein the second stereoscopic immersive image is associated with a second distance between the first camera and the second camera different from the first distance; (Wagner: [0068] FIGS. 1A-1P illustrate various examples of a computer system that is used to perform the methods and provide audio, visual and/or haptic feedback as part of user interfaces described herein. In some embodiments, the computer system includes one or more display generation components (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b) for displaying virtual elements and/or a representation of a physical environment to a user of the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. User interfaces generated by the computer system are optionally corrected by one or more corrective lenses 11.3.2-216 that are optionally removably attached to one or more of the optical modules to enable the user interfaces to be more easily viewed by users who would otherwise use glasses or contacts to correct their vision. While many user interfaces illustrated herein show a single view of a user interface, user interfaces in a HMD are optionally displayed using two optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b), one for a user's right eye and a different one for a user's left eye, and slightly different images are presented to the two different eyes to generate the illusion of stereoscopic depth, the single view of the user interface would typically be either a right-eye or left-eye view and the depth effect is explained in the text or using other schematic charts or views. 
In some embodiments, the computer system includes one or more external displays (e.g., display assembly 1-108) for displaying status information for the computer system to the user of the computer system (when the computer system is not being worn) and/or to other people who are near the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more audio output components (e.g., electronic component 1-112) for generating audio feedback, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 1I) for detecting information about a physical environment of the device which can be used (optionally in conjunction with one or more illuminators such as the illuminators described in FIG. 1I) to generate a digital passthrough image, capture visual media corresponding to the physical environment (e.g., photos and/or video), or determine a pose (e.g., position and/or orientation) of physical objects and/or surfaces in the physical environment so that virtual objects can be placed based on a detected pose of physical objects and/or surfaces. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting hand position and/or movement (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 1I) that can be used (optionally in conjunction with one or more illuminators such as the illuminators 6-124 described in FIG. 1I) to determine when one or more air gestures have been performed. 
In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting eye movement (e.g., eye tracking and gaze tracking sensors in FIG. 1I) which can be used (optionally in conjunction with one or more lights such as lights 11.3.2-110 in FIG. 1O) to determine attention or gaze position and/or gaze movement which can optionally be used to detect gaze-only inputs based on gaze movement and/or dwell. A combination of the various sensors described above can be used to determine user facial expressions and/or hand movements for use in generating an avatar or representation of the user such as an anthropomorphic avatar or representation for use in a real-time communication session where the avatar has facial expressions, hand movements, and/or body movements that are based on or similar to detected facial expressions, hand movements, and/or body movements of a user of the device. Gaze and/or attention information is, optionally, combined with hand tracking information to determine interactions between the user and one or more user interfaces based on direct and/or indirect inputs such as air gestures or inputs that use one or more hardware input devices such as one or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and or dial or button 1-328), knobs (e.g., first button 1-128, button 11.1.1-114, and/or dial or button 1-328), digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328), trackpads, touch screens, keyboards, mice and/or other input devices. 
One or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and or dial or button 1-328) are optionally used to perform system operations such as recentering content in three-dimensional environment that is visible to a user of the device, displaying a home user interface for launching applications, starting real-time communication sessions, or initiating display of virtual three-dimensional backgrounds. Knobs or digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328) are optionally rotatable to adjust parameters of the visual content such as a level of immersion of a virtual three-dimensional environment (e.g., a degree to which virtual-content occupies the viewport of the user into the three-dimensional environment) or other parameters associated with the three-dimensional environment and the virtual content that is displayed via the optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b). [0069] FIG. 1B illustrates a front, top, perspective view of an example of a head-mountable display (HMD) device 1-100 configured to be donned by a user and provide virtual and altered/mixed reality (VR/AR) experiences. The HMD 1-100 can include a display unit 1-102 or assembly, an electronic strap assembly 1-104 connected to and extending from the display unit 1-102, and a band assembly 1-106 secured at either end to the electronic strap assembly 1-104. The electronic strap assembly 1-104 and the band 1-106 can be part of a retention assembly configured to wrap around a user's head to hold the display unit 1-102 against the face of the user.)
14. determining an interpupillary distance of a user; (Wagner: [0116] Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in FIG. 1L can be included, either alone or in any combination, in any of the other examples of devices, features, components, and parts shown in FIGS. 1I-1K and described herein. Likewise, any of the features, components, and/or parts, including the arrangements and configurations thereof shown and described with reference to FIGS. 1I-1K can be included, either alone or in any combination, in the example of the devices, features, components, and parts shown in FIG. 1L. [0117] FIG. 1M illustrates a rear perspective view of an inter-pupillary distance (IPD) adjustment system 11.1.1-102 including first and second optical modules 11.1.1-104 a-b slidably engaging/coupled to respective guide-rods 11.1.1-108 a-b and motors 11.1.1-110 a-b of left and right adjustment subsystems 11.1.1-106 a-b. The IPD adjustment system 11.1.1-102 can be coupled to a bracket 11.1.1-112 and include a button 11.1.1-114 in electrical communication with the motors 11.1.1-110 a-b. In at least one example, the button 11.1.1-114 can electrically communicate with the first and second motors 11.1.1-110 a-b via a processor or other circuitry components to cause the first and second motors 11.1.1-110 a-b to activate and cause the first and second optical modules 11.1.1-104 a-b, respectively, to change position relative to one another.)
14. determining, whether the first distance or the second distance is closer to the interpupillary distance of the user; (Wagner: [0117] FIG. 1M illustrates a rear perspective view of an inter-pupillary distance (IPD) adjustment system 11.1.1-102 including first and second optical modules 11.1.1-104 a-b slidably engaging/coupled to respective guide-rods 11.1.1-108 a-b and motors 11.1.1-110 a-b of left and right adjustment subsystems 11.1.1-106 a-b. The IPD adjustment system 11.1.1-102 can be coupled to a bracket 11.1.1-112 and include a button 11.1.1-114 in electrical communication with the motors 11.1.1-110 a-b. In at least one example, the button 11.1.1-114 can electrically communicate with the first and second motors 11.1.1-110 a-b via a processor or other circuitry components to cause the first and second motors 11.1.1-110 a-b to activate and cause the first and second optical modules 11.1.1-104 a-b, respectively, to change position relative to one another. [0118] In at least one example, the first and second optical modules 11.1.1-104 a-b can include respective display screens configured to project light toward the user's eyes when donning the HMD 11.1.1-100. In at least one example, the user can manipulate (e.g., depress and/or rotate) the button 11.1.1-114 to activate a positional adjustment of the optical modules 11.1.1-104 a-b to match the inter-pupillary distance of the user's eyes. The optical modules 11.1.1-104 a-b can also include one or more cameras or other sensors/sensor systems for imaging and measuring the IPD of the user such that the optical modules 11.1.1-104 a-b can be adjusted to match the IPD.)
14. in the case that the first distance is closer to the interpupillary distance of the user, presenting the first stereoscopic immersive image to the user; and in the case that the second distance is closer to the interpupillary distance of the user, presenting the second stereoscopic immersive image to the user. (Wagner: [0119] In one example, the user can manipulate the button 11.1.1-114 to cause an automatic positional adjustment of the first and second optical modules 11.1.1-104 a-b. In one example, the user can manipulate the button 11.1.1-114 to cause a manual adjustment such that the optical modules 11.1.1-104 a-b move further or closer away, for example when the user rotates the button 11.1.1-114 one way or the other, until the user visually matches her/his own IPD. In one example, the manual adjustment is electronically communicated via one or more circuits and power for the movements of the optical modules 11.1.1-104 a-b via the motors 11.1.1-110 a-b is provided by an electrical power source. In one example, the adjustment and movement of the optical modules 11.1.1-104 a-b via a manipulation of the button 11.1.1-114 is mechanically actuated via the movement of the button 11.1.1-114. [0120] Any of the features, components, and/or parts, including the arrangements and configurations thereof shown in FIG. 1M can be included, either alone or in any combination, in any of the other examples of devices, features, components, and parts shown in any other figures shown and described herein. Likewise, any of the features, components, and/or parts, including the arrangements and configurations thereof shown and described with reference to any other figure shown and described herein, either alone or in any combination, in the example of the devices, features, components, and parts shown in FIG. 1M.
[0121] FIG. 1N illustrates a front perspective view of a portion of an HMD 11.1.2-100, including an outer structural frame 11.1.2-102 and an inner or intermediate structural frame 11.1.2-104 defining first and second apertures 11.1.2-106 a, 11.1.2-106 b. The apertures 11.1.2-106 a-b are shown in dotted lines in FIG. 1N because a view of the apertures 11.1.2-106 a-b can be blocked by one or more other components of the HMD 11.1.2-100 coupled to the inner frame 11.1.2-104 and/or the outer frame 11.1.2-102, as shown. In at least one example, the HMD 11.1.2-100 can include a first mounting bracket 11.1.2-108 coupled to the inner frame 11.1.2-104. In at least one example, the mounting bracket 11.1.2-108 is coupled to the inner frame 11.1.2-104 between the first and second apertures 11.1.2-106 a-b.)
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Lemay with the teachings of Wagner, as both are directed towards immersive image representations of VR data that incorporate data obtained through multi-sensor data acquisition. The determination of obviousness is predicated upon the following findings: both references are directed to the same field of endeavor, and one skilled in the art would have been motivated to modify Lemay in order to improve the overall accuracy and end-user interaction in an immersive imaging experience. Furthermore, the prior art collectively includes each element claimed (though not all in the same reference), and one of ordinary skill in the art could have combined the elements in the manner explained above using known engineering design, interface and/or programming techniques, without changing a “fundamental” operating principle of Lemay, while the teaching of Wagner continues to perform the same function as originally taught prior to being combined, in order to produce the repeatable and predictable result of a more efficient and intuitive end-user experience for immersive imaging. It is for at least the aforementioned reasons that the examiner has reached a conclusion of obviousness with respect to the claim in question.
Consider Claim 2.
The combination of Lemay and Wagner teaches:
2. (currently amended) The device according to claim 1 wherein the image is a stereoscopic image comprising a first image associated with a first lens of the camera device and a second image associated with a second lens of the camera device; and wherein the one or more processors configured to: detect one or more first objects within the first image and one or more second objects within the second image; determine, by comparing the one or more first objects with the one or more second objects, whether the first image or the second image shows at least one object not shown in the other one; in the case that it is determined that the first image or the second image shows at least one object not shown in the other one, determine that the quality of the immersive image can be improved; determine a type of the at least one object; and determine, based on the type of the at least one object, the at least one measure of one or more predefined measures with which the quality of the immersive image can be improved. (Examiner Note: the rendering of virtual content differently based on the different directions of the user’s gaze is indicative of quality metrics that can be improved using predefined measures; Lemay: [0092] FIG. 4 further includes a schematic representation of a depth map 410 captured by the image sensors 404, in accordance with some embodiments. The depth map, as explained above, comprises a matrix of pixels having respective depth values. The pixels 412 corresponding to the hand 406 have been segmented out from the background and the wrist in this map. The brightness of each pixel within the depth map 410 corresponds inversely to its depth value, i.e., the measured z distance from the image sensors 404, with the shade of gray growing darker with increasing depth. The controller 110 processes these depth values in order to identify and segment a component of the image (i.e., a group of neighboring pixels) having characteristics of a human hand. 
These characteristics, may include, for example, overall size, shape and motion from frame to frame of the sequence of depth maps. [0095]-[0100], [0096] As shown in FIG. 5, in some embodiments, a gaze tracking device 130 includes at least one eye tracking camera (e.g., infrared (IR) or near-IR (NIR) cameras), and illumination sources (e.g., IR or NIR light sources such as an array or ring of LEDs) that emit light (e.g., IR or NIR light) towards the user's eyes. The eye tracking cameras may be pointed towards the user's eyes to receive reflected IR or NIR light from the light sources directly from the eyes, or alternatively may be pointed towards “hot” mirrors located between the user's eyes and the display panels that reflect IR or NIR light from the eyes to the eye tracking cameras while allowing visible light to pass. The gaze tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyze the images to generate gaze tracking information, and communicate the gaze tracking information to the controller 110. In some embodiments, two eyes of the user are separately tracked by respective eye tracking cameras and illumination sources. In some embodiments, only one eye of the user is tracked by a respective eye tracking camera and illumination sources. [0100] The following describes several possible use cases for the user's current gaze direction, and is not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. 
As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environment of the CGR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user's eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance. Wagner: [0161] In some embodiments, the image sensors 404 project a pattern of spots onto a scene containing the hand 406 and capture an image of the projected pattern. In some embodiments, the controller 110 computes the 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation, based on transverse shifts of the spots in the pattern. This approach is advantageous in that it does not require the user to hold or wear any sort of beacon, sensor, or other marker. It gives the depth coordinates of points in the scene relative to a predetermined reference plane, at a certain distance from the image sensors 404. In the present disclosure, the image sensors 404 are assumed to define an orthogonal set of x, y, z axes, so that depth coordinates of points in the scene correspond to z components measured by the image sensors. 
Alternatively, the image sensors 404 (e.g., a hand tracking device) may use other methods of 3D mapping, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors. [0162] In some embodiments, the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user's hand, while the user moves his hand (e.g., whole hand or one or more fingers). Software running on a processor in the image sensors 404 and/or the controller 110 processes the 3D map data to extract patch descriptors of the hand in these depth maps. The software matches these descriptors to patch descriptors stored in a database 408, based on a prior learning process, in order to estimate the pose of the hand in each frame. The pose typically includes 3D locations of the user's hand joints and fingertips. [0484]-[0487])
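For illustration only, the comparison recited in claim 2 (detect objects in each image, determine whether either image shows an object absent from the other, and select a measure based on the object's type) can be sketched as follows. This is a sketch under stated assumptions and is not drawn from Lemay or Wagner: string labels stand in for the output of an object detector, and the `PREDEFINED_MEASURES` table and its entries are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical lookup from object type to a predefined corrective measure.
PREDEFINED_MEASURES: Dict[str, str] = {
    "finger": "move the finger away from the lens",
    "water_drop": "wipe the lens",
}


def objects_in_only_one_image(first: Set[str], second: Set[str]) -> Set[str]:
    """Objects detected in exactly one of the two images (symmetric difference)."""
    return first ^ second


def measures_for(first: Set[str], second: Set[str]) -> Dict[str, Optional[str]]:
    """Map each unmatched object type to a predefined measure, or None if unknown.

    A non-empty result corresponds to determining that the quality of the
    immersive image can be improved.
    """
    unmatched = sorted(objects_in_only_one_image(first, second))
    return {obj: PREDEFINED_MEASURES.get(obj) for obj in unmatched}
```

Under these assumptions, an empty result means both images show the same objects and no measure is suggested.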
Consider Claim 3.
The combination of Lemay and Wagner teaches:
3. (currently amended) The device according to claim 2 wherein the one or more processors are configured to: determine a depth map using the stereoscopic image, the depth map comprising depth information regarding the one or more objects detected within the image; determine, whether the depth map comprises an erroneous depth for the at least one object; and in the case that it is determined that the depth map comprises an erroneous depth for the at least one object, determine the first image or the second image shows at least one object not shown in the other one. (Examiner Note: the rendering of virtual content differently based on the different directions of the user’s gaze is indicative of quality metrics that can be improved using predefined measures; Lemay: [0092] FIG. 4 further includes a schematic representation of a depth map 410 captured by the image sensors 404, in accordance with some embodiments. The depth map, as explained above, comprises a matrix of pixels having respective depth values. The pixels 412 corresponding to the hand 406 have been segmented out from the background and the wrist in this map. The brightness of each pixel within the depth map 410 corresponds inversely to its depth value, i.e., the measured z distance from the image sensors 404, with the shade of gray growing darker with increasing depth. The controller 110 processes these depth values in order to identify and segment a component of the image (i.e., a group of neighboring pixels) having characteristics of a human hand. These characteristics, may include, for example, overall size, shape and motion from frame to frame of the sequence of depth maps. [0095]-[0100], [0096] As shown in FIG. 5, in some embodiments, a gaze tracking device 130 includes at least one eye tracking camera (e.g., infrared (IR) or near-IR (NIR) cameras), and illumination sources (e.g., IR or NIR light sources such as an array or ring of LEDs) that emit light (e.g., IR or NIR light) towards the user's eyes. 
The eye tracking cameras may be pointed towards the user's eyes to receive reflected IR or NIR light from the light sources directly from the eyes, or alternatively may be pointed towards “hot” mirrors located between the user's eyes and the display panels that reflect IR or NIR light from the eyes to the eye tracking cameras while allowing visible light to pass. The gaze tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyze the images to generate gaze tracking information, and communicate the gaze tracking information to the controller 110. In some embodiments, two eyes of the user are separately tracked by respective eye tracking cameras and illumination sources. In some embodiments, only one eye of the user is tracked by a respective eye tracking camera and illumination sources. [0100] The following describes several possible use cases for the user's current gaze direction, and is not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environment of the CGR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. 
As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user's eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance. Wagner: [0161] In some embodiments, the image sensors 404 project a pattern of spots onto a scene containing the hand 406 and capture an image of the projected pattern. In some embodiments, the controller 110 computes the 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation, based on transverse shifts of the spots in the pattern. This approach is advantageous in that it does not require the user to hold or wear any sort of beacon, sensor, or other marker. It gives the depth coordinates of points in the scene relative to a predetermined reference plane, at a certain distance from the image sensors 404. In the present disclosure, the image sensors 404 are assumed to define an orthogonal set of x, y, z axes, so that depth coordinates of points in the scene correspond to z components measured by the image sensors. Alternatively, the image sensors 404 (e.g., a hand tracking device) may use other methods of 3D mapping, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors. [0162] In some embodiments, the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user's hand, while the user moves his hand (e.g., whole hand or one or more fingers). Software running on a processor in the image sensors 404 and/or the controller 110 processes the 3D map data to extract patch descriptors of the hand in these depth maps. 
The software matches these descriptors to patch descriptors stored in a database 408, based on a prior learning process, in order to estimate the pose of the hand in each frame. The pose typically includes 3D locations of the user's hand joints and fingertips. [0484]-[0487])
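For illustration only, the relationship recited in Claim 3 between an erroneous depth and an object shown in only one of the two images can be sketched with the standard pinhole stereo relation z = f * B / d. The function names, the focal length, the baseline, and the plausibility limit `max_depth_m` are hypothetical; neither the claims nor the cited references specify them.

```python
from typing import Optional


def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> Optional[float]:
    """Return depth in meters from the stereo relation z = f * B / d.

    Returns None when no valid (positive) disparity is available, which is
    what happens when a feature cannot be matched between the two images.
    """
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def shown_in_only_one_image(disparity_px: float, focal_px: float,
                            baseline_m: float,
                            max_depth_m: float = 100.0) -> bool:
    """Treat a missing or implausibly large depth as 'erroneous' (assumed rule)
    and, as in the claim language, conclude the object is absent from one image."""
    depth = depth_from_disparity(disparity_px, focal_px, baseline_m)
    return depth is None or depth > max_depth_m
```

Under these assumptions, an object with no matching counterpart in the other image yields no usable disparity and is therefore flagged.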
Consider Claim 7.
The combination of Lemay and Wagner teaches:
7. (currently amended) The device according to claim 2 wherein the one or more processors are configured to: determine, whether the at least one object is star-shaped or comprises one or more circular rings; in the case that it is determined that the at least one object is star-shaped or comprises one or more circular rings, determine a lens flare as the type of the at least one object. (Examiner Note: the rendering of virtual content differently based on the different directions of the user’s gaze is indicative of quality metrics that can be improved using predefined measures; Lemay: [0092] FIG. 4 further includes a schematic representation of a depth map 410 captured by the image sensors 404, in accordance with some embodiments. The depth map, as explained above, comprises a matrix of pixels having respective depth values. The pixels 412 corresponding to the hand 406 have been segmented out from the background and the wrist in this map. The brightness of each pixel within the depth map 410 corresponds inversely to its depth value, i.e., the measured z distance from the image sensors 404, with the shade of gray growing darker with increasing depth. The controller 110 processes these depth values in order to identify and segment a component of the image (i.e., a group of neighboring pixels) having characteristics of a human hand. These characteristics, may include, for example, overall size, shape and motion from frame to frame of the sequence of depth maps. [0095]-[0100], [0096] As shown in FIG. 5, in some embodiments, a gaze tracking device 130 includes at least one eye tracking camera (e.g., infrared (IR) or near-IR (NIR) cameras), and illumination sources (e.g., IR or NIR light sources such as an array or ring of LEDs) that emit light (e.g., IR or NIR light) towards the user's eyes. 
The eye tracking cameras may be pointed towards the user's eyes to receive reflected IR or NIR light from the light sources directly from the eyes, or alternatively may be pointed towards “hot” mirrors located between the user's eyes and the display panels that reflect IR or NIR light from the eyes to the eye tracking cameras while allowing visible light to pass. The gaze tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyze the images to generate gaze tracking information, and communicate the gaze tracking information to the controller 110. In some embodiments, two eyes of the user are separately tracked by respective eye tracking cameras and illumination sources. In some embodiments, only one eye of the user is tracked by a respective eye tracking camera and illumination sources. [0100] The following describes several possible use cases for the user's current gaze direction, and is not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environment of the CGR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. 
As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user's eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance. Wagner: [0161] In some embodiments, the image sensors 404 project a pattern of spots onto a scene containing the hand 406 and capture an image of the projected pattern. In some embodiments, the controller 110 computes the 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation, based on transverse shifts of the spots in the pattern. This approach is advantageous in that it does not require the user to hold or wear any sort of beacon, sensor, or other marker. It gives the depth coordinates of points in the scene relative to a predetermined reference plane, at a certain distance from the image sensors 404. In the present disclosure, the image sensors 404 are assumed to define an orthogonal set of x, y, z axes, so that depth coordinates of points in the scene correspond to z components measured by the image sensors. Alternatively, the image sensors 404 (e.g., a hand tracking device) may use other methods of 3D mapping, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors. [0162] In some embodiments, the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user's hand, while the user moves his hand (e.g., whole hand or one or more fingers). Software running on a processor in the image sensors 404 and/or the controller 110 processes the 3D map data to extract patch descriptors of the hand in these depth maps. 
The software matches these descriptors to patch descriptors stored in a database 408, based on a prior learning process, in order to estimate the pose of the hand in each frame. The pose typically includes 3D locations of the user's hand joints and fingertips. [0484]-[0487])
Consider Claim 8.
The combination of Lemay and Wagner teaches:
8. (currently amended) The device according to claim 2 wherein the one or more processors are configured to: receive a plurality of further stereoscopic images from the camera device, each further stereoscopic image of the plurality of further stereoscopic images comprising a first image associated with the first lens of the camera device and a second image associated with the second lens of the camera device, wherein each stereoscopic image of the plurality of further stereoscopic images is associated with a respective position and/or rotation of the camera device, and wherein the position and/or the rotation varies among stereoscopic image and the plurality of further stereoscopic images; for each further stereoscopic image of the plurality of further stereoscopic images, detect one or more first objects within the first image and one or more second objects within the second image; determine, whether the first image of the stereoscopic image and one or more first images of the plurality of further stereoscopic images shows at least one same object at a substantially same position; and in the case that it is determined that the first image of the stereoscopic image and one or more first images of the plurality of further stereoscopic images shows at least one same object at a substantially same position, determine a lens defect of the first lens as the type of the at least one object. (Examiner Note: the rendering of virtual content differently based on the different directions of the user’s gaze is indicative of quality metrics that can be improved using predefined measures; Lemay: [0092] FIG. 4 further includes a schematic representation of a depth map 410 captured by the image sensors 404, in accordance with some embodiments. The depth map, as explained above, comprises a matrix of pixels having respective depth values. The pixels 412 corresponding to the hand 406 have been segmented out from the background and the wrist in this map. 
The brightness of each pixel within the depth map 410 corresponds inversely to its depth value, i.e., the measured z distance from the image sensors 404, with the shade of gray growing darker with increasing depth. The controller 110 processes these depth values in order to identify and segment a component of the image (i.e., a group of neighboring pixels) having characteristics of a human hand. These characteristics, may include, for example, overall size, shape and motion from frame to frame of the sequence of depth maps. [0095]-[0100], [0096] As shown in FIG. 5, in some embodiments, a gaze tracking device 130 includes at least one eye tracking camera (e.g., infrared (IR) or near-IR (NIR) cameras), and illumination sources (e.g., IR or NIR light sources such as an array or ring of LEDs) that emit light (e.g., IR or NIR light) towards the user's eyes. The eye tracking cameras may be pointed towards the user's eyes to receive reflected IR or NIR light from the light sources directly from the eyes, or alternatively may be pointed towards “hot” mirrors located between the user's eyes and the display panels that reflect IR or NIR light from the eyes to the eye tracking cameras while allowing visible light to pass. The gaze tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyze the images to generate gaze tracking information, and communicate the gaze tracking information to the controller 110. In some embodiments, two eyes of the user are separately tracked by respective eye tracking cameras and illumination sources. In some embodiments, only one eye of the user is tracked by a respective eye tracking camera and illumination sources. [0100] The following describes several possible use cases for the user's current gaze direction, and is not intended to be limiting. 
As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environment of the CGR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user's eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance. Wagner: [0161] In some embodiments, the image sensors 404 project a pattern of spots onto a scene containing the hand 406 and capture an image of the projected pattern. In some embodiments, the controller 110 computes the 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation, based on transverse shifts of the spots in the pattern. This approach is advantageous in that it does not require the user to hold or wear any sort of beacon, sensor, or other marker. 
It gives the depth coordinates of points in the scene relative to a predetermined reference plane, at a certain distance from the image sensors 404. In the present disclosure, the image sensors 404 are assumed to define an orthogonal set of x, y, z axes, so that depth coordinates of points in the scene correspond to z components measured by the image sensors. Alternatively, the image sensors 404 (e.g., a hand tracking device) may use other methods of 3D mapping, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors. [0162] In some embodiments, the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user's hand, while the user moves his hand (e.g., whole hand or one or more fingers). Software running on a processor in the image sensors 404 and/or the controller 110 processes the 3D map data to extract patch descriptors of the hand in these depth maps. The software matches these descriptors to patch descriptors stored in a database 408, based on a prior learning process, in order to estimate the pose of the hand in each frame. The pose typically includes 3D locations of the user's hand joints and fingertips. [0219] FIG. 7B follows FIG. 7A and illustrates that at a later time, the XR content 7002 has progressed further on the first display generation component 7100, and the appearance of the portion of the body of the first user 7202 has changed. 
For example, the change in appearance of the portion of the body of the first user 7202 is due to movement of at least a portion of the portion of the body of the first user 7202 (e.g., the eyes, nose, brows, mouth, or other body features of the portion of the body of the first user) relative to the first display generation component 7100 (e.g., the movement includes sideway and/or up and down movements of the first user's eye balls, squinting, opening, closing and/or blinking of the user's eyes, movement of the brows (e.g., furrowing, raising, and/or flashing), movement of the nose (e.g., wriggling, scrunching, and/or flaring), movement and/or changes of the cheeks (e.g., bulging or twitching) and/or the movement includes the movement of the user's face or head relative to the display side of the first display generation component (e.g., moving away or toward from the first display generation component 7100)) within location A 7000-a (e.g., while the first user 7202 is still wearing the HMD and/or facing the inner display of the HMD). In some embodiments, the change in appearance of the portion of the body of the first user 7202 is due to a change in color, shape, and/or other visual characteristics of the body features in the portion of the body (e.g., sweating, skin turning white, red, and/or blue, pupils dilating, tears swelling in eyes, jaws dropping, and/or facial muscles relaxing). At this time, the XR content 7002 is still displayed in the mixed reality mode and the representation of the physical environment (e.g., location B, optionally, including the second user 7204), including the representation 7010 of the second user 7202 (e.g., shown as 7010-b in FIG. 7B), remains concurrently displayed among the XR content 7002 via the first display generation component 7100. 
In some embodiments, changes in the appearance of the physical environment (e.g., visual changes in the body features of the second user 7204 and movement of the second user 7204 relative to the first display generation component 7100, the second display generation component 7102, and/or the first user 7202) are reflected by the representation of the physical environment shown by the first display generation component 7100, as well. [0484]-[0487])
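For illustration only, the determination recited in Claim 8 can be sketched as follows: a detection that stays at substantially the same pixel position in the first image while the camera position and/or rotation changes is attributed to the first lens rather than to the scene. The tuple layout `(label, x, y)`, the pixel tolerance `tol_px`, and the function name are assumptions and do not come from the claims or the cited references.

```python
from math import hypot
from typing import List, Tuple

Detection = Tuple[str, float, float]  # (label, x_px, y_px), hypothetical layout


def find_lens_defect_candidates(reference: List[Detection],
                                further_frames: List[List[Detection]],
                                tol_px: float = 5.0) -> List[Detection]:
    """Return detections of the reference first image that reappear at
    substantially the same position in at least one further first image
    captured at a different camera pose."""
    candidates = []
    for label, x, y in reference:
        for frame in further_frames:
            # Same label within tol_px pixels counts as "substantially same position".
            if any(l == label and hypot(fx - x, fy - y) <= tol_px
                   for l, fx, fy in frame):
                candidates.append((label, x, y))
                break
    return candidates
```

Scene objects shift in the image as the camera moves, so under these assumptions only the static detection is returned.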
Consider Claim 9.
The combination of Lemay and Wagner teaches:
9. (currently amended) The device according to claim 1 wherein the one or more processors are configured to: determine a depth map using the stereoscopic image, the depth map comprising depth information regarding the one or more objects detected within the image; determine, using the depth map, for each object of the one or more objects a respective distance to the camera device; determine a distance difference between a distance associated with an object of the one or more objects furthest away from the camera device and a distance associated with an object of the one or more objects closest to the camera device; determine, whether the distance difference is greater than a predefined upper distance threshold value; and in the case that it is determined that the distance difference is greater than the predefined upper distance threshold value, determine that, as a measure of the one or more predefined measures, the quality of the immersive image can be improved by moving the camera. (Examiner Note: the rendering of virtual content differently based on the different directions of the user’s gaze is indicative of quality metrics that can be improved using predefined measures; Lemay: [0092] FIG. 4 further includes a schematic representation of a depth map 410 captured by the image sensors 404, in accordance with some embodiments. The depth map, as explained above, comprises a matrix of pixels having respective depth values. The pixels 412 corresponding to the hand 406 have been segmented out from the background and the wrist in this map. The brightness of each pixel within the depth map 410 corresponds inversely to its depth value, i.e., the measured z distance from the image sensors 404, with the shade of gray growing darker with increasing depth. The controller 110 processes these depth values in order to identify and segment a component of the image (i.e., a group of neighboring pixels) having characteristics of a human hand. 
These characteristics, may include, for example, overall size, shape and motion from frame to frame of the sequence of depth maps. [0095]-[0100], [0096] As shown in FIG. 5, in some embodiments, a gaze tracking device 130 includes at least one eye tracking camera (e.g., infrared (IR) or near-IR (NIR) cameras), and illumination sources (e.g., IR or NIR light sources such as an array or ring of LEDs) that emit light (e.g., IR or NIR light) towards the user's eyes. The eye tracking cameras may be pointed towards the user's eyes to receive reflected IR or NIR light from the light sources directly from the eyes, or alternatively may be pointed towards “hot” mirrors located between the user's eyes and the display panels that reflect IR or NIR light from the eyes to the eye tracking cameras while allowing visible light to pass. The gaze tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyze the images to generate gaze tracking information, and communicate the gaze tracking information to the controller 110. In some embodiments, two eyes of the user are separately tracked by respective eye tracking cameras and illumination sources. In some embodiments, only one eye of the user is tracked by a respective eye tracking camera and illumination sources. [0100] The following describes several possible use cases for the user's current gaze direction, and is not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. 
As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environment of the CGR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user's eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance. Wagner: [0161] In some embodiments, the image sensors 404 project a pattern of spots onto a scene containing the hand 406 and capture an image of the projected pattern. In some embodiments, the controller 110 computes the 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation, based on transverse shifts of the spots in the pattern. This approach is advantageous in that it does not require the user to hold or wear any sort of beacon, sensor, or other marker. It gives the depth coordinates of points in the scene relative to a predetermined reference plane, at a certain distance from the image sensors 404. In the present disclosure, the image sensors 404 are assumed to define an orthogonal set of x, y, z axes, so that depth coordinates of points in the scene correspond to z components measured by the image sensors. 
Alternatively, the image sensors 404 (e.g., a hand tracking device) may use other methods of 3D mapping, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors. [0162] In some embodiments, the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user's hand, while the user moves his hand (e.g., whole hand or one or more fingers). Software running on a processor in the image sensors 404 and/or the controller 110 processes the 3D map data to extract patch descriptors of the hand in these depth maps. The software matches these descriptors to patch descriptors stored in a database 408, based on a prior learning process, in order to estimate the pose of the hand in each frame. The pose typically includes 3D locations of the user's hand joints and fingertips. [0232] In some embodiments, the computer system utilizes data collected from one or more biometric sensors (e.g., sensors that capture biometric features, such as irises, pupils, voiceprint, fingerprint, and/or facial features) to determine whether the identity of the user (e.g., the first user 7202 in FIGS. 7A-7B, or the third user 7206 in FIGS. 7D-7E) meets the first criteria (e.g., stored biometric information that corresponds to a primary user or enrolled user).)
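For illustration only, the threshold test recited in Claim 9 reduces to comparing the spread between the farthest and the closest object distance against an upper limit. The function name, the units, and the example threshold are assumptions; the claim does not fix any numerical value.

```python
from typing import Sequence


def moving_camera_may_improve_quality(distances_m: Sequence[float],
                                      upper_threshold_m: float) -> bool:
    """Return True when (farthest distance - closest distance) exceeds the
    predefined upper distance threshold, i.e. when moving the camera is
    identified as a measure for improving the immersive image."""
    if not distances_m:
        return False  # no detected objects, nothing to compare
    distance_difference = max(distances_m) - min(distances_m)
    return distance_difference > upper_threshold_m
```

With per-object distances of 0.5 m, 2 m, and 12 m and an assumed threshold of 10 m, the spread is 11.5 m and the measure is indicated.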
Consider Claim 10.
The combination of Lemay and Wagner teaches:
10. (currently amended) The device according to claim 1 wherein the image is a spherical immersive image which comprise a first half-spherical immersive image having a Fisheye format and a second half-spherical immersive image having the Fisheye format; wherein the one or more processors are configured to: determine, whether there is at least one object of the one or more objects of which a first portion is shown in the first half-spherical immersive image and a second portion is shown in the second half-spherical immersive image; in the case that it is determined that there is at least one object of which a first portion is shown in the first half-spherical immersive image and a second portion is shown in the second half-spherical immersive image, determine, whether the at least one object is associated with a face or with written text; and in the case it is determined that the at least one object is associated with a face or with written text, determine that, as a measure of the one or more predefined measures, the quality of the immersive image can be improved by rotating the camera device such that the at least one object is completely within the first half-spherical immersive image or the second half-spherical immersive image.
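For illustration only, the two-step determination recited in Claim 10 can be sketched as a filter over detected objects: first whether an object is split across both half-spherical images, then whether it is a face or written text. The `Detection` record, its field names, and the label strings are hypothetical and are not taken from the claims or the cited references.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str            # e.g. "face", "text", "chair" (assumed label set)
    in_first_half: bool   # some portion lies in the first half-spherical image
    in_second_half: bool  # some portion lies in the second half-spherical image


SEAM_SENSITIVE_LABELS = {"face", "text"}


def objects_suggesting_rotation(detections: List[Detection]) -> List[str]:
    """Return labels of objects that straddle both half-spherical images and
    are associated with a face or written text, for which rotating the
    camera device is identified as a quality-improving measure."""
    return [d.label for d in detections
            if d.in_first_half and d.in_second_half
            and d.label in SEAM_SENSITIVE_LABELS]
```

An object that straddles the seam but is neither a face nor text does not trigger the rotation measure in this sketch.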
Consider Claim 11.
The combination of Lemay and Wagner teaches:
11. (currently amended) The device according to claim 2 wherein the one or more processors are configured to: determine the type of the at least one object using the semantic image segmentation; determine, whether at least one object of the one or more objects is associated with a predefined object of interest; in the case that it is determined that at least one object of the one or more objects is associated with a predefined object of interest, determine, whether the at least object is located in a center portion of the image; and in the case that it is determined that the at least object is not located in the center portion of the image, determine that, as a measure of the one or more predefined measures, the quality of the immersive image can be improved by moving the camera device such that the at least one object is located in the center portion of the image. (Examiner Note: the rendering of virtual content differently based on the different directions of the user’s gaze is indicative of quality metrics that can be improved using predefined measures; Lemay: [0092] FIG. 4 further includes a schematic representation of a depth map 410 captured by the image sensors 404, in accordance with some embodiments. The depth map, as explained above, comprises a matrix of pixels having respective depth values. The pixels 412 corresponding to the hand 406 have been segmented out from the background and the wrist in this map. The brightness of each pixel within the depth map 410 corresponds inversely to its depth value, i.e., the measured z distance from the image sensors 404, with the shade of gray growing darker with increasing depth. The controller 110 processes these depth values in order to identify and segment a component of the image (i.e., a group of neighboring pixels) having characteristics of a human hand. These characteristics, may include, for example, overall size, shape and motion from frame to frame of the sequence of depth maps. 
[0095]-[0100], [0096] As shown in FIG. 5, in some embodiments, a gaze tracking device 130 includes at least one eye tracking camera (e.g., infrared (IR) or near-IR (NIR) cameras), and illumination sources (e.g., IR or NIR light sources such as an array or ring of LEDs) that emit light (e.g., IR or NIR light) towards the user's eyes. The eye tracking cameras may be pointed towards the user's eyes to receive reflected IR or NIR light from the light sources directly from the eyes, or alternatively may be pointed towards “hot” mirrors located between the user's eyes and the display panels that reflect IR or NIR light from the eyes to the eye tracking cameras while allowing visible light to pass. The gaze tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyze the images to generate gaze tracking information, and communicate the gaze tracking information to the controller 110. In some embodiments, two eyes of the user are separately tracked by respective eye tracking cameras and illumination sources. In some embodiments, only one eye of the user is tracked by a respective eye tracking camera and illumination sources. [0100] The following describes several possible use cases for the user's current gaze direction, and is not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content at a higher resolution in a foveal region determined from the user's current gaze direction than in peripheral regions. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. 
As another example use case in AR applications, the controller 110 may direct external cameras for capturing the physical environment of the CGR experience to focus in the determined direction. The autofocus mechanism of the external cameras may then focus on an object or surface in the environment that the user is currently looking at on the display 510. As another example use case, the eye lenses 520 may be focusable lenses, and the gaze tracking information is used by the controller to adjust the focus of the eye lenses 520 so that the virtual object that the user is currently looking at has the proper vergence to match the convergence of the user's eyes 592. The controller 110 may leverage the gaze tracking information to direct the eye lenses 520 to adjust focus so that close objects that the user is looking at appear at the right distance. Wagner: [0161] In some embodiments, the image sensors 404 project a pattern of spots onto a scene containing the hand 406 and capture an image of the projected pattern. In some embodiments, the controller 110 computes the 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation, based on transverse shifts of the spots in the pattern. This approach is advantageous in that it does not require the user to hold or wear any sort of beacon, sensor, or other marker. It gives the depth coordinates of points in the scene relative to a predetermined reference plane, at a certain distance from the image sensors 404. In the present disclosure, the image sensors 404 are assumed to define an orthogonal set of x, y, z axes, so that depth coordinates of points in the scene correspond to z components measured by the image sensors. Alternatively, the image sensors 404 (e.g., a hand tracking device) may use other methods of 3D mapping, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors. 
[0162] In some embodiments, the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user's hand, while the user moves his hand (e.g., whole hand or one or more fingers). Software running on a processor in the image sensors 404 and/or the controller 110 processes the 3D map data to extract patch descriptors of the hand in these depth maps. The software matches these descriptors to patch descriptors stored in a database 408, based on a prior learning process, in order to estimate the pose of the hand in each frame. The pose typically includes 3D locations of the user's hand joints and fingertips. [0232] In some embodiments, the computer system utilizes data collected from one or more biometric sensors (e.g., sensors that capture biometric features, such as irises, pupils, voiceprint, fingerprint, and/or facial features) to determine whether the identity of the user (e.g., the first user 7202 in FIGS. 7A-7B, or the third user 7206 in FIGS. 7D-7E) meets the first criteria (e.g., stored biometric information that corresponds to a primary user or enrolled user).)
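For illustration only, the determination recited in claim 11 may be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Lemay, Wagner, or the application: the normalized-coordinate object representation, the `center_fraction` parameter, and all function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class DetectedObject:
    """Hypothetical output of a semantic segmentation step."""
    label: str   # object type, e.g. "person"
    cx: float    # object center, normalized to [0, 1] across image width
    cy: float    # object center, normalized to [0, 1] across image height


def in_center_portion(obj: DetectedObject, center_fraction: float = 0.5) -> bool:
    """True if the object center lies in a centered window covering
    `center_fraction` of each image dimension (an assumed definition)."""
    half = center_fraction / 2.0
    return abs(obj.cx - 0.5) <= half and abs(obj.cy - 0.5) <= half


def centering_measure(objects: Iterable[DetectedObject],
                      objects_of_interest: Set[str],
                      center_fraction: float = 0.5) -> Optional[str]:
    """Return a suggested measure if an object of interest is off-center,
    otherwise None."""
    for obj in objects:
        if obj.label in objects_of_interest and not in_center_portion(obj, center_fraction):
            return f"move the camera so that the {obj.label} is in the center portion"
    return None
```

Under these assumptions, an object of interest whose center lies outside the centered window yields a suggested measure, while a centered object, or an object that is not of interest, yields none.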
Consider Claim 12.
The combination of Lemay and Wagner teaches:
12. (currently amended) The device according to claim 1 wherein the one or more processors are configured to: determine whether at least one object of the one or more objects is associated with a Moiré effect; and in the case that it is determined that at least one object of the one or more objects is associated with a Moiré effect, determine that, as a measure of the one or more predefined measures, the quality of the immersive image can be improved by moving further away from or closer to the at least one object and/or by moving to change an angle to the at least one object. (Lemay: [0101] In some embodiments, the eye tracking device is part of a head-mounted device that includes a display (e.g., display 510), two eye lenses (e.g., eye lense(s) 520), eye tracking cameras (e.g., eye tracking camera(s) 540), and light sources (e.g., light sources 530 (e.g., IR or NIR LEDs), mounted in a wearable housing. The Light sources emit light (e.g., IR or NIR light) towards the user's eye(s) 592. In some embodiments, the light sources may be arranged in rings or circles around each of the lenses as shown in FIG. 5. In some embodiments, eight light sources 530 (e.g., LEDs) are arranged around each lens 520 as an example. However, more or fewer light sources 530 may be used, and other arrangements and locations of light sources 530 may be used. [0102] In some embodiments, the display 510 emits light in the visible light range and does not emit light in the IR or NIR range, and thus does not introduce noise in the gaze tracking system. Note that the location and angle of eye tracking camera(s) 540 is given by way of example, and is not intended to be limiting. In some embodiments, a single eye tracking camera 540 located on each side of the user's face. In some embodiments, two or more NIR cameras 540 may be used on each side of the user's face. 
In some embodiments, a camera 540 with a wider field of view (FOV) and a camera 540 with a narrower FOV may be used on each side of the user's face. In some embodiments, a camera 540 that operates at one wavelength (e.g. 850 nm) and a camera 540 that operates at a different wavelength (e.g. 940 nm) may be used on each side of the user's face. Wagner: [0013] In another aspect, a method is performed at a computing system including at least a first display generation component and one or more input devices. The method includes displaying, via the first display generation component, a first object. The method further includes, while displaying the first object via the first display generation component, detecting one or more movements that change a viewing angle of a first viewer relative to content that is displayed via the first display generation component. The method further includes, in response to detecting the one or more movements that change the viewing angle of the first viewer relative to the content that is displayed via the first display generation component, in accordance with a determination that the change in the viewing angle of the first viewer meets first criteria, changing a value of at least a first display parameter of the first object other than a viewing perspective of the first object. [0044], [0344] In FIG. 7W, the viewing angle of the second user 7204 relative to the representation 7006 of the first user 7202 displayed via the second display generation component 7102 is still within the preferred viewing zone (e.g., in the preferred viewing zones in both the latitudinal direction and the longitudinal direction of the second display generation component, and/or within a central region in front of the second display generation component); and as a result, the first filter 7402-a that is applied to the representation 7006-i of the first user 7202 is the same as that applied to the representation 7006-h of the first user 7202 in FIG. 
7V, in accordance with some embodiments. In some embodiments, the representation 7006-i shows a different viewing perspective of the representation 7006 of first user 7202 from the viewing perspective of the representation 7006-h, due to the change in viewing angle of the second user 7204 relative to the representation 7006 of the first user 7202 shown via the second display generation component 7102. [0345] In FIG. 7X, the appearance of the first user 7202, the relative position of the first user 7202 and the first display generation component 7100, and the viewing angle of the second user 7204 for the representation 7006 of the first user 7202 on the second display generation component 7102 are the same or substantially the same as those shown in FIG. 7U. However, in FIG. 7X, the ambient lighting in the physical environment (e.g., location B 7000-b, or another physical environment in which the representation 7006 of the first user 7202 is displayed) has been changed relative to the ambient lighting in the environment shown in FIG. 7U (and relative to the ambient lighting in the environment shown in FIGS. 7V and 7W).)
Consider Claim 13.
The combination of Lemay and Wagner teaches:
13. (currently amended) The device according to claim 1 wherein the one or more processors are configured to: determine, whether at least one object of the one or more objects is associated with a tripod; and in the case that it is determined that at least one object of the one or more objects is associated with a tripod, determine that, as a measure of the one or more predefined measures, the quality of the immersive image can be improved by changing to a setup in which no tripod is visible when capturing an immersive image. (Wagner: In some embodiments, the display generation component is worn on a part of the user's body (e.g., on his/her head, or on his/her hand). As such, the display generation component 120 includes one or more XR displays provided to display the XR content. For example, in various embodiments, the display generation component 120 encloses the field-of-view of the user. In some embodiments, the display generation component 120 is a handheld device (such as a smartphone or tablet) configured to present XR content, and the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the scene 105. In some embodiments, the handheld device is optionally placed within an enclosure that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., a tripod) in front of the user. In some embodiments, the display generation component 120 is a XR chamber, enclosure, or room configured to present XR content in which the user does not wear or hold the display generation component 120. Many user interfaces described with reference to one type of hardware for displaying XR content (e.g., a handheld device or a device on a tripod) could be implemented on another type of hardware for displaying XR content (e.g., an HMD or other wearable computing device). 
For example, a user interface showing interactions with XR content triggered based on interactions that happen in a space in front of a handheld or tripod mounted device could similarly be implemented with an HMD where the interactions happen in a space in front of the HMD and the responses of the XR content are displayed via the HMD. Similarly, a user interface showing interactions with XR content triggered based on movement of a handheld or tripod mounted device relative to the physical environment (e.g., the scene 105 or a part of the user's body (e.g., the user's eye(s), head, or hand)) could similarly be implemented with an HMD where the movement is caused by movement of the HMD relative to the physical environment (e.g., the scene 105 or a part of the user's body (e.g., the user's eye(s), head, or hand)).)
Consider Claim 16.
The combination of Lemay and Wagner teaches:
16. (currently amended) The system according to claim 15, wherein the camera device comprises: a display device configured to display the immersive image or a preview of the immersive image; and one or more orientation sensors configured to detect an orientation of the camera device and to provide the detected orientation of the camera device to the one or more processors; wherein the one or more processors of the device are configured to: determine an offset value representing an offset of the orientation of the camera device from a horizontal orientation, determine, whether the offset value is equal to or greater than a predefined offset threshold value, and in the case that it is determined that the offset value is equal to or greater than the predefined offset threshold value, control the display device to display a water-scale representing the orientation of the camera device. (Lemay: [0060] According to some embodiments, at least one of the display generation components 120 provides a CGR experience to the user while the user is virtually and/or physically present within the scene 105. [0061] In some embodiments, the display generation component(s) are worn on a part of the user's body (e.g., on his/her head, on his/her hand, etc.). As such, at least one of the display generation component(s) 120 includes one or more CGR displays provided to display the CGR content. For example, in various embodiments, at least one of the display generation component(s) 120 encloses the field-of-view of the user. In some embodiments, at least one of the display generation component(s) 120 is a handheld device (such as a smartphone or tablet) configured to present CGR content, and the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the scene 105. In some embodiments, the handheld device is optionally placed within an enclosure that is worn on the head of the user. 
In some embodiments, the handheld device is optionally placed on a support (e.g., a tripod) in front of the user. [0076] In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera). In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the user's hand(s) and optionally arm(s) of the user (and may be referred to as a hand-tracking camera). In some embodiments, the one or more image sensors 314 are configured to be forward-facing so as to obtain image data that corresponds to the scene as would be viewed by the user if the display generation component(s) 120 were not present (and may be referred to as a scene camera). The one or more optional image sensors 314 can include one or more RGB cameras (e.g., with a complimentary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), one or more infrared (IR) cameras, one or more event-based cameras, and/or the like. Wagner: [0068] FIGS. 1A-1P illustrate various examples of a computer system that is used to perform the methods and provide audio, visual and/or haptic feedback as part of user interfaces described herein. In some embodiments, the computer system includes one or more display generation components (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b) for displaying virtual elements and/or a representation of a physical environment to a user of the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. 
User interfaces generated by the computer system are optionally corrected by one or more corrective lenses 11.3.2-216 that are optionally removably attached to one or more of the optical modules to enable the user interfaces to be more easily viewed by users who would otherwise use glasses or contacts to correct their vision. While many user interfaces illustrated herein show a single view of a user interface, user interfaces in a HMD are optionally displayed using two optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b), one for a user's right eye and a different one for a user's left eye, and slightly different images are presented to the two different eyes to generate the illusion of stereoscopic depth, the single view of the user interface would typically be either a right-eye or left-eye view and the depth effect is explained in the text or using other schematic charts or views. In some embodiments, the computer system includes one or more external displays (e.g., display assembly 1-108) for displaying status information for the computer system to the user of the computer system (when the computer system is not being worn) and/or to other people who are near the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more audio output components (e.g., electronic component 1-112) for generating audio feedback, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 
1I) for detecting information about a physical environment of the device which can be used (optionally in conjunction with one or more illuminators such as the illuminators described in FIG. 1I) to generate a digital passthrough image, capture visual media corresponding to the physical environment (e.g., photos and/or video), or determine a pose (e.g., position and/or orientation) of physical objects and/or surfaces in the physical environment so that virtual objects can be placed based on a detected pose of physical objects and/or surfaces. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting hand position and/or movement (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 1I) that can be used (optionally in conjunction with one or more illuminators such as the illuminators 6-124 described in FIG. 1I) to determine when one or more air gestures have been performed. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting eye movement (e.g., eye tracking and gaze tracking sensors in FIG. 1I) which can be used (optionally in conjunction with one or more lights such as lights 11.3.2-110 in FIG. 1O) to determine attention or gaze position and/or gaze movement which can optionally be used to detect gaze-only inputs based on gaze movement and/or dwell. A combination of the various sensors described above can be used to determine user facial expressions and/or hand movements for use in generating an avatar or representation of the user such as an anthropomorphic avatar or representation for use in a real-time communication session where the avatar has facial expressions, hand movements, and/or body movements that are based on or similar to detected facial expressions, hand movements, and/or body movements of a user of the device. 
Gaze and/or attention information is, optionally, combined with hand tracking information to determine interactions between the user and one or more user interfaces based on direct and/or indirect inputs such as air gestures or inputs that use one or more hardware input devices such as one or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and or dial or button 1-328), knobs (e.g., first button 1-128, button 11.1.1-114, and/or dial or button 1-328), digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328), trackpads, touch screens, keyboards, mice and/or other input devices. One or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and or dial or button 1-328) are optionally used to perform system operations such as recentering content in three-dimensional environment that is visible to a user of the device, displaying a home user interface for launching applications, starting real-time communication sessions, or initiating display of virtual three-dimensional backgrounds. Knobs or digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328) are optionally rotatable to adjust parameters of the visual content such as a level of immersion of a virtual three-dimensional environment (e.g., a degree to which virtual-content occupies the viewport of the user into the three-dimensional environment) or other parameters associated with the three-dimensional environment and the virtual content that is displayed via the optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b). [0069] FIG. 
1B illustrates a front, top, perspective view of an example of a head-mountable display (HMD) device 1-100 configured to be donned by a user and provide virtual and altered/mixed reality (VR/AR) experiences. The HMD 1-100 can include a display unit 1-102 or assembly, an electronic strap assembly 1-104 connected to and extending from the display unit 1-102, and a band assembly 1-106 secured at either end to the electronic strap assembly 1-104. The electronic strap assembly 1-104 and the band 1-106 can be part of a retention assembly configured to wrap around a user's head to hold the display unit 1-102 against the face of the user.)
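For illustration only, the threshold determination recited in claim 16 may be sketched as follows. This is a minimal sketch under stated assumptions and is not drawn from Lemay, Wagner, or the application: the use of roll and pitch angles in degrees, the choice of the larger absolute angle as the offset value, the default threshold of 2.0 degrees, and all function names are hypothetical.

```python
def orientation_offset(roll_deg: float, pitch_deg: float) -> float:
    """Offset of the camera orientation from horizontal, taken here
    (as an assumption) to be the larger of the absolute roll and pitch."""
    return max(abs(roll_deg), abs(pitch_deg))


def should_display_water_scale(roll_deg: float,
                               pitch_deg: float,
                               threshold_deg: float = 2.0) -> bool:
    """True when the offset value is equal to or greater than the
    predefined offset threshold value, in which case the display
    would be controlled to show the water-scale."""
    return orientation_offset(roll_deg, pitch_deg) >= threshold_deg
```

Under these assumptions, a camera tilted by 3.0 degrees would trigger display of the water-scale, a camera tilted by 0.5 degrees would not, and a tilt exactly equal to the threshold would trigger it, consistent with the "equal to or greater than" language of the claim.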
Consider Claim 19.
The combination of Lemay and Wagner teaches:
19. (currently amended) The system according to Claim 15 wherein the one or more cameras are configured to capture a preview image representing a preview of the immersive image to be captured; wherein the camera device further comprises: a display device configured to display the preview image, and one or more eye-tracking cameras configured to detect eye-tracking data representing an eye-viewing direction and a focus depth of a user using the camera device; and wherein the one or more processors are configured to: determine, based on the eye-tracking data, which object within the preview image the user is looking at, and control the one or more cameras to focus on the object the user is looking at. (Wagner: [0068] FIGS. 1A-1P illustrate various examples of a computer system that is used to perform the methods and provide audio, visual and/or haptic feedback as part of user interfaces described herein. In some embodiments, the computer system includes one or more display generation components (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b) for displaying virtual elements and/or a representation of a physical environment to a user of the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. User interfaces generated by the computer system are optionally corrected by one or more corrective lenses 11.3.2-216 that are optionally removably attached to one or more of the optical modules to enable the user interfaces to be more easily viewed by users who would otherwise use glasses or contacts to correct their vision. 
While many user interfaces illustrated herein show a single view of a user interface, user interfaces in a HMD are optionally displayed using two optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b), one for a user's right eye and a different one for a user's left eye, and slightly different images are presented to the two different eyes to generate the illusion of stereoscopic depth, the single view of the user interface would typically be either a right-eye or left-eye view and the depth effect is explained in the text or using other schematic charts or views. In some embodiments, the computer system includes one or more external displays (e.g., display assembly 1-108) for displaying status information for the computer system to the user of the computer system (when the computer system is not being worn) and/or to other people who are near the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more audio output components (e.g., electronic component 1-112) for generating audio feedback, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 1I) for detecting information about a physical environment of the device which can be used (optionally in conjunction with one or more illuminators such as the illuminators described in FIG. 
1I) to generate a digital passthrough image, capture visual media corresponding to the physical environment (e.g., photos and/or video), or determine a pose (e.g., position and/or orientation) of physical objects and/or surfaces in the physical environment so that virtual objects can be placed based on a detected pose of physical objects and/or surfaces. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting hand position and/or movement (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 1I) that can be used (optionally in conjunction with one or more illuminators such as the illuminators 6-124 described in FIG. 1I) to determine when one or more air gestures have been performed. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting eye movement (e.g., eye tracking and gaze tracking sensors in FIG. 1I) which can be used (optionally in conjunction with one or more lights such as lights 11.3.2-110 in FIG. 1O) to determine attention or gaze position and/or gaze movement which can optionally be used to detect gaze-only inputs based on gaze movement and/or dwell. A combination of the various sensors described above can be used to determine user facial expressions and/or hand movements for use in generating an avatar or representation of the user such as an anthropomorphic avatar or representation for use in a real-time communication session where the avatar has facial expressions, hand movements, and/or body movements that are based on or similar to detected facial expressions, hand movements, and/or body movements of a user of the device. 
Gaze and/or attention information is, optionally, combined with hand tracking information to determine interactions between the user and one or more user interfaces based on direct and/or indirect inputs such as air gestures or inputs that use one or more hardware input devices such as one or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and or dial or button 1-328), knobs (e.g., first button 1-128, button 11.1.1-114, and/or dial or button 1-328), digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328), trackpads, touch screens, keyboards, mice and/or other input devices. One or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and or dial or button 1-328) are optionally used to perform system operations such as recentering content in three-dimensional environment that is visible to a user of the device, displaying a home user interface for launching applications, starting real-time communication sessions, or initiating display of virtual three-dimensional backgrounds. Knobs or digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328) are optionally rotatable to adjust parameters of the visual content such as a level of immersion of a virtual three-dimensional environment (e.g., a degree to which virtual-content occupies the viewport of the user into the three-dimensional environment) or other parameters associated with the three-dimensional environment and the virtual content that is displayed via the optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b). [0069] FIG. 
1B illustrates a front, top, perspective view of an example of a head-mountable display (HMD) device 1-100 configured to be donned by a user and provide virtual and altered/mixed reality (VR/AR) experiences. The HMD 1-100 can include a display unit 1-102 or assembly, an electronic strap assembly 1-104 connected to and extending from the display unit 1-102, and a band assembly 1-106 secured at either end to the electronic strap assembly 1-104. The electronic strap assembly 1-104 and the band 1-106 can be part of a retention assembly configured to wrap around a user's head to hold the display unit 1-102 against the face of the user. Lemay: [0060] According to some embodiments, at least one of the display generation components 120 provides a CGR experience to the user while the user is virtually and/or physically present within the scene 105. [0061] In some embodiments, the display generation component(s) are worn on a part of the user's body (e.g., on his/her head, on his/her hand, etc.). As such, at least one of the display generation component(s) 120 includes one or more CGR displays provided to display the CGR content. For example, in various embodiments, at least one of the display generation component(s) 120 encloses the field-of-view of the user. In some embodiments, at least one of the display generation component(s) 120 is a handheld device (such as a smartphone or tablet) configured to present CGR content, and the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the scene 105. In some embodiments, the handheld device is optionally placed within an enclosure that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., a tripod) in front of the user. 
[0076] In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera). In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the user's hand(s) and optionally arm(s) of the user (and may be referred to as a hand-tracking camera). In some embodiments, the one or more image sensors 314 are configured to be forward-facing so as to obtain image data that corresponds to the scene as would be viewed by the user if the display generation component(s) 120 were not present (and may be referred to as a scene camera). The one or more optional image sensors 314 can include one or more RGB cameras (e.g., with a complimentary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), one or more infrared (IR) cameras, one or more event-based cameras, and/or the like.)
Consider Claim 20.
The combination of Lemay and Wagner teaches:
20. (currently amended) The system according to Claim 15 wherein the one or more cameras are configured to capture a preview image representing a preview of the immersive image to be captured; wherein the system further comprises a head-mounted display communicatively connected to the one or more three-dimensional cameras, wherein the head-mounted display comprises: a display device configured to display the preview image; one or more eye-tracking cameras configured to detect eye-tracking data representing an eye-viewing direction and a focus depth of a user wearing the head-mounted display; wherein the one or more processors of the device are configured to: determine, based on the eye-tracking data, which object within the preview image the user is looking at, and control the one or more cameras to focus on the object the user is looking at. (Wagner: [0068] FIGS. 1A-1P illustrate various examples of a computer system that is used to perform the methods and provide audio, visual and/or haptic feedback as part of user interfaces described herein. In some embodiments, the computer system includes one or more display generation components (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b) for displaying virtual elements and/or a representation of a physical environment to a user of the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. User interfaces generated by the computer system are optionally corrected by one or more corrective lenses 11.3.2-216 that are optionally removably attached to one or more of the optical modules to enable the user interfaces to be more easily viewed by users who would otherwise use glasses or contacts to correct their vision.
While many user interfaces illustrated herein show a single view of a user interface, user interfaces in a HMD are optionally displayed using two optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b), one for a user's right eye and a different one for a user's left eye, and slightly different images are presented to the two different eyes to generate the illusion of stereoscopic depth, the single view of the user interface would typically be either a right-eye or left-eye view and the depth effect is explained in the text or using other schematic charts or views. In some embodiments, the computer system includes one or more external displays (e.g., display assembly 1-108) for displaying status information for the computer system to the user of the computer system (when the computer system is not being worn) and/or to other people who are near the computer system, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more audio output components (e.g., electronic component 1-112) for generating audio feedback, optionally generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 1I) for detecting information about a physical environment of the device which can be used (optionally in conjunction with one or more illuminators such as the illuminators described in FIG. 1I) to generate a digital passthrough image, capture visual media corresponding to the physical environment (e.g., photos and/or video), or determine a pose (e.g., position and/or orientation) of physical objects and/or surfaces in the physical environment so that virtual objects can be placed based on a detected pose of physical objects and/or surfaces. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting hand position and/or movement (e.g., one or more sensors in sensor assembly 1-356, and/or FIG. 1I) that can be used (optionally in conjunction with one or more illuminators such as the illuminators 6-124 described in FIG. 1I) to determine when one or more air gestures have been performed. In some embodiments, the computer system includes one or more input devices for detecting input such as one or more sensors for detecting eye movement (e.g., eye tracking and gaze tracking sensors in FIG. 1I) which can be used (optionally in conjunction with one or more lights such as lights 11.3.2-110 in FIG. 1O) to determine attention or gaze position and/or gaze movement which can optionally be used to detect gaze-only inputs based on gaze movement and/or dwell. A combination of the various sensors described above can be used to determine user facial expressions and/or hand movements for use in generating an avatar or representation of the user such as an anthropomorphic avatar or representation for use in a real-time communication session where the avatar has facial expressions, hand movements, and/or body movements that are based on or similar to detected facial expressions, hand movements, and/or body movements of a user of the device.
Gaze and/or attention information is, optionally, combined with hand tracking information to determine interactions between the user and one or more user interfaces based on direct and/or indirect inputs such as air gestures or inputs that use one or more hardware input devices such as one or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and or dial or button 1-328), knobs (e.g., first button 1-128, button 11.1.1-114, and/or dial or button 1-328), digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328), trackpads, touch screens, keyboards, mice and/or other input devices. One or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and or dial or button 1-328) are optionally used to perform system operations such as recentering content in three-dimensional environment that is visible to a user of the device, displaying a home user interface for launching applications, starting real-time communication sessions, or initiating display of virtual three-dimensional backgrounds. Knobs or digital crowns (e.g., first button 1-128 which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328) are optionally rotatable to adjust parameters of the visual content such as a level of immersion of a virtual three-dimensional environment (e.g., a degree to which virtual-content occupies the viewport of the user into the three-dimensional environment) or other parameters associated with the three-dimensional environment and the virtual content that is displayed via the optical modules (e.g., first and second display assemblies 1-120 a, 1-120 b and/or first and second optical modules 11.1.1-104 a and 11.1.1-104 b). [0069] FIG. 1B illustrates a front, top, perspective view of an example of a head-mountable display (HMD) device 1-100 configured to be donned by a user and provide virtual and altered/mixed reality (VR/AR) experiences. The HMD 1-100 can include a display unit 1-102 or assembly, an electronic strap assembly 1-104 connected to and extending from the display unit 1-102, and a band assembly 1-106 secured at either end to the electronic strap assembly 1-104. The electronic strap assembly 1-104 and the band 1-106 can be part of a retention assembly configured to wrap around a user's head to hold the display unit 1-102 against the face of the user. Lemay: [0060] According to some embodiments, at least one of the display generation components 120 provides a CGR experience to the user while the user is virtually and/or physically present within the scene 105. [0061] In some embodiments, the display generation component(s) are worn on a part of the user's body (e.g., on his/her head, on his/her hand, etc.). As such, at least one of the display generation component(s) 120 includes one or more CGR displays provided to display the CGR content. For example, in various embodiments, at least one of the display generation component(s) 120 encloses the field-of-view of the user. In some embodiments, at least one of the display generation component(s) 120 is a handheld device (such as a smartphone or tablet) configured to present CGR content, and the user holds the device with a display directed towards the field-of-view of the user and a camera directed towards the scene 105. In some embodiments, the handheld device is optionally placed within an enclosure that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., a tripod) in front of the user.
[0076] In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the face of the user that includes the eyes of the user (and may be referred to as an eye-tracking camera). In some embodiments, the one or more image sensors 314 are configured to obtain image data that corresponds to at least a portion of the user's hand(s) and optionally arm(s) of the user (and may be referred to as a hand-tracking camera). In some embodiments, the one or more image sensors 314 are configured to be forward-facing so as to obtain image data that corresponds to the scene as would be viewed by the user if the display generation component(s) 120 were not present (and may be referred to as a scene camera). The one or more optional image sensors 314 can include one or more RGB cameras (e.g., with a complimentary metal-oxide-semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor), one or more infrared (IR) cameras, one or more event-based cameras, and/or the like.)
Allowable Subject Matter
Claims 4-6 and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Claims 4-6 are not rejected because the prior art fails to teach the device of Claims 4-6, which specifically comprise the following features in combination with other recited limitations:
4. (currently amended) The device according to claim 2 wherein the one or more processors are configured to: determine, whether the at least one object has an ellipsoidal shape; in the case that it is determined that the at least one object has an ellipsoidal shape, determine, whether there is color fringing at a border of the at least one object; and in the case that it is determined that the at least one object has an ellipsoidal shape with color fringing at a border of the at least one object, determine a drop of a liquid as the type of the at least one object.
5. (currently amended) The device according to claim 4 wherein the one or more processors are configured to: in the case that it is determined that the at least one object has an ellipsoidal shape but no color fringing at a border of the at least one object, determine the type of the at least one object to be a dust particle or a fingerprint.
6. (currently amended) The device according to claim 4 wherein the one or more processors are configured to: in the case that it is determined that the first image shows the at least one object not shown in the second image, determine that, as a measure of the one or more predefined measures, the quality of the immersive image can be improved by cleaning the first lens; and/or in the case that it is determined that the second image shows the at least one object not shown in the first image, determine that, as a measure of the one or more predefined measures, the quality of the immersive image can be improved by cleaning the second lens.
Claims 17 and 18 are not rejected because the prior art fails to teach the system of Claim 17 and the system of Claim 18, which specifically comprise the following features in combination with other recited limitations:
17. (currently amended) The system according to Claim 15 wherein the immersive image is a spherical immersive image; and wherein the one or more processors of the device are configured to: receive input data representing that a user provided instructions to take a spherical immersive image without the user being shown; responsive to receiving the input data: control the one or more cameras to capture a preview of the spherical immersive image, determine, whether the user is shown in the preview of the spherical immersive image, and in the case that it is determined that the user is not shown in the preview of the spherical immersive image, control the one or more cameras to capture the spherical immersive image.
18. (currently amended) The system according to Claim 15 wherein the immersive image has a Fisheye format; and wherein the one or more processors are configured to: determine, whether the captured immersive image comprises at least one object of interest; in the case that it is determined that the captured immersive image comprises at least one object of interest, determine, whether, in the case that the captured immersive image having the Fisheye format would be converted into an immersive image having an Equirectangular format, the at least one object would be in a predefined upper region or in a predefined lower region of the immersive image having the Equirectangular format; and in the case that it is determined that the at least one object would be in the predefined upper region or in the predefined lower region of the immersive image having the Equirectangular format, keep the captured immersive image in the Fisheye format or convert the captured immersive image to have a format different from the Equirectangular format.
Conclusion
The prior art made of record in form PTO-892 and not relied upon is considered pertinent to applicant's disclosure.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TAHMINA ANSARI whose telephone number is 571-270-3379. The examiner can normally be reached on IFP Flex - Monday through Friday 9 to 5.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, O’NEAL MISTRY, can be reached at 313-446-4912. The fax phone numbers for the organization where this application or proceeding is assigned are 571-273-8300 for regular communications and 571-273-8300 for After Final communications. TC 2600’s customer service number is 571-272-2600.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist whose telephone number is 571-272-2600.
/Tahmina Ansari/
February 12, 2026
/TAHMINA N ANSARI/Primary Examiner, Art Unit 2674