Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Response to Amendment
This is in response to applicant’s amendment/response filed on 09/26/2025, which has been entered and made of record. Claims 1, 6, 10, 11, 16, and 20 have been amended. Claims 1-20 are pending in the application.
Response to Arguments
Applicant’s arguments filed on 09/26/2025 regarding the claim rejections under 35 U.S.C. 103 have been fully considered but they are not persuasive.
Applicant submits “This correlation requirement is a critical feature distinguishing the invention from prior art. Neither Smith, Miller, nor Stafford discloses or suggests correlating these three distinct categories of parameters, all captured in real time, to infer intent for controlling real-world access within a virtual environment.” (Remarks, Page 9.)
The examiner disagrees with Applicant’s premises and conclusion. Miller teaches in ¶0008, “The intent of the user to interact with the physical devices is determined on the basis of the instructions received from the user and contextual signals, for example, time of the day, previously stored historical data, previously determined intent, past user behavior, other pre-stored user intents, predictive intents corresponding to the physical devices. In this way, the user is interfaced with that particular object or device using world-locked eye gaze and eye tracking techniques without having to reference that particular device by its nickname or ID overcoming the drawbacks of the conventional methods. Further, the user can control and interact with any of the physical objects or devices dynamically in real-time in a more intuitive way without the requirement of using the object's or device's IDs or nicknames, or location data. Interfacing the user with the physical objects or devices through the 3D representation and through eye gazing techniques, enables the user to perform any kind of user command, for example, head gesture, hand gesture, voice gesture, finger taps, drag and drop movement, finger pinching, rotating movement, bloom gesture, resizing, selecting, moving, a natural language query, and any other kinds of AR-related commands.” and ¶0048, “The intent of the user is determined by determining the instructions and one or more contextual signals. The instructions may be explicit (e.g., “turn on a light”) or implicit (e.g., “where did I buy this”). For example, the implicit instruction may be based on determining the eye tracking and eye gazing process as well without referencing the physical device. The intent identification engine 130 may further determine the one or more contextual signals that may include, but are not limited to, kinds of user intents for interacting with the physical objects or devices, kinds of commands detected based on previously issued commands of the user in the past, and based on other factors such as time of the day the user has interacted before or the user is likely to interact in present time, user behavior learned from previous actions or intents, or predicting user behavior in the present time, current state and historical state of both the user and the physical objects and/or devices, historical context semantics of the user intent for interaction, weather forecast information in the present time, past time or for the future time, default settings set for each physical object or device, priority scores associated with each physical object and/or device, different zones or regions of interest the user has shown specific priority or interest to interact, number of times of each of the factors stated above, and other related factors that are specific for interacting with various physical objects and/or devices including the factors used for training and updating the machine learning models. 
Additionally, the intent identification engine 130 is programmed or configured to determine the intent of the user by leveraging artificial intelligence for automatically determining such intent of the user in real-time and dynamically and/or for automatically predicting such intent of the user in real-time and dynamically.” ¶0039, “each data store may be a relational, columnar, correlation, or other suitable databases.” ¶0070, “the confidence distribution score and each of the one or more contextual signals are correlated on another.” The cited paragraphs clearly teach “identifying intent of one or more users based on correlating all of the content data, historic user behavior data, and user movement data in combination.”
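For illustration of the mechanism Miller describes in ¶0008 and ¶0048, the following minimal Python sketch shows intent determination from an explicit instruction, an implicit gaze target, and contextual signals; every name, weight, and data structure here is a hypothetical assumption for the sketch and is taken neither from Miller nor from the claims.

```python
# Illustrative only: hypothetical fusion of explicit instructions with
# contextual signals to score user intent, in the style described by
# Miller ¶0008/¶0048. Signal names and weights are invented for this sketch.
from dataclasses import dataclass, field

@dataclass
class ContextualSignals:
    time_of_day: str                          # e.g., "evening"
    historical_intents: list = field(default_factory=list)
    past_behavior_score: float = 0.0          # learned from previous actions

def infer_intent(instruction: str | None, gaze_target: str | None,
                 ctx: ContextualSignals) -> tuple[str, float]:
    """Return (intent, confidence). Explicit instructions dominate;
    otherwise fall back to gaze plus contextual history."""
    if instruction:                           # explicit, e.g. "turn on a light"
        return instruction, 0.95
    if gaze_target:                           # implicit: gaze without naming the device
        # boost confidence if history shows repeated interaction with the target
        boost = 0.2 if gaze_target in ctx.historical_intents else 0.0
        return (f"interact with {gaze_target}",
                min(0.6 + ctx.past_behavior_score + boost, 1.0))
    return "no intent", 0.0

ctx = ContextualSignals(time_of_day="evening",
                        historical_intents=["desk lamp"],
                        past_behavior_score=0.1)
print(infer_intent(None, "desk lamp", ctx))   # ('interact with desk lamp', 0.9)
```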
Applicant submits “Neither Miller nor Stafford teaches or suggests:
Simultaneous correlation of all three categories of input, content, history, and movement, in real time;
A system architecture requiring both interior-facing cameras (eyeball movement) and exterior-facing cameras (hand gesture/motion detection); or
A system in which the correlation of these parameters governs access to and display of real-world environment views within environment.
The present application therefore goes well beyond predictive or single-signal systems by requiring multi-source parameter fusion for intent inference. This correlation step, explicitly required by amended claims 1 and 11, is absent in Miller and Stafford.” (Remarks, Page 10.)
The examiner disagrees with Applicant’s premises and conclusion. In addition to Miller, described above, Stafford also teaches in ¶0010, “the historic inputs from other users may be correlated with content from the VR scene to determine specific zoom factor settings that caused other users to experience dizziness, motion sickness, etc.” ¶0012, “Images captured by the forward facing cameras are analyzed to identify an object captured in the real-world environment that correlates with the gaze direction of the user.” ¶0036, “the area closer to the object is correlated with a physical environment in which the user operates his HMD. In such embodiments, a boundary of the area closer to the object to which the user has been teleported is correlated with the confines of the user's physical world environment.” ¶0092, “compute the direction of the user's gaze and correlate it to objects within the field of view of the computed direction.” ¶0141, “any movement in the physical space can be correlated with the movement of the user in virtual space.” In combination with Miller, Stafford also teaches parameter fusion logic for intent-driven mixed-reality display transitions.
Applicant submits “However, Osman fails to disclose or suggest:
Gradient transitions that are triggered by intent inferred from multi-parameter correlation of content, history, and movement data;
Correlation-based decision-making to enable or restrict real-world views.
As amended, claims 6 and 16 explicitly recite gradient transitions based on intent determined from correlating content data, historic user behavior data, user movement data, and user command data. Osman's gaze-only fading is fundamentally different.” (Remarks, Page 10.)
The examiner disagrees with Applicant’s premises and conclusion. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Smith and Miller teach that enabling the display of the virtual environment and the real-world environment comprises transitioning, by the processing unit (Abstract; fading is performed in a gradient manner).
Applicant submits “LeBeau does not disclose:
Presenter-issued commands that specifically enable or restrict real-world environment views for attendees; or
Correlation of user command data with other parameters to infer intent.” (Remarks, Page 11.)
The examiner disagrees with Applicant’s premises and conclusion. In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Smith and Miller teach presenter-issued commands that specifically enable or restrict environment views for attendees, as well as correlation of user command data with other parameters to infer intent (Smith, ¶0048, ¶0059, ¶0060; Miller, ¶0008, ¶0048, ¶0070). LeBeau teaches commands provided by the presenter, to selectively enable or restrict one or more real-world environment views for the attendees (LeBeau, ¶0033).
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 10 and 20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention.
As to claim 10, applicant recites “commands provided by the presenter, selectively enable or restrict one or more real-world environment views for the attendees”. This new feature is not described in applicant’s specification. In applicant’s remarks, it is stated that the new feature can be found in paragraphs 0035, 0039, 0042, and 0055-0057. However, the specification does not describe what constitutes “commands provided by the presenter, selectively enable or restrict one or more real-world environment views for the attendees”, and provides no specific details about such commands.
Claim 20 recites similar new matter as claim 10. Please see the analysis of claim 10 for details.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 7-9, 11-15, and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Smith (US Pub 2018/0173404 A1) in view of Miller et al. (US Pub 2024/0411364 A1) and Stafford et al. (US Pub 2017/0358141 A1).
As to claim 1, Smith discloses a method for controlling access to virtual environment and real-world environment in an extended reality environment (Smith, abstract), the method comprising:
receiving, by a processing unit, in real-time, content data, (Smith, ¶0020, “the user can identify that a phone is only to be displayed when there is an incoming call.” ¶0020, “the detection of real world movement can trigger the display of an object associated with the movement. For example, when another person walks into the user's cubicle, the movement of the other person can be detected and the other person included as one of the real world objects in the user experience.” ¶0021, “enable a user to select real world objects to be included in a user experience.” ¶0021, “The user sees the real world while in selection mode, i.e., the cubicle as it appears in real life. The user provides input selecting one or more of the real world objects. For example, she can select items such as her desk, chair, computer, telephone, garbage can, picture of her family, etc. These selections identify those real world objects as objects that will not be replaced in the created user experience. The user input selecting the objects can identify portions of a view of the cubicle from one or more different perspectives. In one example, the user moves a finger around in space in front of the HMD to draw boundaries around particular objects. As the user draws with her finger, virtual lines appear overlaying the view of the real world.”);
identifying, by the processing unit, intent of one or more users associated with the virtual environment, to access real-world environment, based on a correlation (¶0021-0022, ¶0027, “allow a user to selectively mix his current environment with a desired environment.” ¶0032, “a live view is modified to include virtual reality content to provide a user experience based on user selection of which real world objects should (or should not) be replaced with virtual reality content.” ¶0039, “the real object selection engine 113 can control a user's selection mode experience in which the user 101 views a live view of the user's real world environment using the HDM 104 and provides finger movements or other inputs to identify the objects to make the real world selections 131.” ¶0051, “by tapping on an image of a trash can or positioning a finger over the middle of the trash can for more than a threshold amount of time, the trashcan is selected.”);
enabling, by the processing unit, display of the virtual environment and one or more selected views of the real-world environment simultaneously on display screen of the extended reality device, based on the intent, to control access to the virtual environment and one or more selected views of the real-world environment (¶0026, “the user creates and activates the user experience and then suddenly she is sitting at her desk on the beach in Maui, Hi.. She can still see her computer screen and her desk. She can answer the phone when it rings and type on her keyboard. However, when she looks at the floor, she sees sand. When she looks behind her computer, she sees the waves rolling onto the beach. As she turns around in a complete 360 she sees a beach in Maui except for where the selected real-world objects are displayed. She also hears the waves and the seagulls. She feels as if she were really working at her desk on the beach.”).
Smith does not explicitly disclose historic user behavior data and wherein the user movement data is received from one or more cameras placed on the interior surface of the extended reality device corresponding to eyeball movement of the user, and one or more cameras placed on the exterior surface of the extended reality device corresponding to at least one of: hand movement, hand gesture, and direction of motion; correlating, by the processing unit, all of the content data, historic user behavior data, and user movement data in combination.
Miller teaches historic user behavior data (Miller, ¶0008, “The intent of the user to interact with the physical devices is determined on the basis of the instructions received from the user and contextual signals, for example, time of the day, previously stored historical data, previously determined intent, past user behavior, other pre-stored user intents, predictive intents corresponding to the physical devices.” ¶0048, “The intent identification engine 130 may further determine the one or more contextual signals that may include, but are not limited to, kinds of user intents for interacting with the physical objects or devices, kinds of commands detected based on previously issued commands of the user in the past, and based on other factors such as time of the day the user has interacted before or the user is likely to interact in present time, user behavior learned from previous actions or intents, or predicting user behavior in the present time, current state and historical state of both the user and the physical objects and/or devices, historical context semantics of the user intent for interaction, weather forecast information in the present time, past time or for the future time, default settings set for each physical object or device, priority scores associated with each physical object and/or device, different zones or regions of interest the user has shown specific priority or interest to interact, number of times of each of the factors stated above, and other related factors that are specific for interacting with various physical objects and/or devices including the factors used for training and updating the machine learning models.”);
wherein the user movement data is received from one or more cameras placed on the interior surface of the extended reality device corresponding to eyeball movement of the user (Miller, ¶0055, “The HMD 102 may include an eye-tracking unit or system to track the eye gazing and vergence movement of the user wearing the HMD 102.” claim 12, “the gaze of the eye of the user is based on gaze data received from a gaze-tracking camera of the head-wearable device.”),
and one or more cameras placed on the exterior surface of the extended reality device corresponding to at least one of: hand movement, hand gesture, and direction of motion (Fig. 1A-1B, ¶0008, “Interfacing the user with the physical objects or devices through the 3D representation and through eye gazing techniques, enables the user to perform any kind of user command, for example, head gesture, hand gesture, voice gesture, finger taps, drag and drop movement, finger pinching, rotating movement, bloom gesture, resizing, selecting, moving, a natural language query, and any other kinds of AR-related commands.” ¶0027, “detect the intent of the user to interact with any of the devices or objects in the environment through one or more commands including, but not limited to, any head gesture, hand gesture, voice gesture, finger taps, drag and drop movement, finger pinching, rotating movement, bloom gesture, resizing, selecting, moving, a natural language query or commands, and any other kinds of AR-related commands to interact.” ¶0033, “The image sensors, for example, cameras may be head worn as shown in FIG. 1B. In a non-limiting embodiment, the image sensors comprising the cameras may include digital still cameras, a digital moving image, or video cameras. The image sensors are configured to capture images or live streams of the physical environment where the user is around and present. In an embodiment, the image sensors capture the images and/or live streams of the physical environment in real-time and provide the user with viewing 3D object-centric map representation of all physical objects as virtual objects or mixed reality objects on the display 108 of the AR device 102 as shown in FIG. 1A.”, ¶0055, “The HMD 102 may include one or more cameras that can capture images and videos of environments. The HMD 102 may include an eye-tracking unit or system to track the eye gazing and vergence movement of the user wearing the HMD 102.” ¶0087. ¶0089);
correlating, by the processing unit, all of the content data, historic user behavior data, and user movement data in combination, and identifying, by the processing unit, intent of one or more users associated with the virtual environment, to access real-world environment, based on the correlation (Miller, ¶0008, “The intent of the user to interact with the physical devices is determined on the basis of the instructions received from the user and contextual signals, for example, time of the day, previously stored historical data, previously determined intent, past user behavior, other pre-stored user intents, predictive intents corresponding to the physical devices. In this way, the user is interfaced with that particular object or device using world-locked eye gaze and eye tracking techniques without having to reference that particular device by its nickname or ID overcoming the drawbacks of the conventional methods. Further, the user can control and interact with any of the physical objects or devices dynamically in real-time in a more intuitive way without the requirement of using the object's or device's IDs or nicknames, or location data. Interfacing the user with the physical objects or devices through the 3D representation and through eye gazing techniques, enables the user to perform any kind of user command, for example, head gesture, hand gesture, voice gesture, finger taps, drag and drop movement, finger pinching, rotating movement, bloom gesture, resizing, selecting, moving, a natural language query, and any other kinds of AR-related commands.” and ¶0048, “The intent of the user is determined by determining the instructions and one or more contextual signals. The instructions may be explicit (e.g., “turn on a light”) or implicit (e.g., “where did I buy this”). For example, the implicit instruction may be based on determining the eye tracking and eye gazing process as well without referencing the physical device. The intent identification engine 130 may further determine the one or more contextual signals that may include, but are not limited to, kinds of user intents for interacting with the physical objects or devices, kinds of commands detected based on previously issued commands of the user in the past, and based on other factors such as time of the day the user has interacted before or the user is likely to interact in present time, user behavior learned from previous actions or intents, or predicting user behavior in the present time, current state and historical state of both the user and the physical objects and/or devices, historical context semantics of the user intent for interaction, weather forecast information in the present time, past time or for the future time, default settings set for each physical object or device, priority scores associated with each physical object and/or device, different zones or regions of interest the user has shown specific priority or interest to interact, number of times of each of the factors stated above, and other related factors that are specific for interacting with various physical objects and/or devices including the factors used for training and updating the machine learning models. 
Additionally, the intent identification engine 130 is programmed or configured to determine the intent of the user by leveraging artificial intelligence for automatically determining such intent of the user in real-time and dynamically and/or for automatically predicting such intent of the user in real-time and dynamically.” ¶0039, “each data store may be a relational, columnar, correlation, or other suitable databases.” ¶0070, “the confidence distribution score and each of the one or more contextual signals are correlated on another.”).
Smith and Miller are considered to be analogous art because both pertain to mixed reality. It would have been obvious before the effective filing date of the claimed invention to have modified Smith with the features of “historic user behavior data and wherein the user movement data is received from one or more cameras placed on the interior surface of the extended reality device corresponding to eyeball movement of the user, and one or more cameras placed on the exterior surface of the extended reality device corresponding to at least one of: hand movement, hand gesture, and direction of motion; correlating, by the processing unit, all of the content data, historic user behavior data, and user movement data in combination.” as taught by Miller. The claim would have been obvious because the technique for improving a particular class of devices was part of the ordinary capabilities of a person of ordinary skill in the art, in view of the teaching of the technique for improvement in other situations.
Stafford teaches historic user behavior data (Stafford, ¶0010, “historic input from other users that have viewed the images of the VR scene may be considered when generating the signal to adjust the zoom factor. For example, in some embodiments, the historic inputs from other users may be correlated with content from the VR scene to determine specific zoom factor settings that caused other users to experience dizziness, motion sickness, etc.”) and
wherein the user movement data is received from one or more cameras placed on the interior surface of the extended reality device corresponding to eyeball movement of the user, and one or more cameras placed on the exterior surface of the extended reality device corresponding to at least one of: hand movement, hand gesture, and direction of motion (Stafford, ¶0044, “A digital camera 101 of the HMD 104 captures images of the gestures provided by a user and a processor within the HMD 104 analyzes the gestures to determine whether a game displayed within the HMD 104 is affected by the gestures.” ¶0045, “the digital camera 101 is located on a face plate of the HMD 104 facing forward to capture real-world images including gestures provided by the user.” ¶0087, “the HMD 104 may include one or more internal cameras (e.g., gaze detection cameras) 103 to detect changes in the user's eyes movement, gaze direction, gaze pattern,” ¶0090, “one or more pairs of stereo camera, one or more infrared cameras and/or one or more regular camera or combinations thereof may be used to determine the relative position of the HMD and the motion of the HMD provided by user's head motion” ¶0091, “The one or more internal cameras (e.g., gaze detection cameras, etc.) may be mounted on the HMD and facing inward toward the user to capture images related to the user and feed the images to the communication module to provide user specific and environment specific data to the HMD.”);
correlating, by the processing unit, all of the content data, historic user behavior data, and user movement data in combination and identifying, by the processing unit, intent of one or more users associated with the virtual environment, to access real-world environment, based on the correlation (Stafford, ¶0010, “the historic inputs from other users may be correlated with content from the VR scene to determine specific zoom factor settings that caused other users to experience dizziness, motion sickness, etc.” ¶0012, “Images captured by the forward facing cameras are analyzed to identify an object captured in the real-world environment that correlates with the gaze direction of the user.” ¶0036, “the area closer to the object is correlated with a physical environment in which the user operates his HMD. In such embodiments, a boundary of the area closer to the object to which the user has been teleported is correlated with the confines of the user's physical world environment.” ¶0092, “compute the direction of the user's gaze and correlate it to objects within the field of view of the computed direction.” ¶0141, “any movement in the physical space can be correlated with the movement of the user in virtual space.”).
Smith, Miller and Stafford are considered to be analogous art because all pertain to mixed reality. It would have been obvious before the effective filing date of the claimed invention to have modified Smith with the features of “historic user behavior data and wherein the user movement data is received from one or more cameras placed on the interior surface of the extended reality device corresponding to eyeball movement of the user, and one or more cameras placed on the exterior surface of the extended reality device corresponding to at least one of: hand movement, hand gesture, and direction of motion; correlating, by the processing unit, all of the content data, historic user behavior data, and user movement data in combination.” as taught by Stafford. The claim would have been obvious because the technique for improving a particular class of devices was part of the ordinary capabilities of a person of ordinary skill in the art, in view of the teaching of the technique for improvement in other situations.
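To illustrate the claim 1 limitation mapped above, the sketch below shows one hypothetical way content data, historic user behavior data, and user movement data (interior-facing eye cameras plus exterior-facing gesture cameras) could be correlated in combination to infer intent to access a real-world view; the field names and the voting threshold are assumptions of this sketch, not taken from Smith, Miller, or Stafford.

```python
# Illustrative sketch of the claimed correlation step: all three recited
# data categories are combined to infer intent to access the real-world
# view. All structures are hypothetical; nothing here is taken from
# Smith, Miller, or Stafford.

def correlate_intent(content: dict, history: dict, movement: dict) -> bool:
    """Return True when the fused signals indicate intent to access
    a real-world environment view."""
    # content data: what the XR device is currently rendering
    rendering_real_object = content.get("real_object_visible", False)
    # historic behavior: how often the user accessed real-world views before
    habitual = history.get("real_world_access_rate", 0.0) > 0.5
    # movement data: eyeball movement (interior cameras) and hand
    # gesture / direction of motion (exterior cameras)
    gaze_on_real = movement.get("gaze_target") == "real_object"
    reaching = movement.get("hand_gesture") == "reach"
    # simple vote over all three categories in combination
    votes = sum([rendering_real_object, habitual, gaze_on_real or reaching])
    return votes >= 2

print(correlate_intent(
    {"real_object_visible": True},
    {"real_world_access_rate": 0.7},
    {"gaze_target": "real_object", "hand_gesture": "reach"},
))  # True
```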
As to claim 2, claim 1 is incorporated and the combination of Smith, Miller and Stafford discloses correlating, by the processing unit, the one or more parameters; and identifying, by the processing unit, the intent of the user to interact with at least one real-world object in the real-world environment, based on the correlation (Smith, ¶0021-0022, ¶0027, ¶0039, “the real object selection engine 113 can control a user's selection mode experience in which the user 101 views a live view of the user's real world environment using the HDM 104 and provides finger movements or other inputs to identify the objects to make the real world selections 131.” ¶0041, “using the object detection engine 115 and experience creation engine 116 to provide a user experience that combines selected real world objects and virtual reality content. The HMD 104 captures a live view that is provided to the object detection engine 115. The object detection engine 115 uses the 3D object models 118 to detect the real world objects corresponding to the real object selections 131 in each frame of the live view that should not be replaced. The experience creation engine 116 modifies the live view by replacing the other objects in each of the frames with virtual reality content based on the user's virtual reality content selection 132. The experience creation engine 116 then provides the modified live feed, including the virtual reality content, in real time for display to the user on user device 102A. In this way, the user experiences a user experience 133 that combines a live view of selected real world objects with the selected virtual reality content.”).
As to claim 3, claim 2 is incorporated and the combination of Smith, Miller and Stafford discloses enabling the display of the virtual environment and the real-world environment, comprises: displaying, by the processing unit, the at least one real-world object as the real-world environment in the display screen of the extended reality device (Smith, ¶0021-0022, ¶0027, ¶0039, “the real object selection engine 113 can control a user's selection mode experience in which the user 101 views a live view of the user's real world environment using the HDM 104 and provides finger movements or other inputs to identify the objects to make the real world selections 131.” ¶0041, “using the object detection engine 115 and experience creation engine 116 to provide a user experience that combines selected real world objects and virtual reality content. The HMD 104 captures a live view that is provided to the object detection engine 115. The object detection engine 115 uses the 3D object models 118 to detect the real world objects corresponding to the real object selections 131 in each frame of the live view that should not be replaced. The experience creation engine 116 modifies the live view by replacing the other objects in each of the frames with virtual reality content based on the user's virtual reality content selection 132. The experience creation engine 116 then provides the modified live feed, including the virtual reality content, in real time for display to the user on user device 102A. In this way, the user experiences a user experience 133 that combines a live view of selected real world objects with the selected virtual reality content.”).
As to claim 4, claim 3 is incorporated and the combination of Smith, Miller and Stafford discloses that displaying the at least one real-world object comprises:
integrating, by the processing unit, a sensor system in the extended reality device to detect location of the at least one real-world object in the real-world environment (Smith, ¶0052, “displays sample video clips or sample live feeds from other real world locations that the user can select for inclusion in the user experience.” Fig. 4, ¶0054, “capturing live video of the real world environment using a video camera on a head-mounted display,” ¶0055, “the object detection involves the use of a 3D model of the real world environment and/or the real world objects in the environment. This allows the real world objects in the live view to be identified from different perspectives.” “a 3D model of a chair can be created based on images of the chair captured by the user viewing the chair using a HDM to view and select the chair from different viewing directions, using an HDM that has multiple cameras, or using a 3D model generation technique that determines a model of an object from a single image or viewing direction.” ¶0057, “movement of the camera can be used to identify the selected real world objects in the live feed frames. As the user, rotates the camera (e.g., by turning his head while moving an HDM) or mores the camera (e.g., by walking around the real world environment), information about the camera position is used to identify the selected real world objects. For example, the chair can be identified when the user viewing the chair, turns around so that the chair is outside of the current view, and then turns back around again so that the chair is again in the current view. In this example, the technique determines based on the rotations of the camera that the chair will be visible again in an approximate position and uses this information about the likely position of the chair to identify the chair.”);
computing, by the processing unit, set of coordinates related to the real-world object in the real-world environment (Smith, ¶0055, “identifying real world objects in individual frames of the live video that are in a subset of real world objects selected by the user, as shown in block 402. In one example, an algorithm is executed to analyze each frame of the live view to identify objects within each frame. Such an algorithm can use information about each of the objects in the subset.” ¶0056, “detecting whether the camera used to provide the live feed is moved, for example, using one or more gyroscopes and/or other sensors. If the live feed camera remains in a constant position, one technique assumes that a chair or other object remains in the same position from one frame to the next unless differences in the appearance of the frames suggest otherwise.” ¶0058, “a 3D model of a real world environment is created and the technique keeps track of the camera relative to the real world objects in a 3D model.”); and
mapping, by the processing unit, the set of coordinates with a Region of Interest (ROI) on the display screen, to provide real-time display of the at least one real-world object in the ROI (Smith, Fig. 4, ¶0056, “the portion of a live feed that correspond to the chair during the selection process are assumed to be the chair as the user enters the user experience that replaces non-selected real world objects with virtual reality content.” ¶0058, “the technique can determine that when the camera (e.g., the user with an HMD on) is in a particular location facing a particular direction, a chair, desk, and computer monitor should be visible. In contrast, the technique can determine that in another camera location and/or direction a white board mounted on the wall should be visible. The determinations can be used to automatically display portions of the live feed corresponding the locations of the expected objects. In another example, the determinations are used as a starting point or input to an algorithm that confirms the determinations by performing object detection using the image characteristics of the frames of the live feed.”).
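As an illustration of the claim 4 steps analyzed above (detecting an object's location, computing its coordinates, and mapping them to a Region of Interest on the display), the following sketch uses a simple pinhole-style projection; the focal length and screen dimensions are assumed values for the sketch, not taken from Smith.

```python
# Illustrative sketch of mapping a detected real-world object's
# coordinates to the center of a Region of Interest (ROI) on the
# display. The projection model and constants are assumptions.

def world_to_roi(obj_xyz: tuple[float, float, float],
                 screen_w: int = 1920, screen_h: int = 1080,
                 focal: float = 800.0) -> tuple[int, int]:
    """Project a 3D point (meters, camera frame, z forward) to the
    pixel coordinates marking the ROI center."""
    x, y, z = obj_xyz
    if z <= 0:
        raise ValueError("object must be in front of the camera")
    u = int(screen_w / 2 + focal * x / z)   # horizontal pixel
    v = int(screen_h / 2 - focal * y / z)   # vertical pixel (y up)
    return u, v

# A chair detected 2 m ahead and 0.5 m to the right maps to an ROI
# center right of screen center:
print(world_to_roi((0.5, 0.0, 2.0)))  # (1160, 540)
```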
As to claim 5, claim 4 is incorporated and the combination of Smith, Miller and Stafford discloses displaying the at least one real-world object further comprises: controlling the sensor system to enable fixed display of the at least one real-world object in the ROI, irrespective of orientation of the extended reality device (Smith, ¶0024, “a user provides input to orient a virtual environment relative to the real world objects that were selected for the user experience. For example, a user may provide a finger twisting or other input to rotate a virtual beach environment so that the ocean is directly in front of the real world desk rather than to the right side of the desk. Similarly, the user may control the relative position of real world object to aspects of the virtual environment. For example, the user could provide input to change the relative location of the real world desk and other objects to the ocean.” ¶0026, “She can still see her computer screen and her desk. She can answer the phone when it rings and type on her keyboard. However, when she looks at the floor, she sees sand. When she looks behind her computer, she sees the waves rolling onto the beach. As she turns around in a complete 360 she sees a beach in Maui except for where the selected real-world objects are displayed.”).
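To illustrate the claim 5 limitation of fixed display irrespective of device orientation, the sketch below selects the live-feed region by the object's world bearing rather than the headset's yaw, so the ROI content stays locked to the object; the panorama model and pixel widths are assumptions of this sketch, not taken from Smith.

```python
# Illustrative sketch: keep a selected real-world object displayed in a
# fixed on-screen ROI regardless of headset orientation. The exterior
# camera feed is treated as a 360-degree panorama indexed by world
# bearing; all numbers are hypothetical.

def feed_column_for_object(obj_bearing_deg: float, head_yaw_deg: float,
                           feed_width_px: int = 3840) -> int:
    """Return the panorama column holding the object. The object's
    world bearing is used directly, so head yaw does not move it."""
    del head_yaw_deg  # orientation is intentionally ignored (fixed display)
    return int((obj_bearing_deg % 360.0) / 360.0 * feed_width_px)

# The crop around this column is rendered into the same ROI whatever
# the user's head orientation:
for yaw in (0.0, 90.0, 180.0):
    print(feed_column_for_object(45.0, yaw))  # 480 each time
```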
As to claim 7, claim 1 is incorporated and the combination of Smith, Miller and Stafford discloses the content data comprises details of data rendered by the extended reality device to the user (Smith, ¶0023, “the user may select a beach virtual environment for one side of the cubicle and a live-stream from his child' s daycare center for another side of the cubicle.” ¶0024, “a user may provide a finger twisting or other input to rotate a virtual beach environment so that the ocean is directly in front of the real world desk rather than to the right side of the desk.” ¶0073, “a user can create a user experience that combines a real world stationary exercise bike with a virtual environment of a bike race competition in France. As the user encounters hills in the virtual reality content, the settings on the real world bike can be automatically changed to simulate biking up the hills. For example, the user experience device can send a wired or wireless communication to the device to adjust the setting. In this example, user experiences could be combined to allow two users to simulate biking against one another in the bike race in France. The respective users are real world objects in each other's user experiences while the users share the same virtual reality content.”).
As to claim 8, claim 1 is incorporated and the combination of Smith, Miller and Stafford discloses the historic user behavior data comprises one or more user actions of the user, relating to accessing the real-world environment, during previous usages of the extended reality device (Smith, ¶0022, “to navigating through preselected or previously used virtual environment options, the user may be given the option to identify or find a virtual environment from another source location.” ¶0042, “The object and content selections 117 store the user's selections of which real world objects should or should not be included in the user experience, such as real object selections 131. The object and content selections 117 also store the user's selections of which virtual reality content should be included in the user experience, such as virtual reality content selection 132. The 3D object models 118 store models of the real world objects that will be included and/or excluded from the user experience.” ¶0064, “the user has previously selected real world objects for inclusion” ¶0076.).
As to claim 9, claim 1 is incorporated and the combination of Smith, Miller and Stafford discloses the user movement data comprises at least one of eyeball movement, hand movement and head movement of the user wearing the extended reality device (Smith, ¶0022, “Input such as hand gestures can be recognized to allow the user to navigate through different virtual environments.”, ¶0063, “the user could hold an extended finger 601 of a hand 602 still in a particular position for at least a predetermined amount of time, e.g., 3 seconds. The HDM or other device detects the hand 602 and the extended finger 601 being held steady for more than the threshold amount of time. Based on this, the device begins to track the movement of the extended finger 601 until a condition is satisfied.”).
As to claim 11, the combination of Smith, Miller and Stafford discloses a processing unit for controlling access to virtual environment and real-world environment in an extended reality environment, the processing unit comprises: one or more processors; and a memory communicatively coupled to the one or more processors, wherein the memory stores processor-executable instructions, which, on execution, cause the one or more processors to: receive, in real-time, content data, historic user behavior data, user movement data, and user commands data, during display of virtual environment to a user wearing an extended reality device; wherein the user movement data is received from one or more cameras placed on the interior surface of the extended reality device corresponding to eyeball movement of the user, and one or more cameras placed on the exterior surface of the extended reality device corresponding to at least one of: hand movement, hand gesture, and direction of motion; correlate all of the content data, historic user behavior data, and user movement data in combination; identify intent of one or more users associated with the virtual environment, to access real-world environment, based on the correlation; enable display of the virtual environment and one or more selected views of the real-world environment simultaneously on display screen of the extended reality device, based on the intent, to control access to the virtual environment and one or more selected views of the real-world environment (See claim 1 for detailed analysis.).
As to claim 12, claim 11 is incorporated and the combination of Smith, Miller and Stafford discloses the one or more processors are configured to identify the intent of the one or more users, by: correlating the one or more parameters; and identifying the intent of the user to interact with at least one real-world object in the real-world environment, based on the correlation (See claim 2 for detailed analysis.).
As to claim 13, claim 12 is incorporated and the combination of Smith, Miller and Stafford discloses the one or more processors are configured to enable the display of the virtual environment and the real-world environment, by: displaying the at least one real-world object as the real-world environment in the display screen of the extended reality device (See claim 3 for detailed analysis.).
As to claim 14, claim 13 is incorporated and the combination of Smith, Miller and Stafford discloses the one or more processors are configured to display the at least one real-world object by: integrating a sensor system in the extended reality device to detect location of the at least one real-world object in the real-world environment; computing set of coordinates related to the real-world object in the real-world environment; and mapping the set of coordinates with a Region of Interest (ROI) on the display screen, to provide real-time display of the at least one real-world object in the ROI (See claim 4 for detailed analysis.).
As to claim 15, claim 14 is incorporated and the combination of Smith, Miller and Stafford discloses the one or more processors are configured to display the at least one real-world object by: controlling the sensor system to enable fixed display of the at least one real-world object in the ROI, irrespective of orientation of the extended reality device (See claim 5 for detailed analysis.).
As to claim 17, claim 11 is incorporated and the combination of Smith, Miller and Stafford discloses the content data comprises details of data rendered by the extended reality device to the user (See claim 7 for detailed analysis.).
As to claim 18, claim 11 is incorporated and the combination of Smith, Miller and Stafford discloses the historic user behavior data comprises one or more user actions of the user, relating to accessing the real-world environment, during previous usages of the extended reality device (See claim 8 for detailed analysis.).
As to claim 19, claim 11 is incorporated and the combination of Smith, Miller and Stafford discloses the user movement data comprises at least one of eyeball movement, hand movement and head movement of the user wearing the extended reality device (See claim 9 for detailed analysis.).
Claims 6 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Smith (US Pub 2018/0173404 A1) in view of Miller et al. (US Pub 2024/0411364 A1) and Stafford et al. (US Pub 2017/0358141 A1), further in view of Osman et al. (US Pub 2014/0361976 A1).
As to claim 6, claim 1 is incorporated and the combination of Smith and Miller discloses enabling the display of the virtual environment and the real-world environment, comprises: transitioning, by the processing unit, (Smith, ¶0059, “identifying the other portions of the individual frames that do not include the real world objects that are in the subset, as shown in block 403. In one example, this involves automatically identifying the pixels of each frame of the live feed that are not pixels of the user-selected real world objects.” ¶0060, “modifying the individual frames by replacing the other portions with the virtual reality content, as shown in block 404. In one embodiment, this involves replacing pixels of all portions of each frame that are not part of a user-selected real world object with virtual reality content. In one example, virtual reality content for an entire frame is used as a starting point and the user-selected real world objects are displayed over top. The result is that user-selected real world objects and some portions of the virtual reality content are displayed in each frame, effectively replacing the other portions of the live feed with virtual reality content.”) (Miller, ¶0008, “The intent of the user to interact with the physical devices is determined on the basis of the instructions received from the user and contextual signals, for example, time of the day, previously stored historical data, previously determined intent, past user behavior, other pre-stored user intents, predictive intents corresponding to the physical devices. In this way, the user is interfaced with that particular object or device using world-locked eye gaze and eye tracking techniques without having to reference that particular device by its nickname or ID overcoming the drawbacks of the conventional methods. Further, the user can control and interact with any of the physical objects or devices dynamically in real-time in a more intuitive way without the requirement of using the object's or device's IDs or nicknames, or location data. Interfacing the user with the physical objects or devices through the 3D representation and through eye gazing techniques, enables the user to perform any kind of user command, for example, head gesture, hand gesture, voice gesture, finger taps, drag and drop movement, finger pinching, rotating movement, bloom gesture, resizing, selecting, moving, a natural language query, and any other kinds of AR-related commands.” and ¶0048, “The intent of the user is determined by determining the instructions and one or more contextual signals. The instructions may be explicit (e.g., “turn on a light”) or implicit (e.g., “where did I buy this”). For example, the implicit instruction may be based on determining the eye tracking and eye gazing process as well without referencing the physical device. 
The intent identification engine 130 may further determine the one or more contextual signals that may include, but are not limited to, kinds of user intents for interacting with the physical objects or devices, kinds of commands detected based on previously issued commands of the user in the past, and based on other factors such as time of the day the user has interacted before or the user is likely to interact in present time, user behavior learned from previous actions or intents, or predicting user behavior in the present time, current state and historical state of both the user and the physical objects and/or devices, historical context semantics of the user intent for interaction, weather forecast information in the present time, past time or for the future time, default settings set for each physical object or device, priority scores associated with each physical object and/or device, different zones or regions of interest the user has shown specific priority or interest to interact, number of times of each of the factors stated above, and other related factors that are specific for interacting with various physical objects and/or devices including the factors used for training and updating the machine learning models.”).
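To illustrate the gradient transition at issue for claims 6 and 16, the following sketch steps a per-pixel alpha blend from the virtual scene toward the real-world view; the frame representation and step count are assumptions of this sketch, not taken from Smith, Miller, or Osman.

```python
# Illustrative sketch of a gradient (fade) transition between the
# virtual scene and a real-world view: a per-pixel alpha blend stepped
# over time. Values are hypothetical grayscale intensities.

def gradient_transition(virtual_px: float, real_px: float, steps: int = 5):
    """Yield blended pixel values fading from virtual to real."""
    for i in range(steps + 1):
        alpha = i / steps          # 0.0 -> fully virtual, 1.0 -> fully real
        yield (1.0 - alpha) * virtual_px + alpha * real_px

print(list(gradient_transition(0.0, 255.0)))
# [0.0, 51.0, 102.0, 153.0, 204.0, 255.0]
```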