Prosecution Insights
Last updated: April 19, 2026
Application No. 18/592,224

GAZE-BASED AUDIO SWITCHING AND 3D SIGHT LINE TRIANGULATION MAP

Status: Non-Final Office Action (§103)
Filed: Feb 29, 2024
Examiner: SUN, HAI TAO
Art Unit: 2616
Tech Center: 2600 (Communications)
Assignee: Adeia Guides Inc.
OA Round: 1 (Non-Final)

Grant Probability: 73% (Favorable)
Expected OA Rounds: 1-2
Estimated Time to Grant: 2y 7m
Grant Probability with Interview: 99%

Examiner Intelligence

Career Allow Rate: 73% (347 granted / 476 resolved), +10.9% vs Tech Center average; grants above average.
Interview Lift: +26.6% higher allowance for resolved cases with an examiner interview vs. without; a strong lift.
Typical Timeline: 2y 7m average prosecution; 35 applications currently pending.
Career History: 511 total applications across all art units.

Statute-Specific Performance

§101: 6.9% (-33.1% vs TC avg)
§103: 65.8% (+25.8% vs TC avg)
§102: 2.3% (-37.7% vs TC avg)
§112: 15.9% (-24.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 476 resolved cases.
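As a sanity check on these figures, the short sketch below recovers the Tech Center baseline implied by each statute's rate and its "vs TC avg" delta, assuming the delta is a simple difference in percentage points. The variable names and the calculation are illustrative assumptions, not the dashboard's documented method.

# Implied Tech Center baseline per statute, assuming "vs TC avg" is a
# percentage-point difference (illustrative assumption only).
examiner_rate = {"101": 6.9, "103": 65.8, "102": 2.3, "112": 15.9}    # percent
delta_vs_tc   = {"101": -33.1, "103": 25.8, "102": -37.7, "112": -24.1}

for statute, rate in examiner_rate.items():
    tc_avg = rate - delta_vs_tc[statute]
    print(f"§{statute}: examiner {rate:.1f}% vs implied TC average {tc_avg:.1f}%")

Under that assumption, every statute implies the same 40.0% baseline, which is consistent with the note above that the Tech Center average is an estimate rather than per-statute data.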

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Preliminary Amendment
The preliminary amendment received 05/13/2024 has been entered.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 34-40 and 42-53 are rejected under 35 U.S.C. 103 as being unpatentable over Miller (US 20230130770 A1) in view of Golyshko (US 20150319400 A1).

Regarding claim 34, Miller discloses a method (Fig. 6; [0077]: determine the intent of users to interact with physical devices in the physical environment; [0078]: the computing system captures and receives an image of a physical environment surrounding a user wearing the AR device; [0079]: the computing system accesses a three-dimensional (3D) map corresponding to the physical environment) comprising:

maintaining a 3D map of an environment ([0043]: the map generation engine 122 generates the 3D map representation of the physical environment detecting all the physical objects; [0047]: localize the user in the 3D map; [0079]: the computing system accesses a three-dimensional (3D) map corresponding to the physical environment), the map indicating respective 3D locations of each of a plurality of camera devices in the environment ([0047]: localize the user in the 3D map; [0079]: all the physical objects are represented in the 3D map representation; the physical devices are the smart devices, smart units, and any IoT devices; [0080]: the generated 3D map is a high-fidelity 3D representation of the physical objects with geometry and appearance; Fig. 4A; [0068]: the 3D map), wherein each of the plurality of camera devices is capable of capturing video data of the environment (Fig. 1B; [0033]: the image sensors include digital still cameras, a digital moving image, and video cameras; [0071]: video cameras associated with the AR device capture video; [0078]: the computing system captures and receives an image of a physical environment surrounding a user wearing the AR device; AR glasses provide stream sensor data in real time);

analyzing first video data from the plurality of camera devices to identify: (a) a 3D location of a user in the environment ([0034]: locate a location of the user in the physical environment; [0056]: determine and identify the spatial location of the user and the spatial location of each of the physical objects; Fig. 2; [0057]: the scene application 210 localizes the user in the environment; determine the location of the user; [0081]: the computing system determines a pose of the AR device relative to the three-dimensional map based on the first features of physical objects captured in the image; determine and identify the location of the user); and (b) a 3D location of a mobile device in the environment (Fig. 1C; [0056]: determine the spatial location of the user and the spatial location of each of the physical objects; Fig. 2; [0057]: the scene application 210 localizes all spatial locations of each of the physical objects including the one or more devices; [0081]: the computing system determines a pose of the AR device relative to the three-dimensional map based on the first features of physical objects captured in the image and the second features of object representations in the three-dimensional map; the spatial locations of each of the devices and the objects are determined and updated in the 3D map representation; detect the spatial location of each physical device);

updating the 3D map of the environment indicating the 3D location of the user and the 3D location of the mobile device ([0072]: update the 3D map representation; [0075]: artificial reality device 102 provides synchronized, continuous, and updated feature maps with low latency; [0079]: a live and updating 3D object-centric map of the physical environment is accessed along with attributes or features of the physical objects including devices; [0081]: the spatial locations of each of the devices and the objects are determined and updated in the 3D map representation);

analyzing second video data from the plurality of camera devices in combination with cross-referencing the updated 3D map of the environment to determine that a gaze of the user is directed to a display of the device (Fig. 4B; [0069]: the gaze of both eyes of the user is associated with a line-of-sight within the three-dimensional map; the eye gaze convergence is computed by evaluating eye gaze vector elements intersecting at the physical device; the user 404 is determined to be gazing at the TV 406 when the TV unit 406 is detected within and around the radius of the intersection point of the eye gaze convergence; [0083]: the computing system uses eye tracking and eye gazing units for determining a gaze of an eye of the user; compute the gaze of the eye of the user associated with the line-of-sight of the user within the three-dimensional map; determine a gaze of another eye of the user associated with the line-of-sight within the three-dimensional map; [0085]: detect each device and object the user is looking at or gazing at); and

based on determining the gaze, causing the mobile device to perform an action (Fig. 2; [0057]: mobile device; [0074]: instruct the smartwatch to display the associated stopwatch app automatically for setting the timer; [0085]: detect each device and object the user is looking at or gazing at; [0086]: the computing system determines the intent of the user to interact with a physical device corresponding to one of the one or more representations of physical devices; [0087]: the computing system issues a command to the physical device based on the determined intent of the user).

Miller fails to explicitly disclose that the display of the device is a display of the mobile device. In the same field of endeavor, Golyshko teaches a display of the device that is a display of the mobile device ([0030]: mobile computing; a mobile telephone, a portable video player, a portable music player, a portable gaming machine, and a smart phone; [0072]: a liquid crystal display (LCD) for a mobile device; Fig. 5; [0074]: determine whether one or both eyes of the user are focused on the display).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Miller so that the display of the device is a display of the mobile device, as taught by Golyshko. The motivation for doing so would have been to determine whether one or both eyes of the user are focused on the display and to detect that a user in the viewing area has left the viewing area, as taught by Golyshko in Fig. 5 and paragraphs [0074] and [0126].

Regarding claim 35, Miller in view of Golyshko discloses the method of claim 34, wherein the first video data includes at least one of 3D location data of the user or 3D location data of the mobile device (Miller; Fig. 1B; [0033]: video cameras capture video; [0034]: locate a location of the user in the physical environment; [0056]: determine and identify the spatial location of the user and the spatial location of each of the physical objects; Fig. 2; [0057]: the scene application 210 localizes the user in the environment; determine the location of the user; Fig. 3A; [0067]: the physical environment 300 of the user 302 with all devices and objects that are in the line-of-sight 304 of the user 302; Fig. 4A; [0068]: a bounding box and data points of all features associated with each device and object present in a region of interest and that are in the line-of-sight of the user viewability in the physical environment; [0078]: the computing system captures and receives an image of a physical environment surrounding a user wearing the AR device).

Regarding claim 36, Miller in view of Golyshko discloses the method of claim 34, wherein the mobile device in the environment is capable of capturing video data of the environment (Miller; Fig. 1B; [0033]: the video cameras of the HMD capture video; [0041]: a user wears the AR device 102; the image-capturing engine of AR 120 captures live streams, moving images and video footage, pictures, etc.; [0078]: the computing system captures and receives an image of a physical environment surrounding a user wearing the AR device).

Regarding claim 37, Miller in view of Golyshko discloses the method of claim 36, wherein the second video data includes at least one of eye movement data or head position data of the user captured by at least one of the plurality of devices or by the mobile device (Miller; [0051]: detect the user's head gestures; Fig. 1B; [0055]: track the eye gazing and vergence movement of the user wearing the HMD 102; [0074]: detect the user's head gestures); and wherein analyzing the second video data from the plurality of camera devices in combination with cross-referencing the updated 3D map of the environment (same as rejected in claim 34) is based on: receiving, from each of the plurality of camera devices and the mobile device, at least one of eye movement data or head position data of the user (Miller; [0051]: detect the user's head gestures; Fig. 1B; [0055]: track the eye gazing and vergence movement of the user wearing the HMD 102); projecting a line of sight of the user based on cross-referencing the at least one of eye movement data or head position data of the user from each of the plurality of camera devices and the mobile device and using the updated 3D map (Miller; Fig. 4A; [0068]: a bounding box and data points of all features associated with each device and object present in a region of interest and that are in the line-of-sight of the user viewability in the physical environment; project the line-of-sight of the user viewability in the physical environment as illustrated in Fig. 4A; [0071]: from the line of sight of the user along with vector values associated with each physical object; [0079]: the physical objects are in the line-of-sight of the user's gaze); triangulating the gaze of the user based on the cross-referenced line of sight in the updated 3D map (Miller; Fig. 4B; [0069]: the gaze of both eyes of the user is associated with a line-of-sight within the three-dimensional map; the eye gaze convergence is computed by evaluating eye gaze vector elements intersecting at the physical device; the user 404 is determined to be gazing at the TV 406 when the TV unit 406 is detected within and around the radius of the intersection point of the eye gaze convergence; [0083]: the computing system uses eye tracking and eye gazing units for determining a gaze of an eye of the user; computes the gaze of the eye of the user associated with the line-of-sight of the user within the three-dimensional map and further comprises determining a gaze of another eye of the user associated with the line-of-sight within the three-dimensional map; [0085]: the eye gaze convergence is computed by evaluating eye gaze vector elements intersecting at the physical device).

Regarding claim 38, Miller in view of Golyshko discloses the method of claim 37, wherein the 3D map of the environment comprises a plurality of reference meshes, wherein at least one of the plurality of reference meshes corresponds to the mobile device (Fig. 4A; [0068]); and wherein the determining that the gaze of the user is directed to the display of the mobile device (same as rejected in claim 34) is further based on: determining a focal point of the triangulated gaze of the user (Miller; Fig. 4B; [0069]: the gaze of both eyes of the user is associated with a line-of-sight within the three-dimensional map; the eye gaze convergence is computed by evaluating eye gaze vector elements intersecting at the physical device; the user 404 is determined to be gazing at the TV 406 when the TV unit 406 is detected within and around the radius of the intersection point of the eye gaze convergence); and determining a collision of the focal point with the at least one of the plurality of reference meshes corresponding to the mobile device (Miller; Fig. 4A; [0067]: all the devices and objects in the living room are assumed to be located in the line of sight 304 of the user 302 based on some distance from the line-of-sight of the user; Fig. 4A; [0068]: 3D object-centric map representations for all the devices and objects in the physical environment; Fig. 4B; [0069]: the gaze of both eyes of the user is associated with a line-of-sight within the three-dimensional map; the eye gaze convergence is computed by evaluating eye gaze vector elements intersecting at the physical device; the user 404 is determined to be gazing at the TV 406 when the TV unit 406 is detected within and around the radius of the intersection point of the eye gaze convergence).
Regarding claim 39, Miller in view of Golyshko discloses the method of claim 34, wherein the 3D map of the environment comprises a plurality of reference meshes, wherein at least one of the plurality of reference meshes comprises a collision volume corresponding to a particular 3D location in the environment (Miller; Fig. 4A; [0068]: 3D object-centric map representations for all the devices and objects in the physical environment; Fig. 4B; [0069]: the gaze of both eyes of the user is associated with a line-of-sight within the three-dimensional map; the eye gaze convergence is computed by evaluating eye gaze vector elements intersecting at the physical device; the user 404 is determined to be gazing at the TV 406 when the TV unit 406 is detected within and around the radius of the intersection point of the eye gaze convergence).

Regarding claim 40, Miller in view of Golyshko discloses the method of claim 39, wherein identifying the 3D location of the user in the environment (same as rejected in claim 34) is based on: determining, based on the first video data, that the collision volume is triggered by the user colliding with the collision volume (Miller; Fig. 4A; [0068]: 3D object-centric map representations for all the devices and objects in the physical environment; Fig. 4B; [0069]: the gaze of both eyes of the user is associated with a line-of-sight within the three-dimensional map; the eye gaze convergence is computed by evaluating eye gaze vector elements intersecting at the physical device; the user 404 is determined to be gazing at the TV 406 when the TV unit 406 is detected within and around the radius of the intersection point of the eye gaze convergence).

Regarding claim 42, Miller in view of Golyshko discloses the method of claim 34, wherein the 3D location of the mobile device is further identified based on: receiving, from the mobile device, simultaneous localization and mapping (SLAM) data or inertial sensor data of the mobile device (Miller; [0033]: the AR device 102 includes one or more sensors, object tracking units, eye tracking units, RGB units, and simultaneous localization and mapping (SLAM) units; [0042]: the map generation engine 120 utilizes SLAM, object tracking, RGB, IMU, and other related 3D object detection pipelines to scan the entire zone or region or area); and cross-referencing the SLAM or inertial sensor data of the mobile device with the first video data from the plurality of camera devices (Miller; [0034]: each of the physical objects, including one or more devices 110a, . . . , 110n, and the one or more objects, along with a location of the user in the physical environment, is detected by using the SLAM units, IMUs, the eye tracking units, and the eye gazing units of the AR device 102).

Regarding claim 43, Miller in view of Golyshko discloses the method of claim 34, wherein the identifying the 3D location of a user in the environment (same as rejected in claim 34) is further based on: receiving, from each of the plurality of camera devices, an image (Miller; [0071]: receive one or more sensor image data from digital still cameras, a digital moving image, and video cameras associated with the AR device 102; Fig. 6; [0078]: the computing system captures and receives an image of a physical environment surrounding a user wearing the AR device); recognizing each of the images as the user (Miller; [0079]: the computing system accesses a three-dimensional (3D) map corresponding to the physical environment; [0080]: the generated 3D map is a high-fidelity 3D representation of the physical objects with geometry and appearance; [0083]: determine a gaze of an eye of the user); measuring a distance between each of the plurality of camera devices and the user (Miller; [0067]: all the devices and objects in the living room are located in the line of sight 304 of the user 302 based on some distance from the line-of-sight of the user; devices and objects that are located within 2 meters of the user 302; [0071]: the object representations in the 3D map that are based on the 6DOF poses of the AR device 102 include the distance and spatial locations of each physical object from the line of sight of the user along with vector values associated with each physical object); and cross-referencing the distances between each of the plurality of camera devices and the user with each other (Miller; [0067]: all the devices and objects in the living room are located in the line of sight 304 of the user 302 based on some distance from the line-of-sight of the user; devices and objects that are located within 2 meters of the user 302; [0071]: the distance and spatial locations of each physical object from the line of sight of the user along with vector values associated with each physical object; [0081]: devices and objects that are located within 2 meters of the user).

Regarding claim 44, Miller in view of Golyshko discloses the method of claim 34, further comprising: determining a change in the 3D location of the user (Miller; [0034]: locate a location of the user in the physical environment; [0052]: when the user of artificial reality device 102 traverses the physical environment, for example by moving throughout rooms or areas or floors or zones of a particular environment or house or building, etc., artificial reality device 102 provides synchronized, continuous, and updated feature maps with low latency; [0056]: determine the spatial location of the user and the spatial location of each of the physical objects); and updating the plurality of camera devices to include devices which can capture at least one of eye movements or head position of the user at the changed 3D location of the user (the claim's "or" makes this limitation alternative; Miller; [0052]: when the user of artificial reality device 102 traverses the physical environment, for example by moving throughout rooms or areas or floors or zones of a particular environment or house or building, etc., artificial reality device 102 provides synchronized, continuous, and updated feature maps with low latency; Fig. 1B; [0055]: an eye-tracking unit tracks the eye gazing and vergence movement of the user wearing the HMD 102).
Regarding claim 45, Miller in view of Golyshko discloses the method of claim 34, further comprising: determining a change in the 3D location of the mobile device (Miller; [0052]: when the user of artificial reality device 102 traverses the physical environment, for example by moving throughout rooms or areas or floors or zones of a particular environment or house or building, etc., artificial reality device 102 provides synchronized, continuous, and updated feature maps with low latency); and updating the plurality of camera devices to include devices which can capture video data of the mobile device at the changed 3D location of the mobile device (Miller; [0052]: when the user of artificial reality device 102 traverses the physical environment, for example by moving throughout rooms or areas or floors or zones of a particular environment or house or building, etc., artificial reality device 102 provides synchronized, continuous, and updated feature maps with low latency; [0059]: the hub 206 updates the latest object and device states, the eye-tracking vector, and the object the user is currently looking or gazing at; Fig. 4A; [0068]: the newly detected devices and objects are updated in the object and device library 126).

Regarding claim 46, Miller in view of Golyshko discloses the method of claim 34, wherein the action comprises at least one of: playing media content, sharing media content, causing a second device to play media content which is already playing on the mobile device, playing media content on the mobile device which is already playing on the second device, or activating a household appliance (Miller; [0051]: instruct the smartwatch to display the associated stopwatch app automatically for setting the timer or watching over the set timer while heating the food in the oven; [0087]: a particular type of interaction like "turning on the light" based on the user's location detected in the kitchen, or "switching off the TV" when the time is beyond 11 p.m., etc., or when the user leaves the TV area for a while).

Regarding claim 47, Miller in view of Golyshko discloses the method of claim 34, further comprising: determining that the gaze of the user is no longer directed to the display of the mobile device (Golyshko; Fig. 5; [0074]: determine whether one or both eyes of the user are focused on the display; Fig. 7; [0126]: detect and determine that a user in the viewing area has left the viewing area; [0141]: the user was outside the viewing area); the same motivation as in claim 34 applies here. Miller in view of Golyshko further discloses: based on the determining, terminating the performance of the action by the mobile device (Miller; [0083]: determine a gaze of another eye of the user associated with the line-of-sight within the three-dimensional map for performing eye tracking; [0087]: a particular type of interaction like "turning on the light" based on the user's location detected in the kitchen, or "switching off the TV" when the time is beyond 11 p.m., etc., or when the user leaves the TV area for a while).

Regarding claim 48, Miller discloses a system (Fig. 1A; [0025]: a system 100 includes an artificial reality (AR) device; [0026]: a system 100 includes one or more computing devices; Fig. 6; [0077]: determine the intent of users to interact with physical devices in the physical environment; [0078]: the computing system captures and receives an image of a physical environment surrounding a user wearing the AR device; [0079]: the computing system accesses a three-dimensional (3D) map corresponding to the physical environment) comprising: control circuitry configured to ([0033]: the AR device 102 may comprise one or more processors 104, a memory 106 for storing computer-readable instructions to be executed on the one or more processors 104, and a display 108; [0039]: the processor 116 is programmed to implement computer-executable instructions that are stored in memory 118 or memory units). The remaining claim limitations are similar to the limitations recited in claim 34; therefore, the same rationale used to reject claim 34 is also used to reject claim 48.

Regarding claim 49, Miller in view of Golyshko discloses the system of claim 48. The remaining claim limitations are similar to the limitations recited in claim 35; therefore, the same rationale used to reject claim 35 is also used to reject claim 49.

Regarding claim 50, Miller in view of Golyshko discloses the system of claim 35. The remaining claim limitations are similar to the limitations recited in claim 36; therefore, the same rationale used to reject claim 36 is also used to reject claim 50.

Regarding claim 51, Miller in view of Golyshko discloses the system of claim 50. The remaining claim limitations are similar to the limitations recited in claim 37; therefore, the same rationale used to reject claim 37 is also used to reject claim 51.

Regarding claim 52, Miller in view of Golyshko discloses the system of claim 51. The remaining claim limitations are similar to the limitations recited in claim 38; therefore, the same rationale used to reject claim 38 is also used to reject claim 52.

Regarding claim 53, Miller in view of Golyshko discloses the system of claim 48. The remaining claim limitations are similar to the limitations recited in claim 39; therefore, the same rationale used to reject claim 39 is also used to reject claim 53.

Allowable Subject Matter
Claim 41 is objected to as being dependent upon a rejected base claim, i.e., claims 34 and 39, but would be allowable if rewritten in independent form including all of the limitations of the base claims, i.e., all of the limitations of claims 34 and 39, and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Hai Tao Sun, whose telephone number is (571) 272-5630. The examiner can normally be reached 9:00 AM-6:00 PM. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Hajnik, can be reached at (571) 272-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/HAI TAO SUN/
Primary Examiner, Art Unit 2616
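For readers less familiar with the claimed technique, the sketch below illustrates the gaze-convergence and mesh-collision logic as the rejection characterizes it: two eye lines of sight are converged to a focal point in the 3D map, and that focal point is tested against a device's reference volume within a small radius. This is a minimal illustration under stated assumptions only; the function names, the axis-aligned box standing in for a reference mesh, and all numeric values are hypothetical, not Miller's, Golyshko's, or the application's actual implementation.

import numpy as np

def gaze_focal_point(o1, d1, o2, d2):
    # Estimate where two gaze rays converge: the midpoint of the shortest
    # segment between the two lines of sight (origins o1, o2; directions d1, d2).
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w = o1 - o2
    b = d1 @ d2
    d, e = d1 @ w, d2 @ w
    denom = 1.0 - b * b              # a = c = 1 for unit directions
    if denom < 1e-9:                 # near-parallel rays: no usable convergence
        t1, t2 = 0.0, e
    else:
        t1 = (b * e - d) / denom
        t2 = (e - b * d) / denom
    p1, p2 = o1 + t1 * d1, o2 + t2 * d2
    return (p1 + p2) / 2.0

def gazes_at(focal_point, box_min, box_max, radius=0.1):
    # "Collision" test: is the focal point within `radius` of the device's
    # axis-aligned reference volume in the 3D map?
    nearest = np.clip(focal_point, box_min, box_max)
    return float(np.linalg.norm(focal_point - nearest)) <= radius

# Toy scene: eyes about 6 cm apart, a phone display roughly 1 m in front.
left_eye  = np.array([-0.03, 1.60, 0.0])
right_eye = np.array([ 0.03, 1.60, 0.0])
target    = np.array([ 0.00, 1.20, 1.0])        # point the user looks at
focal = gaze_focal_point(left_eye, target - left_eye,
                         right_eye, target - right_eye)
phone_min = np.array([-0.08, 1.10, 0.98])       # hypothetical phone volume
phone_max = np.array([ 0.08, 1.30, 1.02])
print(gazes_at(focal, phone_min, phone_max))    # True: gaze hits the phone

In a full system along the lines the rejection describes, the reference volumes would come from the maintained 3D map and the gaze rays from multi-camera eye and head tracking data, but the convergence-then-collision structure would be the same.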

Prosecution Timeline

Feb 29, 2024: Application Filed
Sep 24, 2025: Non-Final Rejection (§103)
Mar 26, 2026: Response Filed

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602816: SIMULATED CONFIGURATION EVALUATION APPARATUS AND METHOD. Granted Apr 14, 2026 (2y 5m to grant).
Patent 12603024: DISPLAY CONTROL DEVICE. Granted Apr 14, 2026 (2y 5m to grant).
Patent 12586310: APPARATUS AND METHOD WITH IMAGE PROCESSING. Granted Mar 24, 2026 (2y 5m to grant).
Patent 12578846: GENERATING MASKED REGIONS OF AN IMAGE USING A PREDICTED USER INTENT. Granted Mar 17, 2026 (2y 5m to grant).
Patent 12579727: APPARATUS AND METHOD FOR ASYNCHRONOUS RAY TRACING. Granted Mar 17, 2026 (2y 5m to grant).
Study what changed to get these applications past this examiner. Based on the 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 73%
Grant Probability with Interview: 99% (+26.6%)
Median Time to Grant: 2y 7m
PTA Risk: Low
Based on 476 resolved cases by this examiner. Grant probability derived from career allow rate.
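The projection figures appear to follow directly from the examiner metrics above. A minimal sketch of that arithmetic, assuming the "with interview" figure simply adds the interview lift to the career allow rate and caps at 100% (an assumption about the tool, not a documented formula):

granted, resolved = 347, 476
career_allow_rate = granted / resolved                         # 0.729 -> shown as 73%
interview_lift = 0.266                                         # +26.6% with an interview
with_interview = min(career_allow_rate + interview_lift, 1.0)  # 0.995 -> ~99%
print(f"Grant probability: {career_allow_rate:.0%}")
print(f"Grant probability with interview: {with_interview:.0%}")

If that assumption holds, the 73% and 99% headline figures are restatements of the career allow rate and the interview lift rather than application-specific predictions.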

Free tier: 3 strategy analyses per month