Prosecution Insights
Last updated: April 19, 2026
Application No. 18/722,461

AN EYE TRACKING VIRTUAL REALITY DEVICE FOR VOICE TAGGING IN VIRTUAL ENVIRONMENT AND OPERATING METHOD THEREOF

Final Rejection §103
Filed: Jun 20, 2024
Examiner: BOYD, JONATHAN A
Art Unit: 2627
Tech Center: 2600 — Communications
Assignee: Ismail Koçak
OA Round: 2 (Final)
Grant Probability: 69% (Favorable)
OA Rounds: 3-4
To Grant: 2y 7m
With Interview: 76%

Examiner Intelligence

Career Allow Rate: 69% (481 granted / 698 resolved; +6.9% vs TC avg; above average)
Interview Lift: +7.0% (moderate lift; resolved cases with interview)
Avg Prosecution: 2y 7m (typical timeline); 24 currently pending
Total Applications: 722 across all art units (career history)
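
As a quick sanity check, the headline figures above follow from simple arithmetic on the examiner's career data. The sketch below assumes (an assumption about the dashboard, not something it states) that the "With Interview" figure is the career allow rate plus the interview lift in percentage points.

```python
# Back-of-envelope check of the dashboard figures above.
# Assumption (not stated by the dashboard): "With Interview" = career
# allow rate + interview lift, in percentage points.

granted, resolved = 481, 698        # examiner's career data shown above
interview_lift_pp = 7.0             # reported interview lift

allow_rate = 100 * granted / resolved
with_interview = allow_rate + interview_lift_pp

print(f"Career allow rate: {allow_rate:.1f}%")      # 68.9% -> displayed as 69%
print(f"With interview:    {with_interview:.1f}%")  # 75.9% -> displayed as 76%
```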

Statute-Specific Performance

§101: 2.6% (-37.4% vs TC avg)
§103: 53.7% (+13.7% vs TC avg)
§102: 27.8% (-12.2% vs TC avg)
§112: 9.9% (-30.1% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 698 resolved cases
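
One plausible reading of these figures (an inference, not something the chart states) is that each "vs TC avg" delta is the examiner's statute-specific rate minus the Tech Center average marked by the black line. The short sketch below recovers that implied average from the displayed numbers; every row yields roughly 40%, consistent with a single TC-average estimate.

```python
# Recover the implied Tech Center average (the black line) from each row,
# assuming delta = examiner rate - TC average. This reading is inferred
# from the displayed numbers, not stated by the chart.

stats = {            # statute: (examiner rate %, delta vs TC avg in points)
    "§101": (2.6, -37.4),
    "§103": (53.7, 13.7),
    "§102": (27.8, -12.2),
    "§112": (9.9, -30.1),
}

for statute, (rate, delta) in stats.items():
    implied_tc_avg = rate - delta
    print(f"{statute}: examiner {rate:.1f}%, implied TC avg {implied_tc_avg:.1f}%")
# Each statute recovers ~40.0%, i.e. a single Tech Center average estimate.
```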

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed 10 June 2025 have been fully considered but they are not persuasive.

The Examiner respectfully disagrees with Applicant's assertion that Lee does not teach a "dual camera system." P[0029] details how cameras are used on the AR headset to track the physical world (outer image acquisition unit), and p[0059] details how cameras are oriented inward to image the user's eyes.

The Examiner respectfully disagrees with Applicant's assertion that Lee does not teach "voice tagging linked to gaze confirmation." P[0026] teaches that audio recordings (audio tags) are associated with a particular word or sentence at which the person is identified as looking when speaking the phrase "take a note." Thus the user's gaze has already been identified and confirmed to be on a particular word, to which the audio tag is then linked.

The Examiner respectfully disagrees with Applicant's assertion that Lee does not teach an "integrated VR headset." P[0060] states that the headset may be a virtual reality (VR) headset. However, it does not appear that a VR headset has been claimed, because the outer acquisition unit appears to be used for an AR aspect. Further, Lee does not appear to explicitly show spatially contextual audio tags, as the figures do not explicitly show how the tags are presented to the user. Thus a new reference, Vishwanathan, is introduced which explicitly shows spatially contextual audio tags.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claim(s) 1, 2 and 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lee (2021/0097284) in view of Vishwanathan et al. (2019/0200154) (herein "Vishwanathan").
In regards to claim 1, Lee teaches a virtual reality device (1) (See; p[0060] for headset 216, which may be AR or VR) for including voice as a tool to add value and an extra layer of information to the virtual environment (See; p[0026] for adding audio notes); comprising at least one outer image acquisition unit (1.1) configured to capture images of the external environment to enable augmentation within a virtual reality interface (See; p[0021]-p[0032] for an AR headset which mixes the real-world environment and virtual objects); at least one inner image acquisition unit (1.1) configured to track eye movement and determine convergence and pupil miosis to identify a visual focal point of the user (See; p[0059], p[0066]-p[0067], p[0077] for eye tracking using traditional algorithms, where it is well known in the area of eye tracking to track eye convergence and line of sight / focal points to determine what the user is gazing at); at least one inner voice acquisition unit (1.2) configured to record stereo audio from the user (See; p[0054] for audio receiver/microphone 195), wherein audio tagging is initiated only after confirmation of the user's focal point based on data from the inner image acquisition unit (See; p[0026], where the user may record audio notes in the AR platform and the audio recordings may be associated with particular words at which the person is identified as looking when speaking; thus, where it has been identified that the person is looking at a particular word (confirmation of focal point on the word) and they speak the phrase "take a note," audio tagging is initiated); at least one control/calculating unit (1.3) operatively connected to the image and voice acquisition units to determine user intent and generate virtual tags linked to the identified focal point (See; Figs. 5, 6 and p[0066]-p[0067], where virtual objects may pop up based on the eye tracking); at least one user interface (1.5) (See; Figs. 5, 6 and p[0070]-p[0071], where a GUI may be displayed for the user to navigate through); at least one peripheral display (1.4.1) (See; Abstract, Fig. 1, p[0037] and p[0050] for a display device 192); at least one speaker (1.6) (See; p[0050] for speakers 194); at least one strap (1.8) (See; Fig. 3 for a housing 300 including a strap to hold it to the user's head); and at least one sensor (1.9) configured to determine the distance from the device to an object in the environment (See; p[0059] for cameras 310 oriented outward in the direction the user's head would be oriented, or inward to image the user's eyes; it is well known for cameras to use algorithms to determine the distance an object is from the user for proper calibration).

Lee fails to explicitly teach at least one peripheral display (1.4.1) configured to present a virtual environment that included spatially contextual audio tags. However, Vishwanathan teaches at least one peripheral display (1.4.1) configured to present a virtual environment that included spatially contextual audio tags (See; Fig. 4A and p[0065], where a virtual environment can have a plurality of spatially contextual audio tags presented to the user). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Lee to show spatially contextual audio tags so as to allow the user to easily access audio tags when needed.
In regards to claim 2, Lee teaches wherein the control/calculating unit (1.3) comprises at least one storage unit (1.3.1) configured to store recorded audio tags, positional data associated with identified focal points, and related metadata (See; Fig. 3 for storage 308 or Fig. 1, memory 140); at least one input/output unit (1.3.2) comprising one or more device interfaces, USB ports, external controllers, and optional external storage interfaces; at least one central processing unit (1.3.3) configured to execute control logic and manage device functions (See; Fig. 1 for I/O Controller Hub 150, storage 180 or memory 140, and processors 122); at least one communication adapter (1.3.4) operatively connected to a network adapter (1.3.5), the network adapter (1.3.5) enabling wireless communication via Wi-Fi and/or Bluetooth (See; Fig. 1 for WIFI 182 or p[0050] for LAN interface / network interface); and at least one sensing unit (1.3.6) including one or more sensors configured to detect device position, orientation, or user interactions (See; Fig. 1 for cameras 193 and p[0055] for a gyroscope which senses the orientation of the system).

In regards to claim 5, Lee teaches a method (100) for operating the virtual device (1) of claim 1, comprising: (a) receiving input from at least one front-facing image acquisition unit (1.1) (See; p[0059] for cameras 310 oriented outward in the direction the user's head would be oriented); (b) calibrating the position of the user within the virtual environment based on the received input (See; p[0055]); (c) simultaneously recording eye movement using at least one inward-facing image acquisition unit (1.1) positioned inside the device (See; p[0059], p[0066]-p[0067], p[0077] for eye tracking using traditional algorithms, where it is well known in the area of eye tracking to track eye convergence and line of sight to determine what the user is gazing at); (d) determining a convergence point of the user's pupils to identify a focal point in the virtual environment; (e) if convergence is not detected, detecting pupil miosis to estimate the user's focus (See; p[0059], p[0066]-p[0067], p[0077] for eye tracking using traditional algorithms, where it is well known in the area of eye tracking to track eye convergence and line of sight to determine what the user is gazing at); (f) upon successful determination of the user's focus in step (d) or (e), initiating voice input detection; (g) recording digital audio from the user using at least one voice acquisition unit (1.2); and (h) associating the recorded audio as a voice tag with the focal point determined in step (d) or (e) (See; p[0026], where the user may record audio notes in the AR platform and the audio recordings may be associated with particular words at which the person is identified as looking when speaking; thus, where it has been identified that the person is looking at a particular word (confirmation of focal point on the word) and they speak the phrase "take a note," audio tagging is initiated).

Claim(s) 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lee (2021/0097284) in view of Vishwanathan et al. (2019/0200154) (herein "Vishwanathan") and further in view of Hwang et al. (2020/0005539) (herein "Hwang").

In regards to claim 3, Lee fails to explicitly teach wherein the device further comprises at least one control element (1.7) configured to simulate hand movements within the virtual environment and to control the user interface (1.5).
However, Hwang teaches at least one control element (1.7) configured to simulate hand movements within the virtual environment and to control the user interface (1.5) (See; Figs. 3, 4 and abstract for a near-eye display able to detect one or more gestures (hand movements) of the user to control a user interface). Therefore, it would have been obvious to one of ordinary skill in the art at the time of filing to modify Lee to detect gestures such as in Hwang so as to add further user input control to the device, increasing user satisfaction in the device.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONATHAN A BOYD, whose telephone number is (571) 270-7503. The examiner can normally be reached Mon - Fri 8:00 - 5:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Ke Xiao, can be reached at (571) 272-7776. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JONATHAN A BOYD/
Primary Examiner, Art Unit 2627
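
For orientation only, the gaze-confirmed voice-tagging flow recited in claim 5, steps (a) through (h), can be sketched roughly as below. This is an illustrative outline under assumed interfaces; all class, function, and parameter names are hypothetical and are not code from the application or from the Lee, Vishwanathan, or Hwang references.

```python
# Illustrative sketch of the claim 5 flow, steps (a)-(h). All names here
# (VoiceTag, outer_camera, environment, etc.) are hypothetical interfaces
# assumed for illustration, not code from the application or cited art.

from dataclasses import dataclass

@dataclass
class VoiceTag:
    focal_point: tuple      # position of the confirmed focus in the environment
    audio: bytes            # recorded digital audio associated with that point

def gaze_confirmed_voice_tag(outer_camera, inner_camera, microphone, environment):
    # (a)-(b): receive front-facing input and calibrate the user's position
    frame = outer_camera.capture()
    environment.calibrate_user_position(frame)

    # (c)-(d): record eye movement and look for a pupil convergence point
    eyes = inner_camera.capture()
    focal_point = environment.convergence_point(eyes)

    # (e): if no convergence is detected, fall back to pupil miosis
    if focal_point is None:
        focal_point = environment.estimate_focus_from_miosis(eyes)
    if focal_point is None:
        return None         # no confirmed focus, so no tagging is initiated

    # (f)-(g): only after the focus is confirmed, start voice input detection
    audio = microphone.record_until_silence()

    # (h): associate the recording with the focal point as a voice tag
    tag = VoiceTag(focal_point=focal_point, audio=audio)
    environment.attach_tag(tag)  # e.g. shown as a spatially contextual audio tag
    return tag
```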

Prosecution Timeline

Jun 20, 2024: Application Filed
Feb 18, 2025: Non-Final Rejection — §103
Jun 10, 2025: Response Filed
Sep 19, 2025: Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12604616
DISPLAY DEVICE
2y 5m to grant; granted Apr 14, 2026
Patent 12591362
ADJACENT CAPACITIVE TOUCH SCREEN EVENT TRACKING
2y 5m to grant; granted Mar 31, 2026
Patent 12586534
DRIVING CIRCUIT UNIT, DISPLAY DEVICE INCLUDING THE SAME, AND METHOD OF DRIVING THE SAME
2y 5m to grant; granted Mar 24, 2026
Patent 12586516
DISPLAY DEVICE
2y 5m to grant; granted Mar 24, 2026
Patent 12585348
INPUT DEVICE, CONTROL METHOD, AND NON-TRANSITORY RECORDING MEDIUM
2y 5m to grant; granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 69%
With Interview: 76% (+7.0%)
Median Time to Grant: 2y 7m
PTA Risk: Moderate
Based on 698 resolved cases by this examiner. Grant probability derived from career allow rate.
