Prosecution Insights
Last updated: April 19, 2026
Application No. 18/379,582

VIRTUAL AUDIO AUGMENTATION USING COMPUTER VISION

Final Rejection — §102, §103
Filed
Oct 12, 2023
Examiner
AL AUBAIDI, RASHA S
Art Unit
2693
Tech Center
2600 — Communications
Assignee
Nvidia Corporation
OA Round
2 (Final)
78%
Grant Probability
Favorable
3-4
OA Rounds
3y 3m
To Grant
89%
With Interview

Examiner Intelligence

Grants 78% — above average
78%
Career Allow Rate
577 granted / 744 resolved
+15.6% vs TC avg
Moderate +11% lift
+11.1%
Interview Lift
resolved cases with interview
Typical timeline
3y 3m
Avg Prosecution
38 currently pending
Career history
782
Total Applications
across all art units
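
As a sanity check on the tiles above, the headline figures can be reproduced (approximately) from the raw counts shown on this page. The short Python sketch below is illustrative only: the 0.89 with-interview figure is taken from the projections further down, and the variable names are ours.

```python
# Illustrative only: recomputing the dashboard's headline examiner metrics
# from the raw counts shown on this page.

granted, resolved = 577, 744                     # career totals shown above
career_allow_rate = granted / resolved
print(f"Career allow rate: {career_allow_rate:.1%}")   # 77.6%, displayed as 78%

# The page reports an 89% grant probability when an interview is held.
with_interview = 0.89
lift_points = with_interview - career_allow_rate
# Prints +11.4% from these rounded inputs; the page shows +11.1%, presumably
# computed from the examiner's actual interview-case outcomes.
print(f"Interview lift: {lift_points:+.1%}")
```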

Statute-Specific Performance

§101: 10.2% (-29.8% vs TC avg)
§103: 55.9% (+15.9% vs TC avg)
§102: 16.1% (-23.9% vs TC avg)
§112: 8.4% (-31.6% vs TC avg)
Tech Center average is an estimate • Based on career data from 744 resolved cases
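
If the "vs TC avg" figures are read as percentage-point differences from the Tech Center average estimate, the implied TC baseline for each statute can be recovered directly. A minimal sketch under that assumption:

```python
# Illustrative only: assuming each "vs TC avg" figure is a percentage-point
# difference from the Tech Center average, the implied baseline is
# examiner_rate - delta. Values are the ones listed above.

stats = {            # statute: (examiner rate %, delta vs TC avg in points)
    "§101": (10.2, -29.8),
    "§103": (55.9, +15.9),
    "§102": (16.1, -23.9),
    "§112": (8.4, -31.6),
}

for statute, (rate, delta) in stats.items():
    tc_avg = rate - delta                      # implied Tech Center average
    print(f"{statute}: examiner {rate:.1f}% vs TC avg {tc_avg:.1f}% ({delta:+.1f} pts)")
```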

Office Action

§102 §103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

1. This is in response to the amendment filed 10/30/2025. No claims have been added. Claim 9 has been canceled. Claims 1, 10, 14, 15, 18 and 20 have been amended. Claims 1-8 and 10-20 are now pending in this application.

Claim Rejections - 35 USC § 102

2. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

The applied reference has a common inventor with the instant application. Based upon the earlier effectively filed date of the reference, it constitutes prior art under 35 U.S.C. 102(a)(2). This rejection under 35 U.S.C. 102(a)(2) might be overcome by: (1) a showing under 37 CFR 1.130(a) that the subject matter disclosed in the reference was obtained directly or indirectly from the inventor or a joint inventor of this application and is thus not prior art in accordance with 35 U.S.C. 102(b)(2)(A); (2) a showing under 37 CFR 1.130(b) of a prior public disclosure under 35 U.S.C. 102(b)(2)(B) if the same invention is not being claimed; or (3) a statement pursuant to 35 U.S.C. 102(b)(2)(C) establishing that, not later than the effective filing date of the claimed invention, the subject matter disclosed in the reference and the claimed invention were either owned by the same person or subject to an obligation of assignment to the same person or subject to a joint research agreement.

Claims 1-3, 6-8, 10-15 and 17-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Satongar et al. (US PAT # 11,432,095 B1).

Regarding claims 1, 15 and 20, Satongar teaches a method, system and processor comprising: associating audio data of an audio stream transmitted over a plurality of input audio channels with a plurality of virtual speakers (reads on block 17 of mapping channels to virtual speaker positions at display edges, see col. 3, lines 14-20); for each of a plurality of times (reads on head position determined continually and repeatedly in real time, see col. 4, lines 33-36): identifying, using an optical sensor (reads on sensor 84, 89 or 93, see col. 2, line 61), a position of a user's head relative to the plurality of virtual speakers (reads on block 18 of camera/computer vision tracking of user head position/orientation, see col. 4, lines 20-25) at a respective time of the plurality of times (reads on determining and tracking the location and orientation of the user's head in real time during playback, see col. 4, lines 33-36 and col. 5, lines 1-65); estimating, for the respective time and based at least on the identified position of the user's head, distances from each of the plurality of virtual speakers to one or more reference locations associated with the user's head (reads on spatializing audio based on the position of the head and virtual speakers; the distance between the user and a virtual speaker is determined and adjusted, see col. 5, lines 1-12 and col. 6, lines 20-38); determining, for the respective time and based at least on the estimated distances (spatialization processing uses head position and speaker geometry to produce spatial audio levels, see col. 5, lines 1-12), a plurality of simulated sound intensities at the one or more reference locations associated with the position of the user's head, wherein the simulated sound intensities are associated with the plurality of virtual speakers (reads on block 19 of using HRIR/HRTF/BRIR filters based on head position and virtual speaker positions, see col. 4, line 66 through col. 5, line 28); and generating, for the respective time and based on the plurality of simulated sound intensities, a plurality of output audio signals configured for a plurality of physical speakers (reads on blocks 19-20, driving left/right headphone speakers so the user perceives sound at the virtual speaker positions, see col. 4, line 66 through col. 5, line 35).

Regarding claim 2, Satongar teaches wherein the plurality of input audio channels comprises a central input audio channel (reads on a virtual speaker positioned at the center of the display, see Fig. 3 and col. 3, lines 45-54), one or more side input audio channels (reads on left L and right R channels alongside center and surround signals, see Fig. 3A and corresponding text), and one or more surround input audio channels (reads on Left surround and Right surround as discussed in Fig. 4 and the corresponding text).

Regarding claim 3, Satongar teaches wherein at least one audio channel is associated with two or more speakers (reads on block 17 of mapping channels to virtual speaker positions at display edges, see col. 3, lines 14-20).

Regarding claim 10, Satongar teaches wherein determining the plurality of simulated sound intensities is further based on estimated directions from each of the plurality of virtual speakers to the one or more reference locations associated with the user's head (reads on the map that is used as a reference to track the position of the user's head, see col. 3, line 30).

Regarding claim 11, Satongar teaches wherein individual simulated sound intensities of the plurality of simulated sound intensities are determined for multiple acoustic frequencies (see block 19 of using HRIR/HRTF/BRIR filters based on head position and virtual speaker positions, see col. 4, line 66 through col. 5, line 28).

Regarding claim 12, Satongar teaches wherein the physical speakers comprise the user's headphones (see col. 4, lines 24-26).

Regarding claim 13, Satongar teaches wherein locations of the plurality of virtual speakers are user-adjustable (see col. 3, lines 14-44 and col. 6, lines 5-19).

Regarding claim 18, Satongar teaches wherein determining the plurality of simulated sound intensities comprises computing distances from the plurality of virtual speakers to the one or more reference locations associated with the user's head (see Fig. 2 and col. 4, lines 8-19).
Regarding claim 19, Satongar teaches wherein the system comprises at least one of: a system for performing simulation operations; a system for performing digital twin operations; a system for performing light transport simulation; a system for performing collaborative content creation for 3D assets; a system for performing deep learning operations; a system for performing real-time streaming (block 18 of Fig. 1, see col. 4, lines 20-25); a system for generating at least one of virtual reality (VR) content, augmented reality (AR) content, or mixed reality (MR) content; a system for presenting at least one of VR content, AR content, or MR content; a system implemented using an edge device; a system implemented using a robot; a system for performing conversational AI operations; a system for generating synthetic data; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

Claim Rejections - 35 USC § 103

3. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 4-5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Satongar et al. (US PAT # 11,432,095 B1).

Regarding claim 4, the Satongar features are already addressed in the above rejection. Satongar does not specifically teach "wherein the audio stream is associated with at least one of a music streaming application, a video streaming application, a gaming application, a virtual reality application, or an augmented reality application" as recited in claim 4. However, Satongar teaches that a headphone set worn by the user can generate one or more images (e.g., a video stream) with one or more cameras (see col. 4, lines 20-26). Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the applicant's claimed invention to modify the teaching of Satongar by having the video streaming done by a video streaming application in order to provide a cost-effective service with more flexibility and mobility to users.
Regarding claims 5 and 16, the Satongar features are already addressed in the above rejection. Satongar does not specifically teach "wherein the optical sensor comprises at least one of a visible range camera or an infrared camera" as recited in claims 5 and 16. However, Satongar teaches that a camera can be integrated with the headphone set worn by the user, or can be a separate camera (see col. 2, lines 44-47). Satongar adds that the cameras of the headphone set can include a stereo camera (see col. 4, lines 28-33). Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the applicant's claimed invention to modify the teaching of Satongar by choosing different types of cameras, such as a visible range camera or an infrared camera, as the need or desire arises.

Claims 6-8, 14 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Satongar et al. (US PAT # 11,432,095 B1) in view of NINAN et al. (Pub. No.: 2024/0098446 A1).

Regarding claim 6, the Satongar features are already addressed in the above rejection. Satongar does not specifically teach "wherein identifying a position of the user's head comprises identifying a bounding box for the user's head". However, NINAN teaches that, based on the detected/identified face, image features and/or devices/accessories, the tracking data analyzer (660) can overlay a face mesh onto or with one or more image regions of the tracking image depicting the detected face, image features and/or devices/accessories. The overlaid face mesh may be enclosed within boundaries (e.g., contours, bounding boxes, etc.) of the detected face, image features and/or devices/accessories (see [0135]). Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the applicant's claimed invention to incorporate the feature of detecting a face within a bounding box, as taught by NINAN, into the teaching of Satongar, in order to provide easier and faster annotation, among other benefits that are well known in the art.

Regarding claim 7, the combination of Satongar in view of NINAN teaches wherein identifying the bounding box comprises identifying one or more translational coordinates of the bounding box and one or more angles of rotation of the bounding box (see [0089] of NINAN).

Regarding claims 8 and 17, the Satongar features are already addressed in the above rejection. Satongar does not specifically teach "wherein identifying a position of the user's head comprises identifying locations of one or more facial features". However, NINAN teaches that the media consumption system, or an audio rendering control system therein, can apply image processing filters or object segmentation/detection operations/algorithms/methods (including but not limited to those based on AI, ML, artificial neural networks, etc.) over image data and detect or recognize image features representing ears, eyes (e.g., 694-1 and 694-2 of FIG. 6D with an interpupil distance 696, etc.), headphones, etc. The media consumption system, or an audio rendering control system therein, can detect and track coordinates of facial parameters (e.g., individual coordinates of mesh vertices/points, individual coordinates of image features representing eyes or ears, individual coordinates of image features representing headphones, etc.) with the face mesh in real time to determine or identify where the user's eyes are, where the user's ears are, and where the user's head is in spatial (e.g., positional, orientational, a combination of positional and orientational, etc.) relationship to the camera sensor. The spatial relationship between the user and the camera sensor can be used to adjust or change the rendering of spatial audio (e.g., sound fields, sound images, audio sources depicted in the sound fields/images, etc.) in accordance with the spatial location and/or orientation of the user's head in the 3D physical space (or rendering environment) in which the user is contemporaneously located (see [0089]). Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the applicant's claimed invention to incorporate the feature of detecting where the user's head is in spatial (e.g., positional, orientational, a combination of positional and orientational, etc.) relationship to the camera sensor, as taught by NINAN, into the teaching of Satongar, in order to provide the user with a more realistic and immersive experience.

Regarding claim 14, the Satongar features are already addressed in the above rejection. Satongar does not specifically teach "wherein the one or more reference locations of the user's head comprise one or more locations of one or more of the user's ears estimated using the identified position of the user's head". However, NINAN teaches, as quoted above for claims 8 and 17, that the media consumption system can detect and track coordinates of facial features such as the user's eyes and ears with a face mesh in real time to determine where the user's ears are and where the user's head is in spatial relationship to the camera sensor, and can use that spatial relationship to adjust the rendering of spatial audio (see [0089]). Thus, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the applicant's claimed invention to incorporate the feature of detecting where the user's head is in spatial relationship to the camera sensor, as taught by NINAN, into the teaching of Satongar, in order to provide the user with a more realistic and immersive experience.

Response to Arguments

4. Applicant's arguments filed 10/30/2025 have been fully considered, but they are not persuasive. Regarding Applicant's arguments (pages 6-8 of the Remarks), the Examiner respectfully disagrees.
Applicant argues that Satongar is directed to static placement of virtual speakers and does not teach identifying head position "for each of a plurality of times" and estimating distances associated with the user's head "for the respective time". However, Satongar expressly discloses determining and continuously tracking a position of the user's head in real time and performing head tracking repeatedly during playback (see col. 4, line 1 through col. 5, line 65), which corresponds to identifying head position for each of a plurality of times and at a respective time. Satongar further discloses spatializing audio based on the position of the head and the virtual speakers, and maintaining or adjusting the distance between the user and a virtual speaker, which corresponds to estimating distances from the virtual speakers to locations associated with the user's head and generating corresponding audio signals for the respective time.

Note that although the interview summary from the most recent interview indicated that the proposed amendments appeared to overcome the applied prior art, that assessment was based on the limited time available during the interview to review the art. Further review of the amended claim language and the applied references shows that the amended limitations are taught by Satongar, and the rejection under 35 U.S.C. 102 is therefore maintained.

Conclusion

6. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Rasha S. Al-Aubaidi, whose telephone number is (571) 272-7481. The examiner can normally be reached Monday through Friday from 8:30 am to 5:30 pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Ahmad Matar, can be reached at (571) 272-7488. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/RASHA S AL AUBAIDI/
Primary Examiner, Art Unit 2693
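
For orientation, the per-time pipeline recited in claims 1, 15 and 20, as mapped in the rejection above, can be summarized in code. The sketch below is purely illustrative: the inverse-square attenuation, fixed ear offsets, speaker coordinates, and function names are our assumptions, not the application's claims or Satongar's HRTF/HRIR-based approach.

```python
import numpy as np

# Minimal sketch of the claimed per-time pipeline: track head pose, estimate
# distances from virtual speakers to reference locations (here, the ears),
# compute simulated intensities, and mix output signals. All specifics
# (1/r^2 falloff, ~9 cm ear offsets, speaker positions) are illustrative only.

VIRTUAL_SPEAKERS = {                 # channel -> assumed position in meters (x, y, z)
    "center": np.array([0.0, 0.0, 2.0]),
    "left":   np.array([-1.5, 0.0, 2.0]),
    "right":  np.array([1.5, 0.0, 2.0]),
}
EAR_OFFSETS = {"left": np.array([-0.09, 0.0, 0.0]),
               "right": np.array([0.09, 0.0, 0.0])}

def estimate_ear_positions(head_position, head_rotation):
    """Reference locations (cf. claim 14): ears estimated from the tracked head pose."""
    return {ear: head_position + head_rotation @ off for ear, off in EAR_OFFSETS.items()}

def simulated_intensities(ear_positions):
    """Per-speaker simulated sound intensity at each reference location (1/r^2 assumed)."""
    return {
        ear: {ch: 1.0 / max(np.linalg.norm(pos - ear_pos), 1e-3) ** 2
              for ch, pos in VIRTUAL_SPEAKERS.items()}
        for ear, ear_pos in ear_positions.items()
    }

def render_frame(channel_samples, head_position, head_rotation):
    """One time step: head pose -> distances -> intensities -> two output signals."""
    ears = estimate_ear_positions(head_position, head_rotation)
    intensities = simulated_intensities(ears)
    return {ear: sum(gains[ch] * channel_samples[ch] for ch in channel_samples)
            for ear, gains in intensities.items()}

# Example: head 0.5 m to the left of center, facing forward (identity rotation).
out = render_frame({"center": 0.2, "left": 0.5, "right": 0.5},
                   head_position=np.array([-0.5, 0.0, 0.0]),
                   head_rotation=np.eye(3))
print(out)   # left/right output samples, with nearer virtual speakers weighted higher
```

The point of the sketch is only the claimed sequence (track head position per time, estimate distances to reference locations, compute per-speaker intensities, generate output signals), which is where the §102 dispute summarized above centers.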

Prosecution Timeline

Oct 12, 2023
Application Filed
Jul 26, 2025
Non-Final Rejection — §102, §103
Oct 28, 2025
Applicant Interview (Telephonic)
Oct 30, 2025
Response Filed
Nov 04, 2025
Examiner Interview Summary
Feb 16, 2026
Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12593179
System and Method for Efficiency Among Devices
2y 5m to grant • Granted Mar 31, 2026
Patent 12581225
CHARGING BOX FOR EARPHONES
2y 5m to grant • Granted Mar 17, 2026
Patent 12576367
POLYETHYLENE MEMBRANE ACOUSTIC ASSEMBLY
2y 5m to grant • Granted Mar 17, 2026
Patent 12563147
Shared Speakerphone System for Multiple Devices in a Conference Room
2y 5m to grant • Granted Feb 24, 2026
Patent 12563330
ELECTRONIC DEVICE
2y 5m to grant • Granted Feb 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

3-4
Expected OA Rounds
78%
Grant Probability
89%
With Interview (+11.1%)
3y 3m
Median Time to Grant
Moderate
PTA Risk
Based on 744 resolved cases by this examiner. Grant probability derived from career allow rate.
