DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-7 are rejected under 35 U.S.C. 101 because the claims are drawn to functional descriptive material not claimed as residing on a computer-readable medium. Claims 1-7, while reciting a system, do not include any structural limitations: the recited system comprises components that can be embodied entirely in software, with no recitation of any hardware component (see Applicant's specification at paragraph 50 and claim 15). Therefore, claims 1-7 are non-statutory. See MPEP 2106.03(I) ("Products that do not have a physical or tangible form, such as information (often referred to as 'data per se') or a computer program per se (often referred to as 'software per se') when claimed as a product without any structural recitations, are not directed to any of the statutory categories.").
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Garg et al., "Geometry-aware multi-task learning for binaural audio generation from video," arXiv:2111.10882v1 (hereafter "Garg").
Referring to claims 1, 8 and 15, Garg discloses a localization system, comprising:
an image input that receives images from a video source (page 4, The network takes the visual frames and monaural audio as input);
an audio input that receives, from the video source, audio synchronized with the images (page 4, The network takes the visual frames and monaural audio as input); and
an audio feature disentanglement network that correlates distinct audio elements from the audio input with corresponding visual features from the image input (page 5, In this way, the visual features are forced to reason about the relative positions of the sound sources and learn to find the cues in the visual frames which dictate the direction of sound heard).
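As an aid to understanding the cited mapping, a minimal sketch of this kind of audio-visual correlation follows, written in PyTorch-style Python. The function name, tensor shapes, and choice of cosine similarity are illustrative assumptions by the examiner; they are not drawn from Garg's code or from the claims.

import torch
import torch.nn.functional as F

def correlate_audio_visual(audio_feat, visual_feat):
    # audio_feat: (B, C) embedding of the input audio clip (hypothetical shape)
    # visual_feat: (B, C, H, W) spatial feature map from a visual backbone
    a = F.normalize(audio_feat, dim=1)
    v = F.normalize(visual_feat, dim=1)
    # Cosine similarity at every spatial location; the response is high where
    # the visual content agrees with the sound, which is one way to correlate
    # distinct audio elements with corresponding visual features.
    corr = torch.einsum('bc,bchw->bhw', a, v)
    return corr  # (B, H, W) audio-visual correlation map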
Referring to claims 2, 9 and 16, Garg discloses wherein the images received from the video source comprise first-person videos (pages 1-2, Videos or other media with binaural audio imitate that rich audio experience for a user, making the media feel more real and immersive. This immersion is important for virtual reality and augmented reality applications, where the user should feel transported to another place and perceive it as such).
Referring to claims 3, 10 and 17, Garg discloses a geometry-based feature aggregation module that estimates a geometric transformation between two or more images from the video source and aggregates visual features based on that geometric transformation (page 6, Since the videos are continuous samples over time rather than individual frames, our fourth and final loss regularizes the visual features by requiring them to have spatio-temporal geometric consistency).
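For illustration only, one way to aggregate visual features under an estimated geometric transformation is sketched below. The use of an affine transform is an assumption standing in for whatever transformation the reference estimates; the function and variable names are hypothetical.

import torch
import torch.nn.functional as F

def aggregate_features(feat_t, feat_t1, theta):
    # feat_t, feat_t1: (B, C, H, W) visual features from two nearby frames
    # theta: (B, 2, 3) estimated affine transform mapping frame t+1 into
    #        frame t's coordinates (e.g., from a pose or homography estimator)
    grid = F.affine_grid(theta, list(feat_t1.shape), align_corners=False)
    warped_t1 = F.grid_sample(feat_t1, grid, align_corners=False)
    # Spatio-temporal geometric consistency: features for the same scene
    # point should agree, so aggregate by averaging the aligned maps.
    return 0.5 * (feat_t + warped_t1)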
Referring to claims 4, 11 and 18, Garg discloses a sounding object estimation engine that correlates the distinct audio elements with object locations of the visual features from the image input (page 10, Figure 6 shows the qualitative visualization of the activation maps for the visual network that provides the object/region producing the sound and its location).
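The activation-map visualization cited above is consistent with a class-activation-map (CAM) style readout. A hypothetical sketch, building on the correlation map from the earlier sketch, follows; it is not asserted to be Garg's actual visualization code.

import torch
import torch.nn.functional as F

def sounding_object_map(corr, image_hw):
    # corr: (B, H, W) audio-visual correlation map (see earlier sketch)
    # image_hw: (height, width) of the original video frame
    cam = F.relu(corr).unsqueeze(1)  # (B, 1, H, W)
    cam = F.interpolate(cam, size=image_hw, mode='bilinear',
                        align_corners=False)
    cam = cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)  # scale to [0, 1]
    return cam.squeeze(1)  # per-pixel map of where the sounding object appears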
Referring to claims 5, 12 and 19, Garg discloses wherein the visual features are determined based on the geometric transformation (page 5, In particular, we incorporate a classifier to identify whether the visual input is aligned with the audio. The classifier G combines the binaural audio A_LR = [A_L^t, A_R^t] and the visual features v_f^t to classify if the audio and visuals agree. In this way, the visual features are forced to reason about the relative positions of the sound sources and learn to find the cues in the visual frames which dictate the direction of sound heard).
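The cited passage describes a classifier G that fuses binaural-audio and visual features. A minimal sketch of such an alignment classifier is given below; the embedding dimensions and layer sizes are the examiner's illustrative assumptions, not the reference's architecture.

import torch
import torch.nn as nn

class AlignmentClassifier(nn.Module):
    # Fuses a binaural-audio embedding with a visual embedding and predicts
    # whether the two streams are spatially aligned (e.g., 1 = aligned,
    # 0 = left/right channels swapped).
    def __init__(self, audio_dim=512, visual_dim=512):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),  # single logit: "audio matches visuals"
        )

    def forward(self, audio_emb, visual_emb):
        return self.fc(torch.cat([audio_emb, visual_emb], dim=1))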
Referring to claims 6, 13 and 20, Garg discloses wherein the audio feature disentanglement network comprises at least one convolution layer (page 17, The classifier combines the audio and visual features and uses a fully connected layer for prediction).
Referring to claims 7 and 14, Garg discloses an augmented reality module that plays the distinct audio elements from the audio input in conjunction with displaying the corresponding visual features in an augmented reality environment (page 5, Using the publicly available SoundSpaces2 audio simulations together with the Habitat simulator, we create realistic videos with binaural sounds for publicly available 3D environments in Matterport3D. To construct the dataset, we insert diverse 3D models from poly.google.com of various instruments like guitar, violin, flute etc. and other sound-making objects like phones and clocks into the scene. To generate realistic binaural sound in the environment as if it is coming from the source location and heard at the camera position, we convolve the appropriate SoundSpaces room impulse response with an anechoic audio waveform (e.g., a guitar playing for an inserted guitar 3D object)).
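The rendering step quoted above, convolving a room impulse response with an anechoic waveform, can be sketched as follows. This is a minimal illustration of impulse-response convolution in general; the function name and array layout are hypothetical and are not drawn from SoundSpaces or Garg.

import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, rir_left, rir_right):
    # mono: 1-D anechoic source waveform (e.g., a guitar recording)
    # rir_left, rir_right: per-ear room impulse responses for the given
    # source location and listener (camera) position
    left = fftconvolve(mono, rir_left)[:len(mono)]
    right = fftconvolve(mono, rir_right)[:len(mono)]
    return np.stack([left, right])  # (2, num_samples) binaural waveform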
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Lindahl, US Patent 11,736,862.
Senocak et al., "Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications," arXiv:1911.09649v1.
Owens et al., "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features," arXiv:1804.03641v2.
Hu et al., "Class-Aware Sounding Objects Localization via Audiovisual Correspondence," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 12, December 2022.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PETER K HUNTSINGER whose telephone number is (571)272-7435. The examiner can normally be reached Monday - Friday 8:30 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Benny Q Tieu, can be reached at 571-272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/PETER K HUNTSINGER/ Primary Examiner, Art Unit 2682