Prosecution Insights
Last updated: April 19, 2026
Application No. 18/599,398

FIRST-PERSON AUDIO-VISUAL OBJECT LOCALIZATION SYSTEMS AND METHODS

Non-Final OA — §101, §102
Filed: Mar 08, 2024
Examiner: HUNTSINGER, PETER K
Art Unit: 2682
Tech Center: 2600 — Communications
Assignee: UNIVERSITY OF ROCHESTER
OA Round: 1 (Non-Final)
Grant Probability: 28% (At Risk)
Expected OA Rounds: 1-2
Time to Grant: 4y 11m
With Interview: 45%

Examiner Intelligence

Career Allow Rate: 28% (90 granted / 322 resolved; -34.0% vs Tech Center average)
Interview Lift: +16.7% (allow rate in resolved cases with an interview vs. without)
Avg Prosecution: 4y 11m (typical timeline; 59 applications currently pending)
Total Applications: 381 (career history, across all art units)
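The headline figures above are simple ratios over the examiner's career counts. A minimal sketch of the arithmetic, using only the numbers shown in this panel; note that treating the 28% career rate as the no-interview baseline is an approximation on our part, since the dashboard reports the +16.7% lift without showing the underlying with/without-interview counts:

```python
# Career allow rate: granted cases as a share of all resolved cases.
granted, resolved = 90, 322
allow_rate = granted / resolved          # ≈ 0.2795, displayed as 28%
print(f"Career allow rate: {allow_rate:.1%}")

# Interview lift: allow rate with an interview minus allow rate without.
# The panel shows 45% with an interview; using the 28% career rate as
# the without-interview baseline is an assumption, so this reproduces
# the reported +16.7% only approximately.
with_interview = 0.45
lift = with_interview - allow_rate
print(f"Approximate interview lift: {lift:+.1%}")
```

The exact +16.7% figure presumably comes from unrounded with- and without-interview rates that the panel does not display.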

Statute-Specific Performance

§101: 9.3% (-30.7% vs TC avg)
§103: 50.3% (+10.3% vs TC avg)
§102: 19.4% (-20.6% vs TC avg)
§112: 19.0% (-21.0% vs TC avg)
Black line = Tech Center average estimate. Based on career data from 322 resolved cases.
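Because each delta is reported as the examiner's rate minus the Tech Center average, the implied TC baseline can be backed out from the figures above. A quick sketch of that arithmetic:

```python
# Per-statute allow rates for this examiner, paired with the displayed
# delta versus the Tech Center average (examiner_rate - tc_avg = delta).
stats = {
    "§101": (9.3, -30.7),
    "§103": (50.3, +10.3),
    "§102": (19.4, -20.6),
    "§112": (19.0, -21.0),
}
for statute, (rate, delta) in stats.items():
    tc_avg = rate - delta  # implied Tech Center average estimate
    print(f"{statute}: examiner {rate:.1f}%, implied TC avg {tc_avg:.1f}%")
```

All four deltas imply the same 40.0% baseline, which suggests the dashboard compares every statute against a single TC-wide average estimate rather than per-statute averages; that reading is our inference, not something the panel states.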

Office Action

§101, §102
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: "Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title."

Claims 1-7 are rejected under 35 U.S.C. 101 because the claims are drawn to functional descriptive material not claimed as residing on a computer readable medium. Claims 1-7, while reciting a system, do not include any structural limitations. The system includes components that can be encompassed entirely by software without a recitation of any hardware component (see Applicant's specification at paragraph 50 and claim 15). Therefore claims 1-7 are non-statutory. See MPEP 2106.03(I) ("Products that do not have a physical or tangible form, such as information (often referred to as 'data per se') or a computer program per se (often referred to as 'software per se') when claimed as a product without any structural recitations.").

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraph of 35 U.S.C. 102 that forms the basis for the rejections under this section made in this Office action: "A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention."

Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Garg et al., "Geometry-aware multi-task learning for binaural audio generation from video," arXiv:2111.10882v1 (hereafter "Garg").

Referring to claims 1, 8 and 15, Garg discloses a localization system, comprising: an image input that receives images from a video source (page 4, "The network takes the visual frames and monaural audio as input"); an audio input that receives, from the video source, audio synchronized with the images (page 4, "The network takes the visual frames and monaural audio as input"); and an audio feature disentanglement network that correlates distinct audio elements from the audio input with corresponding visual features from the image input (page 5, "In this way, the visual features are forced to reason about the relative positions of the sound sources and learn to find the cues in the visual frames which dictate the direction of sound heard").

Referring to claims 2, 9 and 16, Garg discloses wherein the images received from the video source comprise first-person videos (pages 1-2, "Videos or other media with binaural audio imitate that rich audio experience for a user, making the media feel more real and immersive. This immersion is important for virtual reality and augmented reality applications, where the user should feel transported to another place and perceive it as such").

Referring to claims 3, 10 and 17, Garg discloses a geometry-based feature aggregation module that estimates a geometric transformation between two or more images from the video source and aggregates visual features based on that geometric transformation (page 6, "Since the videos are continuous samples over time rather than individual frames, our fourth and final loss regularizes the visual features by requiring them to have spatio-temporal geometric consistency").

Referring to claims 4, 11 and 18, Garg discloses a sounding object estimation engine that correlates the distinct audio elements with object locations of the visual features from the image input (page 10, "Figure 6 shows the qualitative visualization of the activation maps for the visual network that provides the object/region producing the sound and its location").

Referring to claims 5, 12 and 19, Garg discloses wherein the visual features are determined based on the geometric transformation (page 5, "In particular, we incorporate a classifier to identify whether the visual input is aligned with the audio. The classifier G combines the binaural audio ALR = [AtL, AtR] and the visual features vtf to classify if the audio and visuals agree. In this way, the visual features are forced to reason about the relative positions of the sound sources and learn to find the cues in the visual frames which dictate the direction of sound heard").

Referring to claims 6, 13 and 20, Garg discloses wherein the audio feature disentanglement network comprises at least one convolution layer (page 17, "The classifier combines the audio and visual features and uses a fully connected layer for prediction").

Referring to claims 7 and 14, Garg discloses an augmented reality module that plays the distinct audio elements from the audio input in conjunction with displaying the corresponding visual features in an augmented reality environment (page 5, "Using the publicly available SoundSpaces audio simulations together with the Habitat simulator, we create realistic videos with binaural sounds for publicly available 3D environments in Matterport3D. To construct the dataset, we insert diverse 3D models from poly.google.com of various instruments like guitar, violin, flute etc. and other sound-making objects like phones and clocks into the scene. To generate realistic binaural sound in the environment as if it is coming from the source location and heard at the camera position, we convolve the appropriate SoundSpaces room impulse response with an anechoic audio waveform (e.g., a guitar playing for an inserted guitar 3D object)").

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
- Lindahl, US Patent 11,736,862.
- Senocak et al., "Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications," arXiv:1911.09649v1.
- Owens et al., "Audio-Visual Scene Analysis with Self-Supervised Multisensory Features," arXiv:1804.03641v2.
- Hu et al., "Class-Aware Sounding Objects Localization via Audiovisual Correspondence," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 12, December 2022.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PETER K HUNTSINGER, whose telephone number is (571) 272-7435. The examiner can normally be reached Monday - Friday, 8:30 - 5:00. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Benny Q Tieu, can be reached at 571-272-7490. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/PETER K HUNTSINGER/
Primary Examiner, Art Unit 2682

Prosecution Timeline

Mar 08, 2024
Application Filed
Jan 10, 2026
Non-Final Rejection — §101, §102 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12540884 — Determining Fracture Roughness from a Core (2y 5m to grant; granted Feb 03, 2026)
Patent 12412381 — METHODS AND SYSTEMS FOR CONTROLLING OPERATION OF WIRELINE CABLE SPOOLING EQUIPMENT (2y 5m to grant; granted Sep 09, 2025)
Patent 12387360 — APPARATUS AND METHOD FOR ESTIMATING UNCERTAINTY OF IMAGE COORDINATE (2y 5m to grant; granted Aug 12, 2025)
Patent 12388943 — PRINTING SYSTEM USING FLUORESENT AND NON-FLUORESENT INK, PRINTING APPARATUS, IMAGE PROCESSING APPARATUS, IMAGE PROCESSING METHOD, AND CONTROL METHOD THEREOF (2y 5m to grant; granted Aug 12, 2025)
Patent 12374081 — DIGITAL IMAGE PROCESSING TECHNIQUES USING BOUNDING BOX PRECISION MODELS (2y 5m to grant; granted Jul 29, 2025)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 28%
With Interview: 45% (+16.7%)
Median Time to Grant: 4y 11m
PTA Risk: Low
Based on 322 resolved cases by this examiner. Grant probability is derived from the career allow rate.
