DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1-2 and 4-19 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant's arguments filed 02/11/2026 have been fully considered but they are not persuasive. Regarding the arguments on page 9 of the Remarks, Examiner notes that the human mind performs tasks similar to those being claimed, including processing visual and audio data with context. For example, a human, while walking, processes visual, audio, and motion data, including balance. Therefore, the claims can still be construed as reciting a mental process performed by generic computing components.
Regarding the arguments on pages 9-10 of the Remarks, Examiner notes that a human could mentally determine an audio focus point, either as a result of controlling an interface or before controlling the interface. For example, a user could simply touch a location on the interface and then focus attention on the selected location. Therefore, the claims still amount to using generic computing components to perform the abstract idea.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-2 and 4-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Using the subject matter eligibility test from page 74621 of the Federal Register Notice titled “2014 Interim Guidance on Patent Subject Matter Eligibility,” a two-step process is performed. Under Step 1, the claims are analyzed to determine whether the claim is directed to a process, machine, article of manufacture, or composition of matter. In this case, claims 1-2, 4-14, and 16-18 are directed to a method, which is a process, while claims 15 and 19 are directed to a device, which is a machine or an article of manufacture. Step 2A (part 1 of the Mayo test), using the guidance from pages 50-57 of the Federal Register Vol. 84, No. 4 (Monday, January 7, 2019), requires applying a two-prong inquiry. Under Prong One, examiners evaluate whether the claim recites a judicial exception, determining whether the claim is directed to a law of nature, a natural phenomenon, or an abstract idea. In this case, claim 1 recites determining a context, which is a mental process, as well as enhancing audio content, which is a mathematical calculation. Under Prong Two, examiners evaluate whether the judicial exception is integrated into a practical application that imposes a meaningful limit on the judicial exception. In this case, the limitations of capturing and presenting data are mere extra-solution activity and do not integrate the abstract ideas into a practical application.
Step 2B (part 2 of the Mayo test) requires analyzing the claims to determine if they recite additional elements that amount to significantly more than the judicial exception. In this case, the claims do not include additional elements that are sufficient to amount to significantly more than the abstract idea itself.
Regarding claim 1, determining a context is a mental process, and enhancing audio content is a mathematical calculation, both of which are abstract ideas. For example, a user could view and listen to a scene and mentally determine the context, while enhancing audio content could involve mathematical calculations such as subtracting noise or multiplying the signal by a filter. The additional limitations of capturing, receiving, and presenting data are mere extra-solution activity and do not integrate the abstract ideas into a practical application or constitute significantly more.
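For illustration of this characterization only, consider the following minimal sketch (not taken from the application or from any cited reference; the sample values and gain factor are hypothetical), showing that subtracting a noise estimate and multiplying by a filter gain are elementary arithmetic operations on the audio samples:

    # Illustrative sketch only; hypothetical values, not from the application or the cited references.
    import numpy as np
    signal = np.array([0.2, 0.5, -0.3, 0.8])           # hypothetical captured audio samples
    noise_estimate = np.array([0.1, 0.1, -0.1, 0.1])   # hypothetical noise estimate
    filter_gain = 1.5                                   # hypothetical filter gain
    enhanced = (signal - noise_estimate) * filter_gain  # subtraction followed by multiplication
    print(enhanced)                                     # prints approximately [0.15 0.6 -0.3 1.05]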
Regarding claim 2, increasing or decreasing the magnitude of a sound is a mathematical calculation, which is an abstract idea without integration into a practical application and without significantly more.
Regarding claims 4-10, 14, and 16-19, the limitations are further clarifications of the above abstract ideas.
Regarding claim 11, determining an intent based on past behavior is a mental process, while the enhancing is a mathematical calculation, both of which are abstract ideas without integration into a practical application and without significantly more.
Regarding claims 12 and 15, determining an audio focus point is a mental process, and enhancing audio content is a mathematical calculation, both of which are abstract ideas. For example, a user could view and listen to a scene and determine where the audio is focused, while enhancing audio content could involve mathematical calculations such as subtracting noise or multiplying the signal by a filter. The additional limitations of capturing and presenting data are mere extra-solution activity, and the recited processor and storage medium are generic computing components; these limitations do not integrate the abstract ideas into a practical application or constitute significantly more.
Regarding claim 13, beamforming is a series of mathematical calculations, which is an abstract idea without integration into a practical application and without significantly more.
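To illustrate this characterization (a hypothetical delay-and-sum sketch only, not taken from the application or from any cited reference; the microphone signals and delays are invented for illustration), beamforming toward a focus point amounts to shifting each microphone signal by a per-channel delay and averaging the results:

    # Illustrative delay-and-sum beamforming sketch; hypothetical signals and delays.
    import numpy as np
    mic1 = np.array([0.0, 1.0, 0.5, 0.0])  # hypothetical signal at microphone 1
    mic2 = np.array([0.0, 0.0, 1.0, 0.5])  # same source arriving one sample later at microphone 2
    delays = [0, 1]                          # per-channel sample delays toward the assumed focus point
    aligned = [np.concatenate([sig[d:], np.zeros(d)]) for sig, d in zip([mic1, mic2], delays)]
    beamformed = np.mean(aligned, axis=0)    # averaging reinforces the source at the focus point
    print(beamformed)                        # prints [0. 1. 0.5 0.]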
The limitations of the claims, taken alone, do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. Applicable case law cited in the Federal Register includes, but is not limited to: Alice Corp., 134 S. Ct. at 2355-56; Digitech Image Tech., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344 (Fed. Cir. 2014); Benson, 409 U.S. at 63.
See "Preliminary Examination Instructions in view of the Supreme Court Decision in Alice Corporation Pty. Ltd. v. CLS Bank International, et al.," dated June 25, 2014, and the Federal Register notice titled "2014 Interim Guidance on Patent Subject Matter Eligibility" (79 FR 74618).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 4-6, and 8-10 are rejected under 35 U.S.C. 103 as being unpatentable over Grosvenor et al. (US 2005/0281410 A1), hereinafter referred to as Grosvenor, in view of Leppanen.
Regarding claim 1, Grosvenor teaches:
A method performed by an electronic device, the method comprising:
capturing, by the electronic device, a scene, the capturing of the scene including capturing image content with an image sensor and audio content with an audio sensor (para [0068], where received data comprises audio-visual data having still image or video content as well as time varying audio data, and para [0060], [0091], where camera and microphones are used);
determining, by the electronic device, a context associated with the capturing of the scene (para [0096], where the location or sounds of the couple in the scene is the context);
enhancing, by the electronic device, the audio content based at least in part on the determined context (Fig. 3a element 311, para [0096], where a virtual microphone is generated to follow the couple, producing modified audio data with other sound sources removed); and
presenting, by the electronic device, the image content and the enhanced audio content (para [0096], where the modified audio is played in conjunction with the video recording).
Grosvenor does not teach:
receiving, by the electronic device, contextual information from at least one context sensor other than the image sensor and the audio sensor;
determining, by the electronic device, a context associated with the capturing of the scene, wherein the context is determined by a context-analyzer module configured to combine inputs from an audio-analyzer module, an image-analyzer module, and the contextual information from the at least one context sensor;
Leppanen teaches:
receiving, by the electronic device, contextual information from at least one context sensor other than the image sensor and the audio sensor (para [0051], where a gyroscope or accelerometer is used to determine orientation);
determining, by the electronic device, a context associated with the capturing of the scene, wherein the context is determined by a context-analyzer module configured to combine inputs from an audio-analyzer module, an image-analyzer module, and the contextual information from the at least one context sensor (Fig. 10, para [0064-65], where the captured image data and audio data and orientation are used to generate modified spatial audio data which includes the context of the orientation and changes in orientation during the data capture);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Grosvenor by using the contextual information of Leppanen (Leppanen para [0051]) in determining the context of Grosvenor (Grosvenor para [0096]), in order to generate modified spatial audio data to compensate for changes in orientation during the spatial audio data capture (Leppanen para [0053]).
Regarding claim 2, Grosvenor in view of Leppanen teaches:
The method of claim 1, wherein enhancing the audio content includes increasing or decreasing a magnitude of at least one sound included in the audio content (Grosvenor para [0096], where other sound sources are removed or reduced in intensity).
Regarding claim 4, Grosvenor in view of Leppanen teaches:
The method of claim 1, wherein the contextual information detected by the one or more sensors of the electronic device includes:
information indicative of a location of the electronic device (Grosvenor para [0096], where the virtual microphone is generated based on the location and movement of the users); or
information indicative of a motion of the electronic device (Grosvenor para [0096], where the virtual microphone is generated based on the location and movement of the users).
Regarding claim 5, Grosvenor in view of Leppanen teaches:
The method of claim 1, wherein determining the context associated with the capturing of the scene includes determining the context based, at least in part, on an analysis of the image content by the electronic device (Grosvenor para [0068], [0071], where the image data is analyzed to identify particular characteristics of the audio data).
Regarding claim 6, Grosvenor in view of Leppanen teaches:
The method of claim 1, wherein determining the context associated with the capturing of the scene includes determining the context based, at least in part, on an analysis of the audio content by the electronic device (Grosvenor para [0068], [0070], where the audio data is analyzed to identify characteristic sounds associated with the sound sources).
Regarding claim 8, Grosvenor in view of Leppanen teaches:
The method of claim 1, wherein presenting the image content and the enhanced audio content includes presenting a recording of the image content and a post-processed, enhanced recording of the audio content (Grosvenor Fig. 2 elements 206-207, para [0073-74], [0096], where the modified audio is played in conjunction with the video recording, the modified audio including selection of the sound sources, considered enhancing and post-processing).
Regarding claim 9, Grosvenor in view of Leppanen teaches:
The method of claim 1, wherein the image content includes video content (Grosvenor para [0096], where the images are in a video recording).
Regarding claim 10, Grosvenor in view of Leppanen teaches:
The method of claim 1, wherein the image content includes still image content (Grosvenor para [0060], [0068], where still images of a scene are used).
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Grosvenor, in view of Leppanen, and further in view of Zheng et al. (US 2021/0217432 A1), hereinafter referred to as Zheng.
Regarding claim 7, Grosvenor in view of Leppanen teaches:
The method of claim 1
Grosvenor in view of Leppanen does not teach:
wherein presenting the image content and the enhanced audio content includes presenting the image content and the enhanced audio content in real-time.
Zheng teaches:
wherein presenting the image content and the enhanced audio content includes presenting the image content and the enhanced audio content in real-time (para [0024], where the audio signal is enhanced in real time while playing back the video and audio).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Grosvenor in view of Leppanen by using the real time adjustments of Zheng (Zheng para [0024]) in the scene of Grosvenor in view of Leppanen (Grosvenor para [0096]), in order to have enhanced audio when playing back captured video and audio signals (Zheng para [0024]).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Grosvenor, in view of Leppanen, and further in view of Avrahami et al. (US 2019/0213465 A1), hereinafter referred to as Avrahami.
Regarding claim 11, Grosvenor in view of Leppanen teaches:
The method of claim 1, further comprising:
Grosvenor in view of Leppanen does not teach:
determining an intent of a user directing the electronic device to capture the scene, the determining based, at least in part, on a machine-learned model referencing a past behavior of the user; and
wherein enhancing the audio content is further based on the determined intent.
Avrahami teaches:
determining an intent of a user directing the electronic device to capture the scene, the determining based, at least in part, on a machine-learned model referencing a past behavior of the user (para [0019], [0028], where a user's previous responses and behaviors are learned by the agent, the patterns interpreted as intent); and
wherein enhancing the audio content is further based on the determined intent (para [0028], where the audio is adjusted based on the learned patterns).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Grosvenor in view of Leppanen by using the environmental adjustments of Avrahami (Avrahami para [0028]) in the scene of Grosvenor in view of Leppanen (Grosvenor para [0096]), in order to reduce environmental constraints and improve the environmental settings (Avrahami para [0028]).
Claims 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Grosvenor, in view of Leppanen, and further in view of Virolainen et al. (US 2022/0060824 A1), hereinafter referred to as Virolainen.
Regarding claim 16, Grosvenor in view of Leppanen teaches:
The method of claim 1,
Grosvenor in view of Leppanen does not teach:
wherein the context is determined using at least one machine-learned model.
Virolainen teaches:
wherein the context is determined using at least one machine-learned model (para [0122], where a machine learning network is used to control audio focus).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Grosvenor in view of Leppanen by using the machine learning network of Virolainen (Virolainen para [0122]) to control the audio focus of Grosvenor in view of Leppanen (Grosvenor para [0049]), in order to allow the user to modify audio characteristics of the signal and to reduce wind noise (Virolainen para [0124]).
Regarding claim 17, Grosvenor in view of Leppanen teaches:
The method of claim 1,
Grosvenor in view of Leppanen does not teach:
wherein the context-analyzer module includes at least one machine-learned model.
Virolainen teaches:
wherein the context-analyzer module includes at least one machine-learned model (para [0122], where a machine learning network is used to control audio focus).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Grosvenor in view of Leppanen by using the machine learning network of Virolainen (Virolainen para [0122]) to control the audio focus of Grosvenor in view of Leppanen (Grosvenor para [0049]), in order to allow the user to modify audio characteristics of the signal and to reduce wind noise (Virolainen para [0124]).
Claims 12-15 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Grosvenor in view of Virolainen.
Regarding claim 12, Grosvenor teaches:
A method performed by an electronic device, the method comprising:
capturing, by the electronic device, a scene, the capture of the scene including capturing image content and audio content (para [0068], where received data comprises audio-visual data having still image or video content as well as time varying audio data);
determining, by the electronic device, an audio focus point within the scene (para [0049], where a virtual microphone is used to focus on a conversation);
enhancing, by the electronic device, the audio content based at least in part on the determined audio focus point (para [0049], [0096], where the virtual microphone is used to focus on one point and remove or reduce intensity of other sound sources); and
presenting, by the electronic device, the image content and the enhanced audio content (para [0096], where the modified audio is played in conjunction with the video recording).
Grosvenor does not teach:
determining, by the electronic device based on user interaction with an audio focus point control on a graphical user interface displaying the scene on the electronic device, an audio focus point within the scene;
Virolainen teaches:
determining, by the electronic device based on user interaction with an audio focus point control on a graphical user interface displaying the scene on the electronic device, an audio focus point within the scene (Fig. 6, para [0124], where a user interface enables user input for controlling application of audio focusing);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Grosvenor by using the interface of Virolainen (Virolainen Fig. 6) to allow control of the audio focus of Grosvenor (Grosvenor para [0049]), to allow the user to modify audio characteristics of the signal and to reduce wind noise (Virolainen para [0124]).
Regarding claim 13, Grosvenor in view of Virolainen teaches:
The method of claim 12 wherein enhancing the audio content includes using beamforming during the capturing of the audio content, the beamforming based at least in part on the determined audio focus point (Virolainen para [0100-102], where adaptive beamforming is used in the focus processing).
Regarding claim 14, Grosvenor in view of Virolainen teaches:
The method of claim 12, wherein the determined audio focus point is based, at least in part, on:
an input from a user of the electronic device (Grosvenor para [0094], where a user action changes the focus);
a context associated with the capturing of the scene (Grosvenor para [0049], [0059], where the focus point is determined based on the context); or
an analysis of the image content (Grosvenor para [0071], where the image data is analyzed to identify particular characteristics of the audio data).
Regarding claim 15, Grosvenor teaches:
An electronic device comprising:
a processor (Fig. 1 element 102, para [0068], where a processor is used); and
a computer-readable storage medium comprising instructions of a content-enhancement manager module that, when executed by the processor, directs the electronic device to (Fig. 1 element 103, para [0068-69], where memory stores the application program):
capture a scene, the capture of the scene including capturing image content and audio content (para [0068], where received data comprises audio-visual data having still image or video content as well as time varying audio data);
determine an audio focus point within the scene (para [0049], where a virtual microphone is used to focus on a conversation);
enhance the audio content based at least in part on the determined audio focus point (para [0049], [0096], where the virtual microphone is used to focus on one point and remove or reduce intensity of other sound sources); and
present the image content and the enhanced audio content (para [0096], where the modified audio is played in conjunction with the video recording).
Grosvenor does not teach:
determine an audio focus point within the scene based on user interaction with an audio focus point control on a graphical user interface displaying the scene;
Virolainen teaches:
determine an audio focus point within the scene based on user interaction with an audio focus point control on a graphical user interface displaying the scene (Fig. 6, para [0124], where a user interface enables user input for controlling application of audio focusing);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Grosvenor by using the interface of Virolainen (Virolainen Fig. 6) to allow control of the audio focus of Grosvenor (Grosvenor para [0049]), to allow the user to modify audio characteristics of the signal and to reduce wind noise (Virolainen para [0124]).
Regarding claim 18, Grosvenor in view of Virolainen teaches:
The method of claim 12, wherein determining the audio focus point is further based on at least one machine-learned model (Virolainen para [0122], where a machine learning network is used to control audio focus).
Regarding claim 19, Grosvenor in view of Virolainen teaches:
The electronic device of claim 15, wherein the content-enhancement manager module includes at least one machine-learned model (Virolainen para [0122], where a machine learning network is used to control audio focus).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2022/0260664 A1, para [0057], teaches a user selecting an audio focus direction of interest via a user interface; US 2022/0248158 A1, para [0029], teaches an interface providing a control element to control an audio focus parameter; US 2022/0328056 A1, para [0110], teaches allowing user control of focus direction and amount in a sound field.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached between 8:00 am and 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRYAN S BLANKENAGEL/Primary Examiner, Art Unit 2658