Last updated: May 29, 2026
Application No. 18/763,719
AUDIO ENHANCEMENT AND OPTIMIZATION OF AN IMMERSIVE AUDIO EXPERIENCE

Non-Final OA §101 §103
Filed
Jul 03, 2024
Priority
Jul 07, 2023 — provisional 63/512,512
Examiner
COLUCCI, MICHAEL C
Art Unit
2655
Tech Center
2600 — Communications
Assignee
Shure Acquisition Holdings Inc.
OA Round
1 (Non-Final)
Interview Optional

Examiner has a relatively high allowance rate (76%) and a +15.2% interview lift. A written response may suffice.
Based on 999 resolved cases, 2023–2026
Examiner Intelligence

COLUCCI, MICHAEL C
Career Allowance Rate: 76% (above average), 758 granted / 999 resolved, +13.9% vs TC avg
Interview Lift: +15.2% (strong), resolved cases with interview vs. without
Typical timeline: 3y 1m average prosecution, 32 currently pending
Career history: 1033 total applications across all art units
Statute-Specific Performance

§101: 3.6% (-36.4% vs TC avg)
§103: 86.9% (+46.9% vs TC avg)
§102: 2.9% (-37.1% vs TC avg)
§112: 1.1% (-38.9% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 999 resolved cases
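
As a quick consistency check on the figures above, the Tech Center average implied by each statute row can be recomputed. This is only a sketch: it assumes each "vs TC avg" value is a percentage-point difference from the examiner's rate, which the page does not state, and the names `rows` and `implied_tc_avg` are illustrative.

```python
# (statute, examiner rate in %, stated difference vs TC avg)
# Assumption: the difference is in percentage points, not a relative change.
rows = [
    ("§101", 3.6, -36.4),
    ("§103", 86.9, 46.9),
    ("§102", 2.9, -37.1),
    ("§112", 1.1, -38.9),
]

# Implied TC average = examiner rate minus the stated difference.
implied_tc_avg = {statute: round(rate - diff, 1) for statute, rate, diff in rows}

for statute, avg in implied_tc_avg.items():
    print(f"{statute}: implied TC avg {avg}%")
```

Under that assumption every row implies the same 40% baseline, which fits the page's description of the Tech Center average as a single estimate.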
Office Action

§101 §103
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 20 is rejected under 35 U.S.C. 101 because:
The claimed invention is directed to non-statutory subject matter.
As per the claim, the language “computer readable medium” does not place the claimed subject matter into statutory form.  The specification fails to indicate any specific type of medium such as ROM, RAM, hard disk, or non-transitory, non-carrier or non-propagating forms of storage, and therefore the claimed subject matter necessarily includes non-statutory embodiments, for instance transitory media or propagating signals.  The Examiner suggests amending the claim to recite "non-transitory". Note that it is not sufficient to add "tangible" or “physical”; see, for instance, In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007).

As a result of said “computer readable medium” not having statutory support, the claim is also rejected as follows:
In reference to MPEP 2106.03:
“...examples of claims that are not directed to any of the statutory categories include: ...
Products that do not have a physical or tangible form, such as information (often referred to as “data per se”) or a computer program per se (often referred to as “software per se”) when claimed as a product without any structural recitations,” 
“As the courts’ definitions of machines, manufactures and compositions of matter indicate, a product must have a physical or tangible form in order to fall within one of these statutory categories. Digitech, 758 F.3d at 1348, 111 USPQ2d at 1719.”


	

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 9-14, 16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 20200081682 A1 Vestal; Christopher Daniel (hereinafter Vestal) in view of US 12272369 B1 Chhetri; Amit Singh et al. (hereinafter Chhetri).
Re claim 1, Vestal teaches 
1. An apparatus comprising at least one processor and a memory storing instructions that are operable, when executed by the at least one processor, to cause the apparatus to: (fig. 2a-2b)
generate an audio feature set for a transduced audio stream captured via at least one capture device positioned within an environment defining at least one audio capture area; (using audio stream of live sounds as in fig. 3, features such as packetized audio segments as features per se under BRI 0002 0007, an environment e.g. stadium 0031 or commentators broadcasting 0069)
receive, from a user device, one or more user audio isolation control parameters; (user inputs parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)
...and (ii) one or more user audio isolation control parameters; and (generating isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)
generate output data for an output device based at least in part on the isolated audio (creating an isolated output using mixed sound sources as in fig. 1, such as field level or stadium or commentary etc. based on isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)

However, while Vestal teaches a multi-source audio isolation and adjustment system for a stadium and voice environment where a user can adjust parameters, it does not suggest masking per se and neural network concepts, thus failing to teach:
input the audio feature set to a neural network model configured to generate an audio isolation mask associated with the transduced audio stream; (Chhetri using a neural network to process spectral data as a feature set per for an audio input col 17 line 41 to col 18 line 16 with fig. 2a-2b features from time data and tone using a neural network using a DNN col 20 line 32 to col 21 line 6 with fig. 3b and 6b + col 1 line 54 to col 2 line 5 + abstract supporting DNN with isolation for audio input and features e.g. tone, using noise removal with masking col 10 lines 31-50 & col 16 line 58 to col 17 line 5)
generate isolated audio for the transduced audio stream based at least in part on (i) the audio isolation mask … (Chhetri masked isolated outputs based on speech mask e.g. elements 360 and 450 NN with mixing isolations at 635 and isolated output 335, NOTE reference signal input as well at 304, using a neural network to process spectral data as a feature set per for an audio input col 17 line 41 to col 18 line 16 with fig. 2a-2b features from time data and tone using a neural network using a DNN col 20 line 32 to col 21 line 6 with fig. 3b and 6b + col 1 line 54 to col 2 line 5 + abstract supporting DNN with isolation for audio input and features e.g. tone, using noise removal with masking col 10 lines 31-50 & col 16 line 58 to col 17 line 5)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vestal to incorporate the above claim limitations as taught by Chhetri to allow for combining prior art elements according to known methods to yield predictable results such as to use DNNs to distinguish between desirable voices and background noises (like stadiums, barking dogs or sirens), allowing for targeted, high-fidelity audio enhancement that adapts to individual user preferences and environmental contexts improving Vestal to utilize a DNN with spectral processing to analogously reduce noise but self-sufficiently learn to optimize functions in real-time noise suppression, handling non-stationary noise, and adapting to new, unseen sounds.
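
For orientation, the mask-based isolation flow recited in claim 1 (feature set, model-generated mask, isolated audio shaped by a user control parameter) can be sketched as below. This is an illustrative sketch only, not the applicant's, Vestal's, or Chhetri's implementation. The function names, the peak-normalized magnitude features, the fixed threshold standing in for the neural network model, and the `strength` parameter standing in for a user audio isolation control parameter are all assumptions.

```python
from typing import List

def feature_set(magnitudes: List[float]) -> List[float]:
    """Hypothetical feature set: per-bin magnitudes normalized by the frame peak."""
    peak = max(magnitudes) or 1.0  # avoid dividing by zero on a silent frame
    return [m / peak for m in magnitudes]

def toy_mask_model(features: List[float], threshold: float = 0.5) -> List[float]:
    """Stand-in for the neural network model (assumption).
    A real system would run a trained network; this is a fixed threshold."""
    return [1.0 if f >= threshold else 0.0 for f in features]

def isolate(magnitudes: List[float], mask: List[float], strength: float) -> List[float]:
    """Apply the mask, blended by a user control value `strength` in [0, 1].
    strength=0 leaves the audio untouched; strength=1 applies the mask fully."""
    return [m * (1.0 - strength + strength * k) for m, k in zip(magnitudes, mask)]

frame = [0.2, 4.0, 3.0, 0.4]               # toy magnitude spectrum of one frame
mask = toy_mask_model(feature_set(frame))  # keeps the two dominant bins
isolated = isolate(frame, mask, strength=1.0)
print(mask, isolated)
```

The blend in `isolate` is one simple way a user parameter could act together with a mask; the claim language itself only requires that the isolated audio be based at least in part on both.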

Re claim 19, this claim is rejected on the same basis as claim 1. It is a broader or narrower representation of claim 1 that differs only in the general inclusion or omission of hardware (e.g. processor, memory, instructions), and otherwise amounts to virtually identical scope.

Re claim 20, this claim is rejected on the same basis as claim 1. It is a broader or narrower representation of claim 1 that differs only in the general inclusion or omission of hardware (e.g. processor, memory, instructions), and otherwise amounts to virtually identical scope.
For instance, see fig. 2a and 2b, which contain the hardware.
	

Re claim 2, Vestal teaches
2. The apparatus of claim 1, wherein the environment is an arena environment.  (creating an isolated output using mixed sound sources as in fig. 1, such as field level or stadium or commentary etc. based on isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)

Re claim 3, Vestal teaches
3. The apparatus of claim 2, wherein the arena environment defines a playing region, a spectator region, and a noise source region, and wherein the instructions are further operable to cause the apparatus to: generate the isolated audio for the playing region, the spectator region, or the noise source region based at least in part on… (ii) the one or more user audio isolation control parameters.  (creating an isolated output using mixed sound sources as in fig. 1, such as field level or stadium or commentary etc. based on isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)
However, while Vestal teaches a multi-source audio isolation and adjustment system for a stadium and voice environment where a user can adjust parameters, it does not suggest masking per se and neural network concepts, thus failing to teach:
(i) the audio isolation mask and (Chhetri masked isolated outputs based on speech mask e.g. elements 360 and 450 NN with mixing isolations at 635 and isolated output 335, NOTE reference signal input as well at 304, using a neural network to process spectral data as a feature set per for an audio input col 17 line 41 to col 18 line 16 with fig. 2a-2b features from time data and tone using a neural network using a DNN col 20 line 32 to col 21 line 6 with fig. 3b and 6b + col 1 line 54 to col 2 line 5 + abstract supporting DNN with isolation for audio input and features e.g. tone, using noise removal with masking col 10 lines 31-50 & col 16 line 58 to col 17 line 5)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vestal to incorporate the above claim limitations as taught by Chhetri to allow for combining prior art elements according to known methods to yield predictable results such as to use DNNs to distinguish between desirable voices and background noises (like stadiums, barking dogs or sirens), allowing for targeted, high-fidelity audio enhancement that adapts to individual user preferences and environmental contexts improving Vestal to utilize a DNN with spectral processing to analogously reduce noise but self-sufficiently learn to optimize functions in real-time noise suppression, handling non-stationary noise, and adapting to new, unseen sounds.

Re claim 4, Vestal teaches 
4. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: receive the transduced audio stream from an audio mixer device.  (isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)

Re claim 5, Vestal teaches 
5. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: generate mixed isolated audio based at least in part on the isolated audio and different isolated audio associated with the environment; and (multiple signals mixed as input, creating an isolated output using mixed sound sources as in fig. 1, such as field level or stadium or commentary etc. based on isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)
generate the output data for the output device based at least in part on the mixed isolated audio.  (multiple signals mixed as input, output based on parameters audibly for user to hear isolation, creating an isolated output using mixed sound sources as in fig. 1, such as field level or stadium or commentary etc. based on isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)


Re claim 9, while Vestal teaches a multi-source audio isolation and adjustment system for a stadium and voice environment where a user can adjust parameters, it does not suggest masking per se and neural network concepts, thus failing to teach:
9. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: generate a reference audio feature set for a reference microphone signal associated with the environment; and (Chhetri reference signal input as well at 304, using a neural network to process spectral data as a feature set per for an audio input col 17 line 41 to col 18 line 16 with fig. 2a-2b features from time data and tone using a neural network using a DNN col 20 line 32 to col 21 line 6 with fig. 3b and 6b + col 1 line 54 to col 2 line 5 + abstract supporting DNN with isolation for audio input and features e.g. tone, using noise removal with masking col 10 lines 31-50 & col 16 line 58 to col 17 line 5)
input the audio feature set and the reference audio feature set to the neural network model to generate the audio isolation mask.  (Chhetri masked isolated outputs based on speech mask e.g. elements 360 and 450 NN with mixing isolations at 635 and isolated output 335, NOTE reference signal input as well at 304, using a neural network to process spectral data as a feature set per for an audio input col 17 line 41 to col 18 line 16 with fig. 2a-2b features from time data and tone using a neural network using a DNN col 20 line 32 to col 21 line 6 with fig. 3b and 6b + col 1 line 54 to col 2 line 5 + abstract supporting DNN with isolation for audio input and features e.g. tone, using noise removal with masking col 10 lines 31-50 & col 16 line 58 to col 17 line 5)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vestal to incorporate the above claim limitations as taught by Chhetri to allow for combining prior art elements according to known methods to yield predictable results such as to use DNNs to distinguish between desirable voices and background noises (like stadiums, barking dogs or sirens), allowing for targeted, high-fidelity audio enhancement that adapts to individual user preferences and environmental contexts improving Vestal to utilize a DNN with spectral processing to analogously reduce noise but self-sufficiently learn to optimize functions in real-time noise suppression, handling non-stationary noise, and adapting to new, unseen sounds.

Re claim 10, Vestal teaches
10. The apparatus of claim 1, wherein the transduced audio stream comprises at least one microphone signal from a group comprising a first microphone signal and one or more microphone signals associated with one or more sounds in the environment.  (multiple sources in a stadium, based on isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)

Re claim 11, while Vestal teaches a multi-source audio isolation and adjustment system for a stadium and voice environment where a user can adjust parameters, it does not suggest masking per se and neural network concepts, thus failing to teach:
11. The apparatus of claim 1, wherein the audio isolation mask comprises a denoiser mask, a speech removal mask, or a signal of interest mask.  (Chhetri masked isolated outputs based on speech mask e.g. elements 360 and 450 NN with mixing isolations at 635 and isolated output 335, NOTE reference signal input as well at 304, using a neural network to process spectral data as a feature set per for an audio input col 17 line 41 to col 18 line 16 with fig. 2a-2b features from time data and tone using a neural network using a DNN col 20 line 32 to col 21 line 6 with fig. 3b and 6b + col 1 line 54 to col 2 line 5 + abstract supporting DNN with isolation for audio input and features e.g. tone, using noise removal with masking col 10 lines 31-50 & col 16 line 58 to col 17 line 5)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vestal to incorporate the above claim limitations as taught by Chhetri to allow for combining prior art elements according to known methods to yield predictable results such as to use DNNs to distinguish between desirable voices and background noises (like stadiums, barking dogs or sirens), allowing for targeted, high-fidelity audio enhancement that adapts to individual user preferences and environmental contexts improving Vestal to utilize a DNN with spectral processing to analogously reduce noise but self-sufficiently learn to optimize functions in real-time noise suppression, handling non-stationary noise, and adapting to new, unseen sounds.

Re claim 12, Vestal teaches
12. The apparatus of claim 1, wherein the output data comprises broadcast audio.  (e.g. commentary etc. based on isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)

Re claim 13, Vestal teaches
13. The apparatus of claim 1, wherein the output data comprises speech reinforcement audio.  (altering the levels or reinforcing select sources, creating an isolated output using mixed sound sources as in fig. 1, such as field level or stadium or commentary etc. based on isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)

Re claim 14, Vestal teaches
14. The apparatus of claim 1, wherein the output data comprises visual data configured to render via a display of the output device.  (fig. 2a-2b display output to user with user selection and also a speaker to play isolated audio based on user settings)

Re claim 16, Vestal teaches
16. The apparatus of claim 1, wherein the output data comprises a video stream associated with the isolated audio.  (0020 the system can handle both video or audio sources)


Re claim 18, while Vestal teaches a multi-source audio isolation and adjustment system for a stadium and voice environment where a user can adjust parameters, it does not suggest masking per se and neural network concepts, thus failing to teach:
18. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: initiate selection of an audio channel associated with desirable audio based at least in part on the audio isolation mask. (Chhetri selection of a signal in multiple channels col 12 lines 37-61 with col 7 line 52 to col 8 line 49… masked isolated outputs based on speech mask e.g. elements 360 and 450 NN with mixing isolations at 635 and isolated output 335, NOTE reference signal input as well at 304, using a neural network to process spectral data as a feature set per for an audio input col 17 line 41 to col 18 line 16 with fig. 2a-2b features from time data and tone using a neural network using a DNN col 20 line 32 to col 21 line 6 with fig. 3b and 6b + col 1 line 54 to col 2 line 5 + abstract supporting DNN with isolation for audio input and features e.g. tone, using noise removal with masking col 10 lines 31-50 & col 16 line 58 to col 17 line 5)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vestal to incorporate the above claim limitations as taught by Chhetri to allow for combining prior art elements according to known methods to yield predictable results such as to use analogous multiple channels and data sources but with DNNs to distinguish between desirable voices and background noises (like stadiums, barking dogs or sirens), allowing for targeted, high-fidelity audio enhancement that adapts to individual user preferences and environmental contexts improving Vestal to utilize a DNN with spectral processing to analogously reduce noise but self-sufficiently learn to optimize functions in real-time noise suppression, handling non-stationary noise, and adapting to new, unseen sounds.

Claims 6-8 are rejected under 35 U.S.C. 103 as being unpatentable over US 20200081682 A1 Vestal; Christopher Daniel (hereinafter Vestal) in view of US 12272369 B1 Chhetri; Amit Singh et al. (hereinafter Chhetri) and further in view of US 20230421702 A1 CUTLER; Ross (hereinafter CUTLER).
Re claim 6, 
6. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: 
receive a first audio channel stream via a first capture device positioned within a first audio capture area of the environment; (creating an isolated output using mixed sound sources as in fig. 1, based on input e.g. stadium, such as field level or stadium or commentary etc. based on isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)
receive a second audio channel stream via a second capture device positioned within a second audio capture area of the environment; (identical operation on e.g. commentary, for another source microphone e.g. element 124N in fig. 1… creating an isolated output using mixed sound sources as in fig. 1, such as field level or stadium or commentary etc. based on isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)
generate a first audio feature set for the first audio channel stream; (using audio stream e.g. stadium, of live sounds as in fig. 3, features such as packetized audio segments as features per se under BRI 0002 0007, an environment e.g. stadium 0031 or commentators broadcasting 0069)
generate a second audio feature set for the second audio channel stream; (identical operations in a different stream that is present e.g. commentary, using audio stream of live sounds as in fig. 3, features such as packetized audio segments as features per se under BRI 0002 0007, an environment e.g. stadium 0031 or commentators broadcasting 0069)
However, while Vestal teaches a multi-source audio isolation and adjustment system for a stadium and voice environment where a user can adjust parameters, it does not suggest masking per se and neural network concepts, thus failing to teach:
input the first audio feature set to a first neural network model to generate a first mixing control signal; (Chhetri masked isolated outputs based on speech mask e.g. elements 360 and 450 NN with mixing isolations at 635 and isolated output 335, NOTE reference signal input as well at 304, using a neural network to process spectral data as a feature set per for an audio input col 17 line 41 to col 18 line 16 with fig. 2a-2b features from time data and tone using a neural network using a DNN col 20 line 32 to col 21 line 6 with fig. 3b and 6b + col 1 line 54 to col 2 line 5 + abstract supporting DNN with isolation for audio input and features e.g. tone, using noise removal with masking col 10 lines 31-50 & col 16 line 58 to col 17 line 5)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vestal to incorporate the above claim limitations as taught by Chhetri to allow for combining prior art elements according to known methods to yield predictable results such as to use DNNs to distinguish between desirable voices and background noises (like stadiums, barking dogs or sirens), allowing for targeted, high-fidelity audio enhancement that adapts to individual user preferences and environmental contexts improving Vestal to utilize a DNN with spectral processing to analogously reduce noise but self-sufficiently learn to optimize functions in real-time noise suppression, handling non-stationary noise, and adapting to new, unseen sounds.

However, while the combination teaches identical operations applied to multiple channels or audio sources, a neural network, masking, and audio source isolation, it fails to teach a parallel neural network:
input the second audio feature set to a second neural network model to generate a second mixing control signal; and (CUTLER dual microphone system but with dedicated neural network per microphone or source audio, operating in parallel mixed together 0106 and 0111, also performing frequency attenuation with masking for noise reduction 0057, and transforms per frame within a period of time 0062-0063 in an iterative embedding element e.g. 610 and fig. 6 in real time for audio e.g. 0092)
select the transduced audio stream from a plurality of transduced audio streams based at least in part on the first mixing control signal and the second mixing control signal.  (CUTLER dual microphone system but with dedicated neural network per microphone or source audio, operating in parallel mixed together 0106 and 0111, also performing frequency attenuation with masking for noise reduction 0057, and transforms per frame within a period of time 0062-0063 in an iterative embedding element e.g. 610 and fig. 6 in real time for audio e.g. 0092)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vestal in view of Chhetri to incorporate the above claim limitations as taught by CUTLER to allow for combining prior art elements according to known methods to yield predictable results such as to create optimal source-specific modeling, leading to cleaner separation, better spatial handling, and reduced crosstalk, which thereby analogously aligns with Vestal in view of Chhetri and enhances flexibility by improving distinct temporal and spectral features of instruments or voices, while allowing for individualized processing of stereo/multi-channel gain and phase differences.
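
The stream-selection step of claim 6 (one mixing control signal per channel, then a selection based on both signals) can be pictured with a small sketch. Everything here is hypothetical: `mixing_control_signal` stands in for a per-channel neural network model, the stream names and feature values are invented, and "pick the highest score" is just one possible selection rule. For brevity both channels share one scoring function, whereas the claim recites a first and a second model.

```python
from typing import Dict, List

def mixing_control_signal(features: List[float]) -> float:
    """Stand-in for a per-channel neural network model (assumption).
    Returns the mean of the feature values as a toy score."""
    return sum(features) / len(features) if features else 0.0

def select_stream(streams: Dict[str, List[float]]) -> str:
    """Select the stream whose control signal is highest (hypothetical rule)."""
    signals = {name: mixing_control_signal(feats) for name, feats in streams.items()}
    return max(signals, key=signals.get)

# Invented feature sets for two capture areas.
streams = {
    "field_mic": [0.9, 0.7, 0.8],
    "crowd_mic": [0.2, 0.4, 0.3],
}
selected = select_stream(streams)
print(selected)
```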

Re claim 7, Vestal teaches
7. The apparatus of claim 6, wherein the instructions are further operable to cause the apparatus to: select the transduced audio stream via an audio mixer device.  (creating an isolated output using mixed sound sources as in fig. 1, such as field level or stadium or commentary etc. based on isolated audio based on user input parameters for each audio source as in fig. 2a-2b and 0067 0072, in an environment e.g. stadium 0031 or commentators broadcasting 0069)

Re claim 8, while the combination teaches DNN and identical operations applied to multiple channels or audio sources, a neural network, masking, and audio source isolation, it fails to teach:
8. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: input an audio signal sample associated with the transduced audio stream to a time-frequency domain transformation pipeline of a digital signal processing process for a transformation period; (CUTLER performing frequency attenuation with masking for noise reduction 0057, and transforms per frame within a period of time 0062-0063 in an iterative embedding element e.g. 610 and fig. 6 in real time for audio e.g. 0092)
input the audio signal sample to a deep neural network (DNN) processing loop comprising the neural network model; and (CUTLER dual microphone system but with dedicated neural network per microphone or source audio, operating in parallel mixed together 0106 and 0111, also performing frequency attenuation with masking for noise reduction 0057, and transforms per frame within a period of time 0062-0063 in an iterative embedding element e.g. 610 and fig. 6 in real time for audio e.g. 0092)
based on the audio isolation mask being determined prior to expiration of the transformation period, apply the audio isolation mask to a frequency domain version of the audio signal sample associated with the time-frequency domain transformation pipeline to generate the isolated audio.  (CUTLER expiration such as start and end time in a frame or window, dual microphone system but with dedicated neural network per microphone or source audio, operating in parallel mixed together 0106 and 0111, also performing frequency attenuation with masking for noise reduction 0057, and transforms per frame within a period of time 0062-0063 in an iterative embedding element e.g. 610 and fig. 6 in real time for audio e.g. 0092)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vestal in view of Chhetri to incorporate the above claim limitations as taught by CUTLER to allow for combining prior art elements according to known methods to yield predictable results such as to create optimal source-specific modeling, leading to cleaner separation, better spatial handling, and reduced crosstalk, which thereby analogously aligns with Vestal in view of Chhetri and enhances flexibility by improving distinct temporal and spectral features of instruments or voices, while allowing for individualized processing of stereo/multi-channel gain and phase differences.
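
The timing condition in claim 8 (apply the mask only if it is determined before the transformation period expires) can be sketched as follows. This is a minimal sketch under stated assumptions: timing is passed in as plain millisecond values rather than measured, the function and parameter names are hypothetical, and the pass-through fallback for a late mask is an assumption, since the claim language does not say what happens in that case.

```python
from typing import List, Optional

def apply_mask_if_on_time(
    freq_bins: List[float],
    mask: Optional[List[float]],
    mask_ready_ms: float,
    transformation_period_ms: float,
) -> List[float]:
    """Apply the mask to the frequency-domain sample only if the mask was
    ready before the transformation period expired.
    Fallback (assumption): return the bins unchanged when the mask is late."""
    if mask is not None and mask_ready_ms < transformation_period_ms:
        return [b * k for b, k in zip(freq_bins, mask)]
    return list(freq_bins)

bins = [1.0, 2.0, 3.0]
mask = [0.0, 1.0, 0.5]
on_time = apply_mask_if_on_time(bins, mask, mask_ready_ms=4.0, transformation_period_ms=10.0)
late = apply_mask_if_on_time(bins, mask, mask_ready_ms=12.0, transformation_period_ms=10.0)
print(on_time, late)
```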

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over US 20200081682 A1 Vestal; Christopher Daniel (hereinafter Vestal) in view of US 12272369 B1 Chhetri; Amit Singh et al. (hereinafter Chhetri) and further in view of US 12288566 B1 Ganguly; Anshuman et al. (hereinafter Ganguly).
Re claim 15, the combination teaches DNN and multiple channel sounds but fails to teach:
15. The apparatus of claim 1, wherein the output device is a haptic device, and wherein the output data comprises a control signal for the haptic device.  (Ganguly well known uses of haptic feedback col 84 lines 56-62)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vestal in view of Chhetri to incorporate the above claim limitations as taught by Ganguly to allow for combining prior art elements according to known methods to yield predictable results such as by using a well known form of user notification, such as when a user mutes or alters a volume level of an isolated audio source allowing for display and haptic well known feedback such as if a user is in a noisy dark environment to confirm a setting.






Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over US 20200081682 A1 Vestal; Christopher Daniel (hereinafter Vestal) in view of US 12272369 B1 Chhetri; Amit Singh et al. (hereinafter Chhetri) and further in view of US 20150341734 A1 Sherman; Vladimir (hereinafter Sherman).
Re claim 17, the combination teaches DNN and multiple channel sounds but fails to teach:
17. The apparatus of claim 1, wherein the instructions are further operable to cause the apparatus to: perform beam steering associated with the at least one capture device based at least in part on the audio isolation mask.  (Sherman masking signals outside the direction of interest converted with beam steering 0013, 0017, 0018)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Vestal in view of Chhetri to incorporate the above claim limitations as taught by Sherman to allow for combining prior art elements according to known methods to yield predictable results such as to produce precise, targeted sound delivery, significantly increasing speech intelligibility, reducing unwanted reverberations, and isolating audio to specific areas with minimal mechanical adjustment such as to auto-correct a user’s parameters, applicable to stadiums for instance which eliminates echoes in acoustically challenging spaces like high-ceilinged halls, leading to improved privacy and cleaner sound reproduction.


Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

US 20110252948 A1	HUMPHREY; SCOTT
Isolating vocals and instruments

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C COLUCCI whose telephone number is (571)270-1847.  The examiner can normally be reached on M-F 9 AM - 5 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Flanders can be reached at (571)272-7516.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/MICHAEL COLUCCI/ Primary Examiner, Art Unit 2655, (571)-270-1847
Examiner FAX:  (571)-270-2847
Michael.Colucci@uspto.gov
Prosecution Timeline

Jul 03, 2024
Application Filed
Apr 01, 2026
Non-Final Rejection mailed — §101, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/422,681, Patent 12640144: Generating Synthetic Conference Transcripts Using Natural Language Processing (2y 4m to grant, granted May 26, 2026)
18/401,171, Patent 12633286: MACHINE LEARNING MODEL IMPROVEMENT (2y 4m to grant, granted May 19, 2026)
18/352,601, Patent 12626697: SYSTEM AND METHOD FOR KEYWORD FALSE ALARM REDUCTION (2y 10m to grant, granted May 12, 2026)
19/225,487, Patent 12620262: USING ARTIFICIAL ENTITIES FOR GENERATING PERSONALIZED RESPONSES (11m to grant, granted May 05, 2026)
18/515,502, Patent 12592240: ENCODING AND DECODING OF ACOUSTIC ENVIRONMENT (2y 4m to grant, granted Mar 31, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 76%
With Interview (+15.2%): 91%
Median Time to Grant: 3y 1m (~1y 3m remaining)
PTA Risk: Low
Based on 999 resolved cases by this examiner. Grant probability derived from career allowance rate.
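
The headline percentages can be reproduced from the counts reported on this page. The sketch below assumes the interview lift is added to the career allowance rate in percentage points, which matches the displayed 76% and 91% but is not stated explicitly. The variable names are illustrative.

```python
granted, resolved = 758, 999

# Career allowance rate, which the page uses as the grant probability.
allowance_rate = 100 * granted / resolved        # about 75.9, displayed as 76%

# Assumption: the +15.2% interview lift is additive, in percentage points.
interview_lift_pts = 15.2
with_interview = allowance_rate + interview_lift_pts  # about 91.1, displayed as 91%

print(f"Grant probability: {allowance_rate:.1f}%")
print(f"With interview:    {with_interview:.1f}%")
```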